October 6, 2010; Gregg M. Townsend
Requires: Unix, dynamic loading
This file is in the public domain.
Weblinks is a program for checking links in a collection of HTML
files. It is designed for use directly on the file structure
containing the HTML files.

Given one or more starting points, weblinks parses each file and
validates the HTTP: and FILE: links it finds. Errors are reported
on standard output. FILE: links, including relative links, can be
followed recursively.
____________________________________________________________
By design, only local files are scanned. Only an existence check is
performed for HTTP: links. Validation of HTTP: links is aided by
caching and subject to speed limits; see "vhttp.icn" for details.

Remote links are checked by sending an HTTP "HEAD" request.
Unfortunately, some sites respond with "Server Error" or even with
snide remarks like "Because I felt like it". These are reported
as errors and must be inspected manually.

NOTE: if the environment variable USER is set, as it usually is,
then "From: $USER@hostname" is sent as part of each remote inquiry
in order to identify the source. This is standard etiquette for
automated checkers. If USER is not set, but LOGNAME is, then
$LOGNAME is used.
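
The USER/LOGNAME fallback and the HEAD request described above can be
sketched in Python as follows. This is only an illustration, not the
program's Icon code: the function names are hypothetical, the use of
socket.gethostname() for "hostname" is an assumption, and an empty
variable is treated the same as an unset one.

```python
import socket
import urllib.request


def from_address(env, hostname=None):
    """Return "user@hostname" using USER, else LOGNAME, else None."""
    # Assumption: an empty value counts as "not set".
    user = env.get("USER") or env.get("LOGNAME")
    if not user:
        return None
    # Assumption: the local host name stands in for "hostname".
    return f"{user}@{hostname or socket.gethostname()}"


def build_head_request(url, env, hostname=None):
    """Build (but do not send) a HEAD request carrying a From: header."""
    headers = {}
    sender = from_address(env, hostname)
    if sender:
        headers["From"] = sender
    return urllib.request.Request(url, headers=headers, method="HEAD")
```

Passing the result to urllib.request.urlopen() would send the request;
error statuses then surface as urllib.error.HTTPError. Caching and
speed limits are not modeled here.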
Limitations:

   url(...) links within embedded stylesheets are not recognized.
   FTP:, MAILTO:, and other link types are not validated.
   Files are checked recursively only if named *.htm*.
   Proper file permission (for web export) is not checked.

The common error of failing to put a trailing slash on a directory
specification results in a "453 Is A Directory" error message for a
local file or, typically, a "301 Moved Permanently" message for a
remote file.
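
The link handling described so far (HTTP: and FILE: links validated,
other types skipped, recursion limited to *.htm* names) can be
summarized in a small Python sketch. The function names and the
returned labels are hypothetical, and matching the pattern against the
last path component is an assumption; the real program is written in
Icon and may decide differently in edge cases.

```python
import fnmatch
import posixpath
from urllib.parse import urlsplit


def link_kind(link):
    """Classify a link as "http", "file", or "unvalidated"."""
    scheme = urlsplit(link).scheme  # lowercased; "" for relative links
    if scheme == "http":
        return "http"
    if scheme in ("", "file"):
        # Relative links count as FILE: links.
        return "file"
    # FTP:, MAILTO:, and other link types are not validated.
    return "unvalidated"


def is_recursion_candidate(path):
    """True if a file's name matches *.htm* (assumed: basename only)."""
    return fnmatch.fnmatchcase(posixpath.basename(path), "*.htm*")
```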
____________________________________________________________
usage:  weblinks [options] file...

   -R   follow file links recursively
        (http links are never followed recursively)

   -t   trace files as visited

   -s   report successes as well as problems

   -v   report tracing and successes, if selected, more verbosely

   -i   invert output (sort by referencing page, not by status)

   -r root
        specify starting point for file names beginning with "/"
        (e.g. -r /cs/www). This is needed if such references are
        to be followed or checked. If a root is specified it
        affects all file specifications including those on the
        command line.

   -h home
        specify starting point for file names beginning with "/~".

   -p prefix[,prefix...]
        prune (don't check) files beginning with given prefix

   -b prefix
        specify bounds for files scanned: do not scan files
        that do not begin with prefix. Default bounds are
        directory of last file name. For example,
             weblinks /foo/bar /foo/baz
        implies "-b /foo/".

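
The -b default and the -p pruning rule can be illustrated with a short
Python sketch. It assumes plain string-prefix comparison on file names,
which is what the option descriptions suggest; the function names are
hypothetical, and the treatment of a file name with no "/" in it is a
guess.

```python
def default_bounds(files):
    """Directory part (with trailing slash) of the last file name."""
    last = files[-1]
    # Assumption: a name without "/" yields "", which bounds nothing out.
    return last[: last.rfind("/") + 1]


def parse_prune(option_value):
    """Split the -p argument "prefix[,prefix...]" into a list."""
    return [p for p in option_value.split(",") if p]


def should_scan(path, bounds, prune_prefixes=()):
    """True if path is inside the bounds and not under a pruned prefix."""
    if not path.startswith(bounds):
        return False
    return not any(path.startswith(p) for p in prune_prefixes)
```

With these definitions, default_bounds(["/foo/bar", "/foo/baz"])
returns "/foo/", matching the example above.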
If the environment variable WEBLINKS_INIT is set, its whitespace-
separated words are prepended to the explicit command argument list.
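
A minimal Python sketch of that prepending step, assuming ordinary
whitespace splitting with no quoting rules (the function name is
hypothetical and this is not the program's Icon code):

```python
def effective_args(argv, env):
    """Prepend the words of WEBLINKS_INIT, if set, to the argument list."""
    init_words = env.get("WEBLINKS_INIT", "").split()
    return init_words + list(argv)
```

For instance, with WEBLINKS_INIT set to "-r /cs/www -h /cs/www/people",
the arguments "-R -t /~gmt/" are processed as if the command line had
been "-r /cs/www -h /cs/www/people -R -t /~gmt/".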
____________________________________________________________
Examples (all assuming a web area rooted at /cs/www):

To check one new page:
     weblinks -r /cs/www /icon/books.htm

To check a personal hierarchy, with tracing:
     setenv WEBLINKS_INIT "-r /cs/www -h /cs/www/people"
     weblinks -R -t /~gmt/

To check with pruning:
     weblinks -R -t -r /cs/www -p /icon/library /icon/index.htm