vhttp.icn: Procedure for validating an HTTP URL

procedure vhttp:           validate HTTP: URL

link vhttp
May 15, 2002; Gregg M. Townsend
Requires: Unix, dynamic loading
This file is in the public domain.

vhttp(url) validates a URL (a World Wide Web link) of HTTP: form
by sending a request to the specified Web server.  It returns a
string containing a status code and message.  If the URL is not
in the proper form, or if it does not specify the HTTP: protocol,
vhttp fails.
____________________________________________________________

vhttp(url) makes a TCP connection to the Web server specified by the
URL and sends a HEAD request for the specified file.  A HEAD request
asks the server to check the validity of a request without sending
the file itself.
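
The steps above can be sketched in a few lines.  The following is
an illustration in Python, not the Icon source of vhttp; the names
parse_http_url and build_head_request are hypothetical, and the use
of HTTP/1.0 and a default port of 80 are assumptions.

```python
from urllib.parse import urlsplit

def parse_http_url(url):
    """Return (host, port, path) for an http: URL, or None if unusable."""
    try:
        parts = urlsplit(url)
        port = parts.port or 80          # assumption: default to port 80
    except ValueError:                   # malformed port or host
        return None
    if parts.scheme != "http" or not parts.hostname:
        return None                      # mirrors "vhttp fails" for other forms
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path

def build_head_request(path):
    """Return the bytes of a minimal HEAD request (ASCII paths only)."""
    return f"HEAD {path} HTTP/1.0\r\n\r\n".encode("ascii")
```

A caller would open a TCP connection to the returned host and port,
send the request bytes, and read the first line of the reply.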

The response code from the remote server is returned.  This is
a line containing a status code followed by a message.  Here are
some typical responses:

        200 OK
        200 Document follows
        301 Moved Permanently
        404 File Not Found

See the HTTP specification for more details.  If a response cannot
be obtained, vhttp() returns one of these invented codes:

        551 Connection Failed
        558 No Response
        559 Empty Response
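
As an illustration of how a server reply might be reduced to such a
string, here is a small Python sketch.  It is not the Icon source;
status_from_response is a hypothetical name, and the conditions that
select 558 and 559 are guesses, since the text above does not define
them.

```python
def status_from_response(raw):
    """Reduce a raw HTTP response to 'code message', e.g. '200 OK'."""
    if raw is None:                      # assumption: nothing could be read
        return "558 No Response"
    lines = raw.decode("latin-1").splitlines()
    first = lines[0].strip() if lines else ""
    if not first:                        # assumption: reply had no status line
        return "559 Empty Response"
    # A status line looks like "HTTP/1.0 200 OK"; drop the protocol version.
    version, _, rest = first.partition(" ")
    if version.upper().startswith("HTTP/") and rest:
        return rest.strip()
    return first
```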
____________________________________________________________

The request sent to the Web server can be parameterized by setting
two global variables.

The global variable vhttp_agent is passed to the Web server as the
"User-agent:" field of the HEAD request; the default value is
"vhttp.icn".

The global variable vhttp_from is passed as the "From:" field of the
HEAD request, if set; there is no default value.
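
The effect of the two variables on the request can be pictured with
this Python sketch.  The function request_headers is hypothetical and
stands in for whatever vhttp does internally; only the field names
and the default agent string come from the description above.

```python
def request_headers(agent="vhttp.icn", sender=None):
    """Return the header lines that accompany the HEAD request."""
    lines = [f"User-agent: {agent}"]     # always sent; default from the text
    if sender is not None:               # "From:" is sent only when set
        lines.append(f"From: {sender}")
    return lines
```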
____________________________________________________________

vhttp() contains deliberate bottlenecks to prevent a naive program
from causing annoyance or disruption to Web servers.  No remote
host is contacted more than once a second, and no individual file
is actually requested more than once a day.

The request rate is limited to one per second by keeping a table
of contacted hosts and delaying if necessary so that no host is
contacted more than once in any particular wall-clock second.
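
One way to implement such a limit is sketched below in Python.  This
is not the Icon source; the class HostThrottle and its injectable
clock and sleep parameters are hypothetical, chosen so the behavior
can be exercised without real delays.

```python
import time

class HostThrottle:
    def __init__(self, clock=time.time, sleep=time.sleep):
        self._last = {}      # host -> wall-clock second of the latest contact
        self._clock = clock
        self._sleep = sleep

    def wait_turn(self, host):
        """Block until host has not been contacted in the current second."""
        now = self._clock()
        while int(now) == self._last.get(host):
            # Sleep to the start of the next whole second, then re-check.
            self._sleep(int(now) + 1 - now)
            now = self._clock()
        self._last[host] = int(now)
```

Calling wait_turn(host) just before each connection gives the
per-host spacing described above; requests to different hosts do not
delay one another.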

Duplicate requests are prevented by using a very simple cache.
The file $HOME/.urlhist is used to record responses, and these
responses are reused throughout a single calendar day.  When the
date changes, the cache is invalidated.
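
A cache of this kind might look like the following Python sketch.
It is not the Icon source, and the on-disk layout shown here (a date
line followed by tab-separated URL and status lines) is an
assumption; the actual format of $HOME/.urlhist is not described
above.  The class name DailyCache is hypothetical.

```python
import datetime

class DailyCache:
    def __init__(self, path, today=None):
        self.path = path
        self.today = today or datetime.date.today().isoformat()
        self.entries = {}                # url -> status string
        self._load()

    def _load(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                lines = f.read().splitlines()
        except OSError:
            lines = []
        if lines and lines[0] == self.today:
            for line in lines[1:]:
                url, sep, status = line.partition("\t")
                if sep:
                    self.entries[url] = status
        else:
            # Missing file or a different date: start a fresh cache.
            with open(self.path, "w", encoding="utf-8") as f:
                f.write(self.today + "\n")

    def get(self, url):
        """Return the status recorded today for url, or None."""
        return self.entries.get(url)

    def put(self, url, status):
        """Record a status and append it to the cache file."""
        self.entries[url] = status
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{url}\t{status}\n")
```

A caller would try get(url) first and contact the server only when it
returns None, then store the result with put(url, status).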

These mechanisms are crude, but they are good enough to avoid
overloading remote Web servers.  In particular, a program that
uses vhttp() can be run repeatedly with the same data on the same
day without any effect on the referenced Web servers after the
first run.

The cache file, of course, can be defeated by deleting or editing
it.  The most likely reason for doing so would be to retry
connections that failed to complete on the first attempt.
