procedure vhttp: validate HTTP: URL
link vhttp
May 15, 2002; Gregg M. Townsend
Requires: Unix, dynamic loading
This file is in the public domain.
vhttp(url) validates a URL (a World Wide Web link) of HTTP: form by sending a request to the specified Web server. It returns a string containing a status code and message. If the URL is not in the proper form, or if it does not specify the HTTP: protocol, vhttp fails.

____________________________________________________________

vhttp(url) makes a TCP connection to the Web server specified by the URL and sends a HEAD request for the specified file. A HEAD request asks the server to check the validity of a request without sending the file itself.

The response code from the remote server is returned. This is a line containing a status code followed by a message. Here are some typical responses:

     200 OK
     200 Document follows
     301 Moved Permanently
     404 File Not Found

See the HTTP protocol spec for more details.

If a response cannot be obtained, vhttp() returns one of these invented codes:

     551 Connection Failed
     558 No Response
     559 Empty Response

____________________________________________________________

The request sent to the Web server can be parameterized by setting two global variables.

The global variable vhttp_agent is passed to the Web server as the "User-agent:" field of the HEAD request; the default value is "vhttp.icn".

The global variable vhttp_from is passed as the "From:" field of the HEAD request, if set; there is no default value.

____________________________________________________________

vhttp() contains deliberate bottlenecks to prevent a naive program from causing annoyance or disruption to Web servers. No remote host is connected more than once a second, and no individual file is actually requested more than once a day.

The request rate is limited to one per second by keeping a table of contacted hosts and delaying if necessary so that no host is contacted more than once in any particular wall-clock second.

Duplicate requests are prevented by using a very simple cache.
The file $HOME/.urlhist is used to record responses, and these responses are reused throughout a single calendar day. When the date changes, the cache is invalidated.

These mechanisms are crude, but they are effective enough to avoid overloading remote Web servers. In particular, a program that uses vhttp() can be run repeatedly with the same data without any effect on the referenced Web servers after the first run.

The cache file, of course, can be defeated by deleting or editing it. The most likely reason for this would be to retry connections that failed to complete on the first attempt.
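
____________________________________________________________

The behavior described above can be illustrated with a minimal sketch. It is written in Python rather than Icon and is not the source of vhttp; every name in it (build_head_request, HostThrottle, DailyCache, validate, and the caller-supplied fetch function) is hypothetical. It also makes assumptions the text does not confirm: the request uses HTTP/1.0 with a "Host:" field, and the cache lives in memory instead of in $HOME/.urlhist.

```python
import time
from urllib.parse import urlsplit


def build_head_request(url, agent="vhttp.icn", from_addr=None):
    """Return (host, port, request_text), or None if url is not an http: URL."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http" or not parts.hostname:
        return None                      # vhttp fails in this situation
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"HEAD {path} HTTP/1.0",         # assumption: HTTP/1.0 request line
        f"Host: {parts.netloc}",         # assumption: a Host field is sent
        f"User-agent: {agent}",          # plays the role of vhttp_agent
    ]
    if from_addr is not None:            # plays the role of vhttp_from
        lines.append(f"From: {from_addr}")
    return parts.hostname, parts.port or 80, "\r\n".join(lines) + "\r\n\r\n"


class HostThrottle:
    """Allow at most one contact per host in any wall-clock second."""

    def __init__(self, clock=time.time, sleep=time.sleep):
        self.last_second = {}            # host -> second of the last contact
        self.clock = clock
        self.sleep = sleep

    def wait(self, host):
        t = self.clock()
        while self.last_second.get(host) == int(t):
            self.sleep(int(t) + 1 - t)   # wait for the next second to begin
            t = self.clock()
        self.last_second[host] = int(t)


class DailyCache:
    """Remember responses until the calendar date changes (in memory only)."""

    def __init__(self, today=None):
        self.today = today or (lambda: time.strftime("%Y-%m-%d"))
        self.date = self.today()
        self.responses = {}

    def _roll(self):
        current = self.today()
        if current != self.date:         # new day: invalidate everything
            self.date, self.responses = current, {}

    def lookup(self, url):
        self._roll()
        return self.responses.get(url)

    def record(self, url, response):
        self._roll()
        self.responses[url] = response


def validate(url, cache, throttle, fetch):
    """Return a status line such as '200 OK', or None for a non-http URL.

    fetch(host, port, request_text) is a caller-supplied function that
    performs the network exchange and returns the status line.
    """
    request = build_head_request(url)
    if request is None:
        return None
    cached = cache.lookup(url)
    if cached is not None:
        return cached                    # no network traffic on a cache hit
    host, port, text = request
    throttle.wait(host)
    response = fetch(host, port, text)
    cache.record(url, response)
    return response
```

In this sketch the throttle is consulted only on a cache miss, so repeated runs over the same URLs on the same day neither contact the servers nor incur any delay. Passing the clock, the date source, and the fetch function as parameters is a choice made here so that the logic can be exercised without network access.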