Far from being some kind of arcane wizard, the webmaster of a World Wide Web site is really just a specialized sort of system administrator. Aside from answering random email inquiries to webmaster@xyz.dom, the duties include editorial oversight, general administration, log analysis, and security monitoring.

Editorial Oversight

The webmaster is responsible for ensuring the overall quality of pages on the site. Badly crafted pages reflect poorly on the site, so take some care to ensure that links work and that the HTML is sound. The error log (described below) is a valuable aid in detecting broken links and other HTML problems.

A really dedicated webmaster might also look over the site's pages for clarity, ease of navigation, restraint in the use of large images, and consistent behavior across a variety of browsers: in short, the sorts of characteristics recommended by any good book on HTML programming.

Offensive (libelous, obscene, etc.) pages may incur legal liability. Consequently, the webmaster should be familiar with the general content of the pages provided. Unfortunately, there is a Catch-22 to the situation. By censoring some pages, the webmaster is implicitly saying that the other pages are OK. So, if you censor anything, you are at increased risk from any remaining offensive pages.

If the webmaster creates or edits all the pages, this may not be much of an issue. On large sites, with many individually crafted pages, there may be cause for some concern and care.

General Administration

Web servers don't require a lot of administration, except for keeping log files under control. Most server daemons generate a standard log record, ranging from 75 to 90 bytes per HTTP request received.

This doesn't add up to much if the site is small and quiet. My site received about 60,000 requests in August 1995, generating about five megabytes of log information. This isn't much of a problem, given reasonable amounts of disk space. If a site is large, however, the logs could mount up very rapidly.

A T1 link could conceivably bring in over 100 HTTP requests per second. At 75 to 90 bytes per record, that translates to roughly 20 GB or more of log data per month! So, the webmaster clearly has reason to be alert.
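
The arithmetic behind this kind of estimate is easy to check. The following is a minimal sketch, assuming a steady 100 requests per second over a 30-day month and the 75 to 90 byte record sizes quoted above. The function name and the 30-day month are illustrative choices, and heavier traffic or longer records push the total higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_log_bytes(requests_per_second, bytes_per_record, days=30):
    """Estimate the log volume produced by one month of steady traffic."""
    return requests_per_second * SECONDS_PER_DAY * days * bytes_per_record


# Assumed inputs: 100 requests/second, 75 to 90 bytes per log record.
low = monthly_log_bytes(100, 75)
high = monthly_log_bytes(100, 90)
print(f"{low / 1e9:.1f} to {high / 1e9:.1f} GB per month")
```

Under those assumptions the script prints "19.4 to 23.3 GB per month". The same function applied to a quiet site (60,000 requests in a month) gives about five megabytes, which matches the figure given earlier.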

Log Analysis

Along with making sure that the site doesn't drown in log messages, the webmaster should spend some time analyzing the data being collected. The CERN HTTP daemon generates two log files. One contains a record for each HTTP request received. The other records any request that caused an error.

The request log files (located in the log directory) contain only three pieces of data that are likely to be of interest to the typical administrator:

By analyzing this data, the webmaster can determine quite a lot about the site's usage. My own scripts list:

Looking at these reports, I can tell which pages are most popular, which links aren't being traversed, etc. The binned data tells me that most of my users are coming in during business hours.

Although there are freeware packages available to do analysis and reporting, a simple awk or perl script may be more flexible in the hands of an interested and competent webmaster. I certainly find that I tweak my reporting scripts as new data emerges and my interests change.
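
As an illustration of how little code such a report needs, here is a minimal sketch in Python rather than awk or perl. It is not the author's script. It assumes the request log uses the Common Log Format (host, two identity fields, a bracketed timestamp, the quoted request, status, and byte count); the pattern, the function name, and the sample records are all hypothetical and would need adjusting to match a real log.

```python
import re
from collections import Counter

# Assumed record layout (Common Log Format); adjust to match your server:
# host ident user [dd/Mon/yyyy:hh:mm:ss zone] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ '
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Count requests per page and per hour of day, skipping unparseable lines."""
    pages = Counter()
    hours = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        pages[match.group("path")] += 1
        hours[int(match.group("hour"))] += 1
    return pages, hours


# Made-up sample records, for demonstration only.
sample = [
    'alpha.example.com - - [14/Aug/1995:09:15:02 -0700] "GET /index.html HTTP/1.0" 200 2045',
    'beta.example.com - - [14/Aug/1995:09:47:31 -0700] "GET /docs/faq.html HTTP/1.0" 200 5120',
    'alpha.example.com - - [14/Aug/1995:14:03:55 -0700] "GET /index.html HTTP/1.0" 200 2045',
    'not a log record',
]

pages, hours = summarize(sample)
for path, count in pages.most_common():
    print(f"{count:6d}  {path}")
for hour in sorted(hours):
    print(f"{hour:02d}:00  {hours[hour]}")
```

Keeping the parsing in one pattern and the counting in one function makes it easy to add another report (by host, by status code) as interests change.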

The error log files aren't very interesting on a statistical basis, but they provide useful indications of bad HTML links, etc. I think the most reasonable way to start using the error log data is to drop into a text editor and massage it a bit.

Bad link information, for instance, is prefaced by the text string " referer: ". I trim off everything up to and including this string, then sort the remaining text, discarding any duplicate lines. This chops away well over 90% of the file, leaving me with a workable, sorted list of suspect links.
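
The same trim, sort, and de-duplicate steps can be scripted once the manual approach gets tedious. This is a minimal sketch, assuming only what is described above: that the interesting text follows the string " referer: " on a line. The function name is hypothetical, and the sample records are schematic placeholders rather than real server output.

```python
MARKER = " referer: "


def suspect_links(lines, marker=MARKER):
    """Return the sorted, de-duplicated text that follows the marker on each line."""
    found = set()
    for line in lines:
        _, sep, rest = line.partition(marker)
        if sep:  # the marker appears on this line
            rest = rest.strip()
            if rest:
                found.add(rest)
    return sorted(found)


# Schematic placeholder records; a real error log will look different.
sample = [
    "[some error text] referer: http://www.example.com/links.html",
    "[some error text] referer: http://www.example.com/about.html",
    "[other error text] referer: http://www.example.com/links.html",
    "[an error with no referring page]",
]

for link in suspect_links(sample):
    print(link)
```

Lines without the marker are dropped, and repeated referring pages collapse to a single entry, which is what shrinks the file so dramatically.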

Security Monitoring

Simple, static HTML pages aren't a very good target for crackers. Set things up according to the server guidelines and you shouldn't get into much trouble. The big gotcha, unfortunately, is that folks really want to run Common Gateway Interface (CGI) programs. And, since these can have bugs and security holes, your server may acquire security holes as well.

My advice is therefore to be very cautious in adding CGI programs to your server. Look carefully for bugs, then make sure that the programs have carefully limited capabilities. Finally, consider downloading the server's files from a protected machine. This way, even if the server gets trashed, the next download will repair the damage.
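
The "refresh from a protected copy" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the master copy is reachable as a local directory (for example, a read-only mount from the protected machine), and the whole live tree can simply be replaced. The function name and paths are hypothetical, and a real site would choose its own transfer mechanism.

```python
import shutil
import tempfile
from pathlib import Path


def restore_from_master(master: Path, live: Path) -> None:
    """Replace the live document tree with a fresh copy of the master tree."""
    if live.exists():
        shutil.rmtree(live)  # discard whatever is there, tampered or not
    shutil.copytree(master, live)


# Demonstration in a scratch directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as scratch:
    master = Path(scratch) / "master"
    live = Path(scratch) / "live"
    master.mkdir()
    (master / "index.html").write_text("<h1>Welcome</h1>\n")

    restore_from_master(master, live)             # initial publication
    (live / "index.html").write_text("defaced\n")  # simulate tampering
    (live / "extra.cgi").write_text("planted\n")   # simulate a planted file

    restore_from_master(master, live)             # the next refresh
    print(sorted(p.name for p in live.iterdir()))
    print((live / "index.html").read_text(), end="")
```

Replacing the tree wholesale, rather than copying over it, also removes files that an intruder added, not just files that were altered.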

Suggested Reading

CERN distributes a User Guide along with their httpd distribution. If you choose to use their server, be sure to print out and peruse this document.

Lincoln Stein's "How to Set Up and Maintain a World Wide Web Site" (Addison-Wesley, 1995, ISBN 0-201-63389-2) is a wonderful book. If you're even thinking about becoming a webmaster, run out and get this book!