The Internet FTP archives contain many interesting repositories of text. This month, I will touch upon several of these. Readers are invited, as always, to send email about archives I've overlooked.
Michael S. Hart firstname.lastname@example.org is the Director of Project Gutenberg. For the last two decades, he has been leading an effort to build an electronic library of essential documents. Some of these started out in public domain electronic form. Most, however, fell into the public domain with the passage of time and were then free to be converted, via scanners and OCR, to "etexts." Borrowing liberally from the project's blurb.gut and NEWUSER.GUT files, here is a brief synopsis of the project:
Project Gutenberg has been releasing Plain Vanilla ASCII Etexts on the Internet and its previous incarnations since about 1971. Currently four books per month are scheduled for release, which doubles to eight books per month in 1994 and sixteen in 1995, etc.
Our goal is to provide a collection of 10,000 of the most used books by the year 2001, and to reduce the effective costs to the user to a price of approximately one cent per book, plus the cost of media and of shipping and handling. Thus we hope the entire cost of libraries of this nature will be about $100 plus the price of the disks and CDROMS and mailing.
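The arithmetic behind these goals is easy to check. The sketch below is only an illustration, not anything the project publishes: it assumes the "current" rate of four books per month applies to 1993, that the monthly rate doubles every January through 2000, and that etexts released before 1993 are ignored. The function names are my own.

```python
def releases_per_year(start_year, start_rate_per_month, end_year):
    """Yearly release counts, assuming the monthly rate doubles each year."""
    schedule = {}
    rate = start_rate_per_month
    for year in range(start_year, end_year + 1):
        schedule[year] = rate * 12
        rate *= 2
    return schedule


def library_cost_dollars(book_count, cents_per_book=1):
    """Cost of the whole library at a flat per-book price, media excluded."""
    return book_count * cents_per_book / 100


# Assumption: 1993 is the year in which four books per month are released.
schedule = releases_per_year(1993, 4, 2000)
total_by_2001 = sum(schedule.values())

for year, count in schedule.items():
    print(year, count)
print("Books released 1993-2000:", total_by_2001)
print("Cost of 10,000 books at one cent each: $%.2f" % library_cost_dollars(10000))
```

Under those assumptions, the doubling schedule passes the 10,000-book mark during 2000, and one cent per book works out to the $100 figure quoted above.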
The project is well on its way. The 1991-1993 directories contain, respectively, about 14, 30, and 40 MB of material. The project's scope and influence are growing. As new users arrive and (sometimes) turn into volunteers, it will grow ever more rapidly. FTP to ftp://mrcnext.cso.uiuc.edu/gutnberg.doc/ and crawl around the etext directory.
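If you would rather script the crawl than type FTP commands by hand, here is a minimal sketch using Python's standard ftplib module. It assumes the archive accepts anonymous logins and is still reachable at the address given; split_ftp_url and list_directory are names I made up for this example.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("not an FTP URL: " + url)
    return parts.hostname, parts.path or "/"


def list_directory(url, timeout=30):
    """Log in anonymously and return the names found in the URL's directory."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments means an anonymous login
        ftp.cwd(path)
        return ftp.nlst()


# Example (requires network access, so it is left commented out):
# for name in list_directory("ftp://mrcnext.cso.uiuc.edu/gutnberg.doc/"):
#     print(name)
```

The same two functions should work for any of the other ftp:// addresses in this column, subject to the same assumptions.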
Barry Shein email@example.com is a very busy guy. When he's not acting as a Technical Editor for SunExpert Magazine, sitting on the boards of SUG and USENIX, or running The World (a public-access UNIX system), he promotes the idea of electronically accessible text archives.
To see the results of Barry's efforts, FTP to ftp://ftp.std.com/obi/. There are about two hundred top-level directories, covering a wide variety of topics. Check it out!
While you're at it, check out ftp://nctuccca.edu.tw/documents/. Nctuccca is a huge (14 GB) archive of text files, etc. It is provided by the Campus Computer Communication Association of the National Chiao Tung University in Taiwan. Unless you're located in Asia, I suggest that you use this site to find out about interesting items, then FTP them from the original (mirrored) sites.
Universities and research laboratories create large numbers of technical reports. Unfortunately, most of these never reach a wide audience. The difficulty of locating the reports, whether in printed or electronic form, is simply too much for most casual inquirers.
Enter Vincent Cate firstname.lastname@example.org and Alex, with a solution. Alex is a user-mode daemon which allows UNIX systems to "mount" the Internet as an NFS file tree. As a demonstration project, Vincent created a database of computer science technical reports.
The information is somewhat out of date (April, 1992), but still quite interesting. And, because universities don't tend to disappear, most of the information should still be valid. Vincent has also used Alex to index a few dozen other topics, from audio to weather.
To check out Alex, FTP to ftp://alex.sp.cs.cmu.edu/ and look around. The computer science technical reports are in cs-techreports. The links to miscellaneous topics are kept in links.
There are numerous archives of Internet- and USENET-related text: far too many to summarize here. I can give you some useful starting points, however. For answers to USENET Frequently Asked Questions (FAQs) and related files, try ftp://rtfm.mit.edu/.
PSI's FTP archive, ftp://ftp.psi.com/, contains a wealth of Internet memoranda and related text. The Internet Experiment Notes (IENs) are kept in ien. The Internet Engineering Task Force (IETF) reports are kept in ietf. Requests For Comments (RFCs) are kept in rfc. Et Cetera.
Dave Lampson email@example.com maintains a Classical Music Information Archive at ftp://cs.uwp.edu/pub/music/. This includes a CD Buying Guide, with extensive lists of recommended CDs, information on manufacturers and distributors, mail order sources, and publications. There is also a Basic Repertoire List, containing information on music considered to be part of the basic repertoire, organized by musical period and composer. Finally, the Timeline file graphically depicts the life spans of 80+ composers.
Denis Howe firstname.lastname@example.org maintains a Free On-line Dictionary of Computing at http://wombat.doc.ic.ac.uk/. The dictionary covers programming languages, architectures, domain theory, mathematics, networking; in fact, anything to do with computing. Eric Raymond email@example.com maintains the essential, if somewhat more whimsical, "Jargon File." It is available from ftp://prep.ai.mit.edu/pub/gnu.
Ed Krol's book, "The Whole Internet: User's Guide & Catalog" (O'Reilly, 1992, ISBN 1-56592-025-2), is a great jumping-off point for finding oddball Internet resources. It is also a useful guide to using the Internet, in general.