Computer-Mediated Communication Magazine / Volume 1, Number 6 / October 1, 1994 / Page 10

Challenges for Web Information Providers

by John December (john@december.com)

Continued from page 9

A Case Study: Tracking Internet Information Sources

Through my work in providing Internet-based information about the Internet and Computer-Mediated Communication (CMC), I've gained insights into issues facing information providers. My experience in tracking Internet information includes developing a list of information about information. In this list, I attempt to organize and present information sources describing the Internet and computer-mediated communication technologies, applications, culture, discussion forums, and bibliographies. In this section, I describe my experience and discuss lessons I've learned.

Background

In May 1992, I began an independent study project (as part of doctoral work at Rensselaer Polytechnic Institute) investigating the Internet and how it can be used for communication. As part of this project, I located information sources about the Internet. I listed these resources and posted the result to alt.bbs.internet (the only Usenet newsgroup at that time with the word `internet' in it). I received some comments and feedback, and I added items to the list as a result of suggestions and further searches of the network. I tried to organize the list so that it would be easy to read, listing Internet descriptions, information services, electronic publications, societies and organizations, newsgroups, and a bibliography. After further revision, I placed the list on my university's FTP site, and posted an announcement of its availability and updates to alt.internet.services (a newsgroup formed after alt.bbs.internet people grew tired of having non-Internet BBS-related items posted to their group). Over the next year, I continued to gather information, drawing on items to include from mailing lists and my own use of Archie, FTP, Gopher/Veronica, WAIS, and the Web.

I chose to develop a list of information about information, rather than a set of original sources, because I found that many useful documents describing the Net were already available over the Net. Rather than duplicating these efforts, my goal was to develop a list summarizing where these information sources could be obtained. I could then use my list to help people become familiar with the Internet, or as a tool to define areas to examine in the field of CMC. The process I used to develop this information has evolved over the years, and has contributed to my skills and ideas about information discovery and selection, presentation formats, usability and design issues, information value and quality, and the context in which I should present my list to others. My development process has included gathering information, presenting it in a variety of formats, improving usability and content, and presenting it in a context where it would elicit more reactions from, and involvement with, others in my field of specialization and study.

Gathering Information: Discovery and Selection

Throughout my work with Internet information, I've noticed a similar pattern for information space development and use. File transfer protocol, telnet, Gopher, and the Web all created new information spaces, and the ways these spaces became populated with information were similar. The pattern has been:

1. Developers introduced an information presentation protocol or system.
2. Users contributed information to the resulting information space, leading to:
   - Information space saturation--a plethora of information servers and an abundance of content. This abundance grows to such a degree that the space can't be encountered without information layering or filtering by way of handcrafted indexes or other guides to the spaces.
   - Information space pollution--redundant, erroneous, or poorly maintained information becomes replicated throughout the space, obscuring other information.
3. Developers created tools to automatically traverse the space and glean information about resources. The result of this automated gleaning is a database which can be queried through a keyword or other indexing scheme.
4. With greater visibility of the available resources, redundancy decreased and specialization increased. Specialized information servers, often under the guidance of experts in the subject area of the information, created new levels and standards for quality. Often, lists or indexes of information servers also contributed greatly to this process (for example, the well-known Gopher Jewels showcases specialized Gophers, discouraging duplication and encouraging specialization).

The above pattern has occurred with FTP (Archie as the automated indexer), Gopher (Veronica), and the Web (Spiders).
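
To make the "automated gleaning" stage of this pattern concrete, here is a minimal sketch in Python of a keyword-queryable database built from resource descriptions. It is only an illustration of the general idea, not a description of how Archie, Veronica, or any spider actually works. The function names (build_index, search) are hypothetical, and the sample records are three entries from release 1.0 of my list.

```python
from collections import defaultdict

# Sample records: (title, anonymous FTP host, file or directory),
# copied from the three release 1.0 entries quoted in this article.
RESOURCES = [
    ("Zen & Art of Internet", "ftp.cs.widener.edu", "pub/zen/"),
    ("NWNet Internet Guide", "ftphost.nwnet.net", "nic/nwnet/user-guide/"),
    ("Hitchhikers Guide", "ftp.nisc.sri.com", "rfc/rfc1118.txt"),
]


def build_index(resources):
    """Map each lowercase title word to the positions of records containing it."""
    index = defaultdict(set)
    for pos, (title, _host, _path) in enumerate(resources):
        for word in title.lower().split():
            if word.isalnum():  # skip tokens such as "&"
                index[word].add(pos)
    return index


def search(index, resources, keyword):
    """Return the records whose title contains the keyword, in list order."""
    return [resources[pos] for pos in sorted(index.get(keyword.lower(), ()))]


INDEX = build_index(RESOURCES)

for title, host, path in search(INDEX, RESOURCES, "guide"):
    print(f"{title}: {host} {path}")
```

A query only helps a user who already knows a useful keyword, which is one reason handcrafted lists keep their value even after such tools appear.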

By observing and making use of this information space life cycle, I have tried to locate the most up-to-date and authoritative source for all the information I present. For example, during the summer and fall of 1992, I made use of Archie to locate directories describing the Internet. So while, in release 1.0 of my list (23 May 92), I had three entries for descriptions of the Internet:

o INTERNET DESCRIPTIONS ANONYMOUS FTP HOST  FILE OR DIRECTORY/
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
Zen & Art of Internet   ftp.cs.widener.edu  pub/zen/
NWNet Internet Guide    ftphost.nwnet.net   nic/nwnet/user-guide/
Hitchhikers Guide       ftp.nisc.sri.com    rfc/rfc1118.txt

I later was able to add more (from release 1.50, 01 Aug 92):

o INTERNET DESCRIPTIONS ANONYMOUS FTP HOST  FILE OR DIRECTORY/
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
New User's Questions    ftp.nisc.sri.com    fyi/fyi4.txt
Hitchhikers Guide       ftp.nisc.sri.com    rfc/rfc1118.txt
Gold in Networks!       ftp.nisc.sri.com    rfc/rfc1290.txt
Zen & Art of Internet   ftp.cs.widener.edu  pub/zen/
Zen ASCII version       csn.org             pub/net/zen/
Guide Internet/Bitnet   hydra.uwo.ca        libsoft/guide1.txt
NSF Resource Guide      nnsc.nsf.net        resource-guide/
NWNet Internet Guide    ftphost.nwnet.net   nic/nwnet/user-guide/
SURANet Internet Guide  ftp.sura.net        pub/nic/infoguide.*.txt
NYSERNet Internet Guide nysernet.org        pub/guides/Guide.*.text
CERFNet Guide           nic.cerf.net        cerfnet/cerfnet_guide/
DDN New User Guide      nic.ddn.mil         netinfo/nug.doc
AARNet Guide            aarnet.edu.au       pub/resource-guide/

Using Archie, coupled with a growing awareness of the duplication of resources in FTP space, I searched for the "definitive" editions and versions of each document. I eventually identified major FTP repositories for Internet information which offered well-maintained collections. As these sites changed and evolved, I added pointers to my list. Gradually, I began to see more redundancy at FTP sites--many administrators would copy an entire set of documents to their site. As these documents evolved into later editions, many outdated copies would remain online. By monitoring newsgroups, I gained information about new information as well as updates to existing documents. Where possible, I focused on collecting links to well-maintained FTP sites, such as those at Network Information Centers (NICs).
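
One mechanical part of this search for "definitive" copies can be sketched in Python: given several copies of the same document, keep the most recently modified one. Everything in the example is an assumption made for illustration. The document names, hosts (under the reserved example.org domain), and dates are invented, and newest_copies is a hypothetical helper, not a feature of Archie. In practice the newest copy is not always the authoritative one, so a step like this could only narrow the candidates before a human judgment about which site maintains the document.

```python
from datetime import date

# Hypothetical mirror listings: (document, host, last_modified).
# None of this is real Archie output; the hosts use the reserved
# example.org domain and the dates are invented.
COPIES = [
    ("internet-guide", "ftp.mirror-a.example.org", date(1992, 3, 1)),
    ("internet-guide", "ftp.nic.example.org", date(1992, 9, 15)),
    ("new-user-faq", "ftp.mirror-a.example.org", date(1992, 6, 30)),
    ("new-user-faq", "ftp.mirror-b.example.org", date(1992, 6, 30)),
]


def newest_copies(copies):
    """Return {document: (host, last_modified)}, keeping the most recent copy.

    When two copies share the same date, the one listed first is kept.
    """
    best = {}
    for document, host, modified in copies:
        if document not in best or modified > best[document][1]:
            best[document] = (host, modified)
    return best


DEFINITIVE = newest_copies(COPIES)

for document, (host, modified) in DEFINITIVE.items():
    print(f"{document}: {host} ({modified.isoformat()})")
```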

After discovering a resource, I evaluated it for possible inclusion in my list. Before the development of information space searching tools like Veronica and Web Spiders, I had to rely on newsgroups and mailing lists to discover information sources. After the development of space searching tools, I could be more selective about which sources to include because I knew the space searching tool itself was available for users to find sources in the space. I used Veronica to glean Gopherspace, and later spiders to search the Web. (The World Wide Web Worm, the first widely used spider, was not announced as available until March 1994.)

After searching tools were introduced for each space, I knew a user should be able to locate sources based on any given keyword. This fact led me to redefine the purpose of my list. For example, one section in my list included electronic journals, services and publications:

o JOURNAL/SERVICE  Subscribe with email to     Body of letter
~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
Comserve           comserve@vm.ecs.rpi.edu     Send Comserve Helpfile
EJC/REC            comserve@vm.ecs.rpi.edu     Directory EJCREC
EJournal           listserv@albnyvm1.bitnet    subscribe ejrnl YourName
Netweaver          comserve@vm.ecs.rpi.edu     Send Netweave Winter91
RFCs               rfc-info@isi.edu            help: ways_to_get_rfcs

A user could create such a list by keyword searches of a database of mailing lists--but how would the user know which keywords to use? Moreover, the process itself of locating these addresses and resources, if repeated, would be laborious. Thus, I began to realize that another aspect of my list's value was collecting semantically related specialist information that could not be easily generated by using an information space searching tool.

The information space life cycle also caused me to reevaluate the value of my list in other ways. Early in an information space's life cycle, when just a few servers exist, a handcrafted index into the information in the space isn't really all that necessary, as users could, in a relatively short period of time, become familiar with where resources are located. Later, as the space fills with information, a list becomes more valuable--as a reminder of where the major or definitive information sources are. When the information space fills to the point where space searching tools are developed and used widely, indexing instances of resources and documents in that space becomes less necessary.

However, as the information space matures, space saturation and pollution start to set in. The results of space tool searches lose value because they turn up many duplicate or out-of-date entries, so a handcrafted index that carefully lists the most authoritative collections or updated editions becomes more important. Finding these accurate collections became my goal as each information space matured. The table below shows the changing contents of my list at representative release dates (there were incremental releases between the ones shown here).

             Number of entries in Information Sources List

                         FTP  EMAIL  USENET  TELNET  GOPHER  HTTP  PAPER
Release 1.00, 23 May 92   20      5       7       0       0     0      0
Release 1.50, 01 Aug 92   75     12      17       0       0     0     14
Release 2.00, 19 Jan 93  120     21      27       0       0     0     21
Release 2.50, 10 May 93  188     41      31       0       0     0     25
Release 3.00, 03 Nov 93  303     85      36      20      41    13     44
Release 3.14, 01 Dec 93  317    107      36      23      47    40     48
Release 3.20, 22 Jan 94  340    148      37      38      60   101     49
Release 3.25, 11 Feb 94  319    156      37      32     180   649     64
Release 3.62, 21 Aug 94  363    209      37      42     191   764     67

Release 3.00 was the first in HTML (and other) formats, and the release in which I first listed resources in Gopher, telnet, and HTTP. Release 3.62 marked the start of a major shift in my efforts toward consolidating references in my list. Note the slowed expansion of FTP, email, and Gopher entries in the later releases.
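
The shift toward the Web is easier to see as a proportion than as raw counts. The following Python sketch computes the HTTP share of all entries for each release, using only the numbers in the table above. The helper name http_share is an assumption for this example.

```python
# Entry counts copied from the table above, in column order:
# (release, FTP, EMAIL, USENET, TELNET, GOPHER, HTTP, PAPER)
COUNTS = [
    ("1.00", 20, 5, 7, 0, 0, 0, 0),
    ("1.50", 75, 12, 17, 0, 0, 0, 14),
    ("2.00", 120, 21, 27, 0, 0, 0, 21),
    ("2.50", 188, 41, 31, 0, 0, 0, 25),
    ("3.00", 303, 85, 36, 20, 41, 13, 44),
    ("3.14", 317, 107, 36, 23, 47, 40, 48),
    ("3.20", 340, 148, 37, 38, 60, 101, 49),
    ("3.25", 319, 156, 37, 32, 180, 649, 64),
    ("3.62", 363, 209, 37, 42, 191, 764, 67),
]

HTTP_COLUMN = 5  # position of HTTP among the seven count columns


def http_share(row):
    """Return HTTP entries as a fraction of all entries in one release."""
    _release, *counts = row
    return counts[HTTP_COLUMN] / sum(counts)


for row in COUNTS:
    print(f"Release {row[0]}: {sum(row[1:]):5d} entries, {http_share(row):6.1%} HTTP")
```

By this measure, HTTP entries go from none before release 3.00 to a little under half of the list by release 3.62.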

I see a strong trend now toward specialized, Web-based collections of information that are collaboratively maintained by experts in the field. The Web offers more expressive possibilities than Gopher, a more uniform interface than telnet sessions, and the capability to integrate information from a variety of protocols.

Most importantly, the Web, because it is based on hypertext, encourages linking to specialized information rather than reinventing or duplicating it. Thus, my goal now is to locate higher-level, well-maintained collections in my area of interest. At the same time, I try to stay aware that many users of my list don't use the Web for information retrieval, so I list non-Web sources of information as well. However, I see the best resources gradually moving to the Web, and the best collections integrating many resources in multiple protocols offered through Web pages. Therefore, I see an inevitable shift toward a greater proportion of Web-based information sources in my list.

Continued on page 11