February 1997

Root Page of Article: (How) Can Software Agents Become Good Net Citizens?, by Sabine Helmers, Ute Hoffmann, and Jillian Stamos-Kaschke

Keeping Spiders in Check

A growing number of spiders conduct exhaustive searches, traversing the Web's hypertext structure and retrieving information from the remote sites they visit, thus slowing response time for everyone. Besides placing demands on the network, a Web bot also places extra demand on servers. A badly behaved Web bot increases the strain on the network and hosts even further. In November 1995, for example, a search robot hit Thomas Boutell's World Birthday Web and began repeatedly requesting the hundreds of pages located at the site, pages that were in turn linked to thousands more off-site pages. Boutell's provider eventually had to lock out an entire network of Internet addresses in Norway from which the robot was launched (3).

Martijn Koster's Guidelines for Robots Writers, written in 1993, were meant to address the increased load that spiders place on Web servers. They call on programmers to design their robots to act more responsibly. The robot exclusion standard, which originated on the WWW Robots Mailing List, offers Web site administrators and content providers a facility to limit what robots do on their sites. People who don't approve of robots can prevent their sites from being visited.
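Koster's guidelines are prose advice rather than code. One common way for a robot author to act on the call for responsible behavior, and to avoid the kind of rapid-fire requests described in the Boutell incident above, is a per-host rate limit. The following is a minimal Python sketch under that assumption; the class name `PoliteScheduler` and the 60-second default delay are hypothetical choices for illustration, not figures taken from the guidelines.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Enforce a minimum delay between successive requests to the same host.

    Hypothetical sketch: min_delay_seconds is an illustrative default,
    not a value taken from Koster's guidelines.
    """

    def __init__(self, min_delay_seconds=60.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_request = {}      # host name -> time of the most recent request

    @staticmethod
    def _host(url):
        # urlsplit().hostname is already lower-cased; empty string if absent.
        return urlsplit(url).hostname or ""

    def wait_time(self, url):
        """Seconds the robot should still wait before contacting the host of `url`."""
        last = self.last_request.get(self._host(url))
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record(self, url):
        """Note that a request to the host of `url` was just made."""
        self.last_request[self._host(url)] = self.clock()

    def acquire(self, url, sleep=time.sleep):
        """Block until the host of `url` may be contacted again, then record the request."""
        delay = self.wait_time(url)
        if delay > 0:
            sleep(delay)
        self.record(url)


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demonstration
    sched = PoliteScheduler(60.0, clock=lambda: now[0])
    sched.record("http://www.example.com/a.html")
    now[0] = 10.0
    print(sched.wait_time("http://www.example.com/b.html"))  # 50.0
    print(sched.wait_time("http://www.example.org/"))        # 0.0
```

Because the delay is tracked per host, a robot using such a limiter still makes progress across the Web as a whole while never concentrating its requests on a single server.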

The "Guidelines" and the exclusion standard are said to have been the first wide-ranging attempt at Web robot ethics, reflecting a consensus among spider authors that worked reasonably well for quite some time. For the exclusion standard to work, a robot must be programmed to look for a "robots.txt" file that tells it exactly what it may and may not do on a particular site. The standard, however, is not enforced; obeying it is up to each robot. According to a recent study (4), few site masters take advantage of this mechanism. Such findings highlight that a robot must, more often than not, use its own judgement to decide how to behave.
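To make the exclusion mechanism concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser` module (a present-day tool, not something the standard itself prescribes). A site publishes its rules as plain text at `/robots.txt`; the sample rules, the host `www.example.com`, and the robot names `ExampleBot` and `OtherBot` below are made up for illustration. The parser is fed the text directly, so nothing is fetched over the network.

```python
from urllib import robotparser

# Hypothetical exclusion file a site administrator might publish at /robots.txt.
# All robots are kept out of /cgi-bin/ and /private/; "ExampleBot" is banned entirely.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt content that has already been retrieved, without network access."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(SAMPLE_ROBOTS_TXT)
    checks = [
        ("ExampleBot", "http://www.example.com/index.html"),
        ("OtherBot", "http://www.example.com/index.html"),
        ("OtherBot", "http://www.example.com/private/notes.html"),
    ]
    for agent, url in checks:
        # can_fetch() uses the record naming this agent, else the "User-agent: *" record.
        verdict = "allowed" if rp.can_fetch(agent, url) else "disallowed"
        print(agent, url, verdict)
```

A cooperating robot would retrieve `/robots.txt` once per site (`RobotFileParser.set_url()` followed by `read()` does this) and consult `can_fetch()` before every request. Nothing in the code, or in the standard, forces a robot to do so.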

Following this line of thinking, researchers at the University of Washington conceived of an Internet Softbot. The Softbot, a research prototype that was never itself fielded but had a number of fielded descendants, was meant to use the same software tools and utilities available to human computer users, acting on a person's behalf. Provided with only an incomplete model of its environment, the Softbot would have its behavior guided by a collection of "softbotic" laws alluding to the "Laws of Robotics" envisioned by science fiction writer Isaac Asimov in the 1940s (5).

The Guidelines for Robots Writers, the exclusion standard, and the "softbotic" laws all represent approaches that aim to construct robots with a built-in capacity to follow the rules of Netiquette. Sometimes this approach does not work, and Usenet cancelbots are a case in point. Netiquette usually allows only the original author of a message to cancel it. In cases of excessive multiple-posting, however, when hundreds of newsgroups are flooded with the same message, many Usenet users feel that cancelling the spam is justified. Cancelbots may thus appear a useful instrument for administering Usenet. However, several drawbacks have surfaced. Cancelbots "gone mad" have frequently become a source of spam themselves instead of helping to get rid of it. Spam cancellers and spammers alike have started to (mis)use cancelbots as weapons against each other. Nor are cancelbots immune to abuse, as attempts to censor speech, e.g., by the Church of Scientology, have shown. Cancelbots, which were meant to be a technical solution to net abuse, thus threaten to become a plague themselves.
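Deciding when multiple-posting becomes "excessive" is the judgement a spam-cancelling bot has to automate. The Python sketch below shows one simplified way such a detector might work, assuming a feed of (newsgroup, body) pairs: it fingerprints each message body and flags bodies that appear in many distinct newsgroups. The whitespace normalisation, the SHA-256 fingerprint, and the threshold of 20 newsgroups are all hypothetical choices; the sketch does not reproduce the criteria any actual canceller used, and it only flags messages rather than issuing cancels.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold: number of distinct newsgroups an identical body may
# reach before this simplified detector flags it as excessive multiple-posting.
HYPOTHETICAL_THRESHOLD = 20


def body_fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace, so re-wrapped copies still match."""
    normalised = " ".join(body.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_excessive_multiposts(articles, threshold=HYPOTHETICAL_THRESHOLD):
    """articles: iterable of (newsgroup, body) pairs.

    Returns {fingerprint: set_of_newsgroups} for every body seen in at least
    `threshold` distinct newsgroups.
    """
    groups_by_body = defaultdict(set)
    for newsgroup, body in articles:
        groups_by_body[body_fingerprint(body)].add(newsgroup)
    return {fp: groups for fp, groups in groups_by_body.items()
            if len(groups) >= threshold}


if __name__ == "__main__":
    spam = "MAKE MONEY FAST!!!"
    feed = [(f"alt.group{i}", spam) for i in range(25)]
    feed += [("comp.lang.python", "How do I parse robots.txt?"),
             ("rec.arts.sf.written", "Favourite Asimov story?")]
    flagged = find_excessive_multiposts(feed)
    print(len(flagged))  # 1
```

Since the flagging is purely mechanical, the same code would serve a canceller aimed at spam just as well as one aimed at unwelcome speech; whether a flagged message should be cancelled remains a question of policy, not of the detector.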
