CMC Magazine / September 1, 1995 / Page 10


THE LAST LINK

The Untimely Death of Yahoo

or

how the double-whammy of Web architecture and information retrieval will do Yahoo in

by Louis B. Rosenfeld (lou@argus-inc.com)

Things fall apart; the centre cannot hold;
Mere anarchy is loosed upon the world

-- W.B. Yeats, "The Second Coming"

One of the Internet's biggest stories of late is the great success of the information resource directory Yahoo. Only a year or so old, Yahoo receives millions of hits daily, has been spun off as a commercial firm, and is responsible for the (at least temporary) derailing of two promising doctoral careers.

As another recent refugee from a Ph.D. program, I can appreciate what Jerry Yang and David Filo must be feeling; it's difficult to forge ahead in one's doctoral duties when the Net provides so many "real" opportunities to try out your ideas. I also operate an Internet directory service, and as I haven't been approached by any venture capitalists as yet, you can consider this column's title and accuse me of sour grapes. And you'd be right!

My own petulance and envy aside, Yahoo is successful because it really does help millions of Web users find information on a wide variety of subjects. Yahoo is truly one of the Best of the Net, as it has filled an enormous void in the areas of browsing and searching Internet-based information. So I hope you'll believe that I am truly sad as I predict the death of Yahoo. It seems inevitable, as Yahoo's organization, its information architecture, will collapse under the weight of the high volume of entries. And, unfortunately, searching won't solve the problem of volume.

Yahoo's Architecture: The Levee is about to Break

Paradoxically, much of Yahoo's current success can be directly attributed to its architecture. Yahoo's architecture provides a fairly consistent and easy-to-use interface for browsing a hierarchy of roughly 70,000 entries. That's a fairly sizable number, yet users aren't often confronted with frustratingly long lists of entries. And users generally can move up and down through Yahoo's categories and sub-categories without getting tangled up in too many layers of hierarchy. In other words, Yahoo adeptly handles the issue of balancing a hierarchy's breadth and depth: Yahoo users don't face especially long menus, nor do they have to pick (or click) their way through many levels of hierarchical categories and sub-categories.

But we know that the Internet has far more than 70,000 distinct resources, and that number is going to increase, perhaps exponentially, for some time to come. How well will Yahoo's hierarchy hold up when it includes 100,000 entries, or 1,000,000?

Let's assume that Yahoo's managers study human-computer interaction literature and decide that no single menu should exceed ten entries. To accommodate 100,000 entries with such a restriction, Yahoo's hierarchy would need five levels of depth (10 to the fifth power). That may not seem that deep, but it's not a giant leap ahead to 1,000,000 entries (10 to the sixth power); will users want to navigate through six levels of hierarchy to find what they need? It's certain that many won't have the patience to look further than as few as three levels.
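The arithmetic above can be checked with a short Python sketch. It assumes an idealized, perfectly balanced hierarchy in which every menu is completely full and entries appear only at the bottom level; a real directory would be lumpier. The function name levels_needed is made up for this illustration.

```python
def levels_needed(entries: int, menu_size: int) -> int:
    """Smallest depth d such that menu_size ** d >= entries.

    Assumes a perfectly balanced hierarchy: every menu holds exactly
    menu_size items and all entries sit at the deepest level.
    """
    if entries < 1 or menu_size < 2:
        raise ValueError("need entries >= 1 and menu_size >= 2")
    depth, capacity = 0, 1
    # Add one level at a time until the bottom level can hold every entry.
    while capacity < entries:
        capacity *= menu_size
        depth += 1
    return depth


if __name__ == "__main__":
    for n in (70_000, 100_000, 1_000_000):
        print(f"{n:>9,} entries, 10-item menus: {levels_needed(n, 10)} levels")
```

Integer multiplication is used rather than a floating-point logarithm so that exact powers of ten are not thrown off by rounding.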

We certainly can play with these numbers. A broader, shallower hierarchy of 15-entry long menus would accommodate over 750,000 entries at five levels, and over 11,000,000 at six. However, we face a slightly different risk here: the lost patience of those users whose eyes won't easily scan a listing as long as fifteen lines.
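To play with the breadth and depth figures yourself, here is a minimal Python sketch. It rests on the same simplifying assumption as the numbers above: every menu is full and entries live only at the deepest level. The function name capacity is hypothetical.

```python
def capacity(menu_size: int, levels: int) -> int:
    """Entries a full, balanced hierarchy can hold at its deepest level."""
    return menu_size ** levels


if __name__ == "__main__":
    # Compare narrow (10-item) and broad (15-item) menus at two depths.
    for menu_size in (10, 15):
        for levels in (5, 6):
            total = capacity(menu_size, levels)
            print(f"{menu_size:>2}-item menus, {levels} levels: {total:>10,} entries")
```

Changing the menu sizes or depths in the loops shows how quickly a little extra breadth buys capacity, at the cost of longer lists to scan.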

Information Retrieval Headaches: how much will improved searching help?

I'm convinced that Yahoo will eventually lose the battle of breadth vs. depth, but many would argue that this point is moot due to Yahoo's support of searching. As long as the user can enter a few keywords to search, why would he even bother with navigating the hierarchy? Won't a list of appropriate entries be generated instantly?

Not according to decades of studies done in the field of information retrieval, many of which show that users were lucky to retrieve even 25% of the items relevant to a search based on keywords. I won't try to summarize an entire field's literature here, but I will point out one of its tenets, something that we tend to take for granted: language is ambiguous. For example, consider the multiple meanings (and contexts) of the word pitch. Pitch means something different to a roofer than it does to a salesman, a batter, and so on. Pitch can be a verb, a noun, and is the stem of many other words. Due to this ambiguity, information searching is just plain hard, regardless of whether you're dealing with Yahoo's index or the card catalog at your local public library.
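A toy Python sketch may make the pitch problem concrete. Everything in it is invented for illustration: the six titles, the judgment about which of them a roofer would find relevant, and the naive whole-word matcher. It is not how Yahoo's search works. The 25% figure it produces is there by construction, not as evidence.

```python
def keyword_search(docs: dict[int, str], keyword: str) -> set[int]:
    """Return ids of documents whose text contains keyword as a whole word."""
    return {doc_id for doc_id, text in docs.items()
            if keyword.lower() in text.lower().split()}


def recall(retrieved: set[int], relevant: set[int]) -> float:
    """Share of the relevant documents that the search actually found."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def precision(retrieved: set[int], relevant: set[int]) -> float:
    """Share of the retrieved documents that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


# Hypothetical titles; a roofer wants the four about roof slope.
DOCS = {
    1: "roof pitch calculator for builders",
    2: "how to slope a roof",
    3: "sales pitch tips",
    4: "baseball pitch speeds",
    5: "roof angle and rise tables",
    6: "steepness of roofs explained",
}
RELEVANT_TO_ROOFER = {1, 2, 5, 6}

if __name__ == "__main__":
    found = keyword_search(DOCS, "pitch")
    print("retrieved:", sorted(found))
    print("recall:", recall(found, RELEVANT_TO_ROOFER))
    print("precision:", round(precision(found, RELEVANT_TO_ROOFER), 2))
```

The keyword pulls in the salesman's and the batter's pages while missing three roofing pages that never use the word, so the search fails in both directions at once.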

Additionally, searching is only as good as the materials that are being searched. Yahoo's entries are self-labeled by those who submit them; in other words, the entries aren't being named by a single individual (or coordinated group) who follows a consistent labeling scheme. This variability is found in Yahoo's cataloging scheme as well, which currently supports many duplicate and overlapping categories and subcategories. I'll demonstrate with a silly example: you and I are in competing canine mouthwash businesses; your company's Web site has been cataloged under Canine Hygiene Products, while mine comes up under Personal Care Products--Dogs. Where should the user look? Will he or she guess to look in both places, much less know that two similar categories even existed in the first place? These types of anomalies are likely to increase as the volume of Yahoo's entries increases.

To the credit of the folks at Yahoo, professional catalogers are now being hired to work on problems like these. Certainly we should witness some performance improvements in the organizational and labeling schemes that Yahoo employs. But such an approach could lead the Yahoo team perilously close to duplicating some of the ill-advised efforts of recent years to classify the Internet. It seems likely that techniques from classical librarianship, which have had only limited success in libraries and other traditional information environments, will scale even less well in the distributed, decentralized and heterogeneous information space that we find on the Web.

Is it all Doom and Gloom?

It's easy to be a critic, especially one who doesn't offer concrete solutions. But my criticisms of Yahoo are based on well-known problems of information retrieval and human-computer interaction, problems for which no one has come up with satisfactory solutions. Indeed, Yahoo is just like any large information system: it's nowhere near perfect, and never will be.

That's not to say that Yahoo won't continue to have value in the future. Regardless of how unwieldy it will be, Yahoo may actually serve as the closest thing the Internet ever will have to the Yellow Pages. That's certainly not a bad thing to be, and if this is indeed the case, I'm sure that Yahoo's financiers will be quite happy with the return on their investments. But just as every one of us relies upon the Yellow Pages, we also require many other information resources to do our jobs and live our lives. Perhaps the solution is not to depict Yahoo as the answer to every Internet user's information needs, but just one part of a much bigger picture.

Louis B. Rosenfeld is President of Argus Associates, Inc. and Administrator for the Clearinghouse for Subject-Oriented Internet Resource Guides.

Copyright © 1995 by Louis B. Rosenfeld. All Rights Reserved.

