May 1998

A Proposal for Web Idea Management

by Alexei Falaleev

"Easing Them In, ..."*

Our common professional mission is to find something new, whether it is a technical invention, music, a scientific discovery, or just a joke. The next thing we face after getting an idea is the question of whether it is really new. That is our common nightmare. We are never quite sure that we have checked everything ever printed. The Web allows such checking with a few touches of the keyboard. In principle. But not yet. Why? The aim of this article may seem extremely ambitious: to propose a common strategy, which I would like to call the "Nostradamus protocol", to facilitate idea searching via the Web. However, I do not think of myself as a great inventor. There is a tribe of Web ideas so inevitable that they are predictions rather than ideas at all. For these ideas the question is not whether they will be realized. It is a question of when and by whom. The strategy I am going to propose here is exactly such a case.

"Nostradamus protocol" is probably not a very good name, because many people know nothing about the person. But if the only use of the article were to make them acquainted with him, it would be worth writing and reading anyway. I would recommend a brief reference. To put it briefly, Nostradamus was a seer of the sixteenth century. Some of his verses, which looked very vague in his time, now look quite understandable when applied to historical events.

Nostradamus' verses really help to illustrate what I am going to say. There is something special about the information pattern of the verses. To my mind, Nostradamus' verses are the first famous human document written intentionally for keyword identification. That is a typical pattern-structured Web document of the future.

Why? The purpose of the verses was to show that the author was aware of some future events without actually describing the events! Nostradamus was probably the first to bump into the fact that thousands of complex events can be identified by just a few ordinary keywords in unique connections. Nothing unique can be found in Nostradamus' verses containing the words "king", "eye", "wood chip", and "kill." The connection of these words made striking sense only after the death of Henry II of France, who died after getting a wood chip in his eye in a joust with Montgomery.

The task of naming something or somebody is old and easy enough. That is just language itself. But choosing a special pattern of ordinary words to name something for an unknown searcher is a need of the present Web.

What is the principal difference between naming and word patterning? Naming is applicable for subjects on which we are informed in advance. Every name and every word in general we learned first as a pattern of other words or images, or their meaning was explained some other way. It's easy to search knowing the name.

Ideas and historical events are things of which we are not informed in advance: some waiting to happen, some waiting to be named. They are unpredictable and unidentifiable before they happen and searchable afterwards, but searchable by a word pattern only: no name yet! That is what any idea and any historical event have in common. That is why the information structure of Nostradamus' verses is so close to the structure of modern Web documents.

Every part of every text can be called a word pattern. What is special about the pattern of Nostradamus' verses?

I did my most impressive Web search last summer. About 20 years ago my mother, a Ph.D. in concrete construction materials, told me her guess that the pyramids might be made of concrete. She never found time to explore the idea, and last summer I remembered it at last. I tried "pyramids" as the search keyword and received more than 200,000 references. I tried "concrete" and got 300,000. But combining them as "pyramids AND concrete" gave me a bunch of references about a new and rather popular theory that the pyramids were built of concrete! If Nostradamus had wanted to predict the idea, he would surely have used just these two words. That's his style.

The total pool of information can be divided into sub spaces of units containing a single "marker" word: "pyramid", or "concrete", or any of the other 200,000 words that are not extremely specialized. Each word corresponds to the sub space of documents where it is mentioned. These sub spaces overlap, certainly. A Nostradamus pattern is an overlap of two or more sub spaces that:

  1. did not overlap before an event (idea), and came to overlap with it; and

  2. can be guessed subjectively by different people the same way, such as "lighthouse."

"Pyramids" and "concrete" had huge sub spaces of documents before the theory, but the spaces did not overlap: there were no documents containing both words, save dictionaries. The words are obvious markers of the idea, so everybody who came upon the idea independently could easily guess to search for the combination of the words to check whether somebody had described the idea before.

A lot of words have various meanings. The more frequently a word is used, the more meanings it usually has. In the consideration of word patterns, this multiplication of meaning can be called degeneration. The difference between traditional indexes, complete descriptions, and the emerging Nostradamus patterns is that indexes aim to characterize something as commonly as possible, descriptions as exactly as possible, and Nostradamus patterns as "guessably" as possible. All three kinds are needed, but the last one is still waiting to become etiquette, protocol, and tradition.
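
The overlap of marker-word sub spaces described in this section can be sketched as set intersection over an inverted index. This is only a minimal illustration under stated assumptions: the three tiny documents, their ids, and the function names `build_index` and `overlap` are hypothetical, and no real search engine is modeled.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that mention it (its sub space)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def overlap(index, *words):
    """Return the ids of documents that lie in every word's sub space."""
    spaces = [index.get(w, set()) for w in words]
    return set.intersection(*spaces) if spaces else set()

# Hypothetical collections before and after the idea is announced.
before = {
    "d1": "The pyramids of Giza were built of stone blocks",
    "d2": "Concrete is a mixture of cement, sand and water",
}
after = dict(before, d3="A theory that the pyramids were cast from concrete")

print(overlap(build_index(before), "pyramids", "concrete"))  # set()
print(overlap(build_index(after), "pyramids", "concrete"))   # {'d3'}
```

Before the announcement the two sub spaces are both non-empty but share no document; afterwards their intersection is exactly the document that states the idea.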

"... Softening Them Up, ..."

What are the obstacles preventing exhaustive and prompt Web pattern identification of any idea?

Successful searching is the result of joint efforts of the searcher and the searched, the result of a clear understanding of actions they expect from each other. The Internet itself is a set of common protocols of these mutual efforts on a technical level. Now it is the turn for the same protocols on the human level.

The ideal situation is "I know exactly the way you search, and you know that I know that." My usual impression of current sites is "Sorry, I don't know how you are going to search me. And, what is more important, I don't care about that!" The situation in which a person has bumped into an idea and is wondering what to do next is typical for each of us. The situation has powerful and easy Web solutions. There should be a more detailed protocol about what exactly we are expected to do to let other people know about a new idea via the Web, and how to find out whether somebody else found the idea before. Possible components of the protocol are the following:

  1. The person announcing an idea on the Web so that others can search for it (in short, the searched) should use somewhere in the text the most obvious word patterns identifying the idea. The landing strip should be wider than the plane. I mean, if there are several obvious patterns, all of them should be used. But from the searcher's point of view, the plane should be wider than the landing strip: all obvious patterns should be used for the search.

  2. The patterns should be checked for degeneration, that is, for use in other meanings and in other applications. If they are degenerate, both the searched and the searcher need not bother to take them into account. Without knowing each other, they would know that this limitation is something they both agree about. It will save a lot of time. Too many references? If so, none of them matters!

  3. If the searcher did not find the idea on the Web using these rules, and if the idea can be announced this way at all, the searcher may consider the idea to be new. But that person is expected to announce the idea on the Web the same way, so that the whole world can check whether that is true. The unarguable presence of an idea on the Web, exposed this way to worldwide control, is a permanent certification of its novelty.

  4. Last but not least, the protocol should be generally accepted, but it can be accepted step by step, as any other human etiquette.
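
Components 1 to 3 can be read as a small decision procedure. The sketch below assumes the searcher already holds a hit count for each obvious pattern. The cutoff of 1,000 hits for "too many references", the sample counts, and the names `classify_pattern` and `assess_idea` are all illustrative assumptions, not values proposed in this article.

```python
DEGENERATE_ABOVE = 1000  # assumed cutoff for "too many references"

def classify_pattern(hit_count, limit=DEGENERATE_ABOVE):
    """Label one word pattern by how many documents match it."""
    if hit_count > limit:
        return "degenerate"   # component 2: both sides ignore this pattern
    if hit_count == 0:
        return "no match"     # nothing found under this pattern
    return "candidates"       # few enough references to read through

def assess_idea(hits_by_pattern):
    """Apply components 1-3 to every obvious pattern for one idea."""
    labels = {p: classify_pattern(n) for p, n in hits_by_pattern.items()}
    usable = [label for label in labels.values() if label != "degenerate"]
    if not usable:
        return "not searchable this way", labels
    if all(label == "no match" for label in usable):
        return "appears new: announce it with these patterns", labels
    return "check the candidates", labels

# Hypothetical hit counts for two patterns naming the same idea.
verdict, labels = assess_idea({"nostradamus protocol": 0, "idea protocol": 25000})
print(verdict)  # appears new: announce it with these patterns
```

The degenerate pattern is simply dropped from consideration, which mirrors the agreement that neither side needs to bother with it.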

Items 1-2 imply that word patterns used in the protocol should be of Nostradamus' style. That is one reason why I would call it the Nostradamus protocol. The second reason is that this term satisfies the Nostradamus protocol's own requirements: AltaVista counted about 15,000 Web documents containing the term "Nostradamus", about 660,000 containing the word "protocol", but none containing them consecutively. Several other possible terms fail the requirements: they are already used!

Not every idea can be announced and effectively searched using this protocol. But there are such ideas, and there are a lot of them. For every idea, everybody is able to estimate easily whether it fits the protocol (whether it is searchable this way). If an idea cannot be found on the Web in a minute using the Nostradamus protocol, let's use traditional ways. But in many cases, it can. The point is just to save searching time, if possible.

The Web reality is that in many cases you would need just a few minutes to find where your idea was already used, if there were such a protocol. The other Web reality is that no trace of such a protocol is accepted yet. I see three main ways to realize the idea.

  1. By Appointment

    It would be something like a lost and found desk, like some remarkable spot in the central square where you go if you have lost your friend in an unknown town. A lot of word patterns are degenerate in the total Web: it's too huge for our limited vocabulary. The degeneration is spreading inevitably like orbital trash, but a thousand times faster. Unlike the Earth's orbit, in the Web some sub space could be guarded from this accidental degeneration. Many patterns that are no longer of Nostradamus' type in the total Web could be made Nostradamus patterns in a Web sub space specially intended for idea search. All that we need to create such a spot of lost and found ideas is a famous site that everybody visits, and a search etiquette which expects that everybody announcing an idea adds to the site several Nostradamus-type word patterns identifying the idea, with links to more detailed information, and tries not to degenerate the space.

    A similar etiquette has already come into action for announcing sites to worldwide searching utilities. As is well known, most of the Web information available in AltaVista, Yahoo, Excite, etc., is not "found" by these utilities on the Web, but declared to them by authors. Nobody is obliged to do that, but everybody does. That is just an implicit collective agreement. The same practice is possible for targeted idea management.

    The term "idea management" is a little idea itself, as well as a unique word pattern that became searchable after the idea of the term appeared. I tried AltaVista to check whether I was the first to find it, and alas, I found that "Idea Management is a term that was coined ... by Dave Beckwith". I spent just a minute on the check. That is the way it should be for a huge number of ideas, little and large, identifiable by word patterns.

    The first idea banks and idea searching tools have already appeared on the Web. They are easily searchable, so I am not going to describe them here just to show my competence. As a starting point for surfing, I could recommend Idea Management.

    The site that best fits my idea among existing sites is the Global Ideas Bank. That is a place where a lot of ideas are announced, systematized, and equipped with searching utilities. However, the point of my proposal is not just a Web site, but a common agreement on how to formulate and identify ideas by a set of the most obvious word patterns. And, another important feature, everybody should be aware of the site.

  2. By Hunting

    In spite of all the indexes and targeted databases, I am personally fascinated by total Web content search. There is something challenging in refining a few needed references out of everything. I know that the information now available for total Web search is just a small part of everything there is. However, it is the closest thing to it.


    The main problem of Web search is Web trash: good documents of no use for this particular search but containing the same keywords as the document sought. Almost every component of this search trash can be stripped off by the powerful combination of searchable word patterns, Boolean and proximity operators, and Web-specific ones such as the splendid collection anchor, title, url, host, link, etc.

    But one waste component of the usual Web search catch cannot yet be disposed of with these operators. I mean huge Web documents containing everything. We get these documents in every search. For example, what do fly and elephant, heaven and soil, dog and cat have in common? All of these words can now be found together in more than 500 Web documents, like this one. The size of this document? 1006K. I wonder how many Web searchers have landed in the file since its creation, and how much they found there.

    The larger a document is, the more chances we have of getting it, whatever we are searching for. These are the real black holes of Web space: they capture every unsuspecting searcher! I think we need another search operator, SIZE, to filter out documents that are too large. Maybe it should look something like SIZE < n, or just < n, or just <, with a verbal form of maybe LARGEOFF or even HUGEOFF, and a default limit of about 500 KB. Never mind how the parameter should look: my concern is that it should exist.
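
    A minimal sketch of the proposed SIZE parameter, applied as a filter over a result list. It assumes each result carries its document size; the `Hit` record, its fields, and the example URLs are hypothetical, and SIZE is the proposal made here, not an operator of any existing search service.

```python
from dataclasses import dataclass

DEFAULT_MAX_KB = 500  # default limit suggested in the text

@dataclass
class Hit:
    url: str       # hypothetical result fields
    size_kb: int

def size_filter(hits, max_kb=DEFAULT_MAX_KB):
    """Keep only results smaller than max_kb, i.e. SIZE < n."""
    return [h for h in hits if h.size_kb < max_kb]

hits = [
    Hit("http://example.org/essay.html", 12),
    Hit("http://example.org/everything.txt", 1006),
]
print([h.url for h in size_filter(hits)])  # ['http://example.org/essay.html']
```

    Passing a larger `max_kb` restores the huge documents, so the searcher keeps the freedom to switch the filter off.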


    Another effective searching parameter would be a logical extension of the NEAR operator, with a variable distance between words. Now it is a fixed parameter, 10 words in Digital's AltaVista. Sometimes it should be just 3, sometimes 30, and sometimes 100. It does not matter to a searching program whether it operates with fixed or variable parameters. It matters a lot to me personally. I suppose the usual NEAR operator without a variable range should still exist, but as the default. We should have the freedom to choose the range.
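
    A NEAR operator with a variable range can be sketched as a comparison of word positions. This is an illustration only: the function `near`, its tokenization, and the reading of "within n words" as a difference in word positions of at most n are assumptions, not a description of how any search service implements NEAR.

```python
import re

def near(text, word_a, word_b, distance=10):
    """True if word_a and word_b occur within `distance` words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

text = "The pyramids, some now argue, were cast on site from an early form of concrete"
print(near(text, "pyramids", "concrete", distance=3))   # False
print(near(text, "pyramids", "concrete"))               # False (default of 10)
print(near(text, "pyramids", "concrete", distance=30))  # True
```

    The two words are 13 positions apart in the sample sentence, so the fixed default misses the match while a wider range chosen by the searcher finds it.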


    A real disaster for Web search is any page with a lot of links to sub pages not declared separately to the searching agencies. The result is that you find a page which contains none of the search keywords at all. You have no idea which of the sub links leads to the right place! We should have an opportunity to cancel the sub page search. I would call this first-layer search the SURFACE parameter, but certainly it does not matter what it is named.

    But supporting such a parameter is just the cheapest way. The most effective one is to treat every sub link automatically as a separate searchable unit. That allows applying the SURFACE parameter to all the Web pages. In this case, any search will get exactly the sub page containing the keywords in question, and no main pages will trash the catch.
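
    Treating every sub page as a separate searchable unit can be sketched as follows. The page paths and texts are hypothetical, `search_surface` is an illustrative name, and matching is done by plain substring test for brevity.

```python
def search_surface(pages, keywords):
    """Return only the pages whose own text contains every keyword."""
    wanted = [k.lower() for k in keywords]
    return [url for url, text in pages.items()
            if all(k in text.lower() for k in wanted)]

# Hypothetical site: an index page that only links to two sub pages.
pages = {
    "/index.html": "Welcome. See our articles.",
    "/a.html": "Were the pyramids built of concrete?",
    "/b.html": "Notes on lighthouse design.",
}
print(search_surface(pages, ["pyramids", "concrete"]))  # ['/a.html']
```

    Because each sub page is indexed on its own, the search returns the page that holds the keywords, and the main page never enters the catch.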

  3. By Robot

    I think that there is a promising and missing step in the evolution from the automatic syntactic Web search of our time to the semantic search of the future. The step is dynamic syntactic Web search: in other words, automatic tracking and analysis of Web history on a syntactic basis. What do I mean?

    The overlap of about 200,000 main words in some multi gigabit information sub space is something calculable now (Bruce Schartz). Almost inevitably, the appearance and development of a new idea is mirrored in some information space by a growing overlap between word sub spaces that did not overlap before. The dynamic analysis of scientific terminology is an obvious idea, but it has been applied to texts that were not intended by their authors for such analysis. Less obvious is the idea of automatic dynamic analysis as a searching tool in the hands of people who know that they are searching in a space of Nostradamus patterns specially created for the analysis. This information sub space would be an ideal test bed for Web robots tracking new ideas.
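
    The dynamic analysis described above can be sketched by comparing two snapshots of a collection and reporting the marker-word pairs that overlap in the later one but not in the earlier. The snapshots, the marker list, and the function names are hypothetical; a real robot would work on far larger data and would need a more economical method than enumerating every pair.

```python
import re
from itertools import combinations

def cooccurring_pairs(docs):
    """Set of unordered word pairs that share at least one document."""
    pairs = set()
    for text in docs:
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        pairs.update(combinations(words, 2))
    return pairs

def new_overlaps(old_docs, new_docs, markers):
    """Marker-word pairs that overlap in new_docs but did not in old_docs."""
    markers = set(markers)
    fresh = cooccurring_pairs(new_docs) - cooccurring_pairs(old_docs)
    return sorted(p for p in fresh if set(p) <= markers)

# Hypothetical snapshots taken before and after an idea appears.
old = ["pyramids of stone", "concrete and cement"]
new = old + ["pyramids cast from concrete"]
print(new_overlaps(old, new, ["pyramids", "concrete", "stone", "cement"]))
# [('concrete', 'pyramids')]
```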

    Another kind of Web robot, armed with thesauri, could vary word patterns proposed by a human and automatically check the automatically generated patterns for degeneration. Such a robot would be effective both for announcing new ideas and for searching for existing ones in Nostradamus style.

    A third kind of Web robot could use this sub space as an initial trail leading to detailed descriptions of ideas, thesauri enrichment, and semantic analysis.
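
    A thesaurus-armed robot of this kind might generate pattern variants and discard the degenerate ones, roughly as sketched below. The two-entry thesaurus, the hit counts, the cutoff of 1,000, and the names `variants` and `usable` are all made up for illustration.

```python
from itertools import product

# Hypothetical thesaurus; a real robot would load a much larger one.
THESAURUS = {
    "pyramids": ["pyramid"],
    "concrete": ["cement"],
}

def variants(pattern, thesaurus=THESAURUS):
    """All patterns obtained by keeping each word or swapping in a synonym."""
    choices = [[w] + thesaurus.get(w, []) for w in pattern]
    return [tuple(combo) for combo in product(*choices)]

def usable(patterns, hit_count, limit=1000):
    """Drop variants whose hit count marks them as degenerate."""
    return [p for p in patterns if hit_count(p) <= limit]

# Made-up hit counts standing in for a real search service.
fake_counts = {("pyramids", "concrete"): 40, ("pyramid", "cement"): 5000}
found = usable(variants(["pyramids", "concrete"]), lambda p: fake_counts.get(p, 0))
print(found)
```

    Here `hit_count` is passed in as a function so that the same check could later be pointed at a real search service instead of the made-up table.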

"... And Finishing Them Off"

Let's suppose I invented a banner for a home page: "my wife", in which animation turns "w" to "l" and back, switching to "my life". Nice. But what to do with it? I have no wife yet. And no crazy desire to try selling the idea or to promote it for free. I cannot even present it to a married friend of mine: it would take a computer animator to create the banner! All I can do with the idea is to forget about it completely. I am busy.

Every one of us sometimes bumps into ideas having nothing in common with the jobs we are paid for. Sometimes you tell a joke that you feel millions of people would laugh at. Sometimes a scene of a movie comes to your mind, and you know it would be a hit. Sometimes, probably just once in all your life, the feeling is urgent. But that's no reason to switch careers and become a professional humorist, or screenwriter, or whoever else. All that we usually do in such a situation is tell the idea to somebody and then say goodbye to it. If only there were a right common Web spot for such ideas! To send and forget. And then maybe to see them picked up in Terminator IV, or at AltaVista, or on the Pope's home page. Publicity is the best payment for occasional ideas: the opportunity to show where your ideas were used, the opportunity to see links to you from that high.

I am not going to say that every idea should be shared for free. That question is outside the subject altogether. The subject of the article is how to let others know about an idea and how to learn about it, no matter whether it is copyrighted or not.

A problem and an advantage of our time is that almost any final product is made by a team, and sometimes by a huge one, whether we like that or not. Even the remaining lonely symbols of individuality, such as pop singers, are usually collective products of music composers, poets, image-makers, and thousands of advisors. With the Web, the whole world is going to be the creative team. Everybody should know exactly what to do with any kind of idea, spending just minutes on Web verification and announcement. That is like regulating the global idea traffic: idea management.

Our ideas come to us from nowhere and to nowhere as a rule they go. I think gifts addressed to us are not something to be trashed. Got an idea and do not know what to do with it? Web it. The question is the right place to Web and to search. AltaVista! Excite! Lycos! HotBot! Yahoo! Any megaspot owner! Do you hear me?

* Note: Headings for this article are from Stephen King's The Library Policeman.

Alexei Falaleev, Ph.D., is an Associate of the Mortenson Center, University of Illinois at Urbana-Champaign. His permanent position is at the Vladivostok State University of Economics, Russia, where he is responsible for planning projects on the university's information development.

Copyright © 1998 by Alexei Falaleev. All Rights Reserved.
