A Proposal for Web Idea Management
by Alexei Falaleev
"Easing Them In, ..."*
Our common professional mission is to find
something new, whether it is a technical
invention, music, scientific discovery or
just a joke. The next thing we face after
getting any idea is the question whether
it is really new. That is our common
nightmare. We are never completely sure
that we have checked everything ever
printed. The Web allows such checking with
a few keystrokes. In principle. But
not yet. Why?
The aim of this article seems to be
extremely ambitious--to propose a common
strategy, which I would like to call
"Nostradamus protocol", to facilitate idea
searching via the Web. However, I do not
think of myself as a great inventor. There
is a tribe of Web ideas so inevitable that
they are predictions rather than ideas.
For these ideas the question is not
whether they will be realized. It's
more a question of when and by whom. The
strategy I am going to propose here is
exactly such a case.
"Nostradamus protocol" is probably not a
very good name, because many people know
nothing about the person. But if the only
use of the article were to make them
acquainted with him, it would be worth
writing and reading anyway. I would
recommend a brief
reference. To put it briefly,
Nostradamus was a seer of the sixteenth
century. Some of his verses that looked
very vague in his time now look quite
understandable when applied to historical
events.
Nostradamus' verses really help to
illustrate what I am going to say. There
is something special about the information
pattern of the verses. To my mind,
Nostradamus' verses are the first famous
human document written intentionally for
keyword identification. That is a typical
pattern-structured Web document of the
future.
Why? The purpose of the verses was to show
that the author was aware of some
future events without actually telling
about the events! Nostradamus was probably
the first to bump into the fact that
thousands of complex events can be
identified by just a few ordinary keywords
in unique connections. Nothing unique can
be found in Nostradamus' verses containing
the words "king", "eye", "wood chip", and
"kill." The connection of these words made
striking sense only after the death of
Henry II of France, who died after a
splinter pierced his eye in a joust with
Montgomery.
The task of naming something or somebody is
old and easy enough. That is just
language itself. But choosing a
special pattern of ordinary words to name
something for an unknown searcher is a need
of the present Web.
What is the principal difference between
naming and word patterning? Naming is
applicable to subjects about which we are
informed in advance. We first learned
every name, and every word in general, as
a pattern of other words or images, or its
meaning was explained some other way. It's
easy to search when you know the name.
Ideas and historical events are things
about which we are not informed in advance:
some waiting to happen, some waiting to be
named. Some are unpredictable and
unidentifiable before happening, some
searchable after happening, but searchable
by a word pattern only: no name yet! That
is what ideas and historical events have
in common. That is why the
information structure of Nostradamus'
verses is so close to the structure of
modern Web documents.
Every part of every text can be called a
word pattern. What is special about the
pattern of Nostradamus' verses?
I did my most impressive Web search last
summer. About 20 years ago my mother, a
Ph.D. in concrete construction materials,
told me her guess that the pyramids may be
made of concrete. She never found time to
explore the idea, and only this summer did
I remember it at last. I tried
"pyramids" as the search keyword and
received more than 200,000 references. I
tried "concrete" and got 300,000. But
combining "pyramids AND concrete" gave me
a bunch of references to a new and rather
popular theory that the pyramids were
built of concrete! If Nostradamus had been
going to predict the idea, he would surely
have used just these two words. That's his
style.
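The effect of combining two keywords can be
sketched as a set intersection over a small
inverted index. This is only an
illustration: the three documents and the
helper names build_index and search_and are
invented here, and nothing is implied about
how any real search engine works inside.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip('.,;:!?"()')].add(doc_id)
    return index

def search_and(index, *words):
    """Return ids of documents that contain every one of the given words."""
    hits = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*hits) if hits else set()

# Toy corpus, invented for illustration only.
docs = {
    1: "The pyramids of Giza attract many tourists.",
    2: "Modern bridges are built of reinforced concrete.",
    3: "A theory says the pyramids were cast from concrete.",
}
index = build_index(docs)
pyramids_and_concrete = search_and(index, "pyramids", "concrete")
```

Each word alone matches two documents, but
the AND query keeps only the single
document where the two word sub spaces
overlap.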
The total pool of information can be
divided into sub spaces of units
containing a single "marker" word--
"pyramid", or "concrete", or any of the
other 200,000 words that are not extremely
specialized. Each of the words corresponds
to some sub space of documents where it is
mentioned. These sub spaces overlap,
certainly. A Nostradamus pattern is an
overlap of two or more sub spaces that:
- were not overlapping before an event
(idea), and became overlapped with it; and
- can be guessed subjectively by different
people the same way, such as "lighthouse."
"Pyramids" and "concrete" had huge
sub spaces of documents before the theory,
but the spaces did not overlap: there
were no documents containing both
words, except dictionaries. The words are
obvious markers of the idea, so anybody
who came up with the idea independently
could easily guess to search for the
combination of the words to check whether
somebody had described the idea before.
Many words have several meanings. The
more frequently a word is used, the
more meanings it
usually has. In the context of word
patterns, this multiplication of meanings
can be called degeneration. The difference
between traditional indexes, complete
descriptions, and the emerging Nostradamus
patterns is that indexes aim to
characterize something as generally as
possible, descriptions as exactly as
possible, and Nostradamus patterns as
"guessably" as possible. All three
kinds are necessary, but the last one is
still waiting to become etiquette,
protocol, and tradition.
"... Softening Them Up, ..."
What are the obstacles preventing
exhaustive and prompt Web pattern
identification of any idea?
Successful searching is the result of joint
efforts of the searcher and the searched,
the result of a clear understanding of
actions they expect from each other.
The Internet itself is a set of common
protocols for these mutual efforts on the
technical level. Now it is the turn of
similar protocols on the human level.
The ideal situation is "I know exactly the
way you search, and you know that I know
that." My usual impression of current
sites is "Sorry, I don't know how you are
going to search for me. And, what is more
important, I don't care about that!" The
situation in which a person has bumped
into an idea and is wondering what to do
next is typical for each of us. The
situation has powerful and easy Web
solutions. There should be a more detailed
protocol about what exactly we are
expected to do to let other people know
about a new idea via the Web, and how to
find out whether somebody else found the
idea before. Possible components of the
protocol are the following:
- The person announcing an idea on the
Web to let others search for the idea (in
short, the searched) should use
somewhere in the text the most obvious
word patterns identifying the idea. The
landing strip should be wider than the
plane. I mean, if there are several
obvious patterns, all of them should be
used. But from the searcher's point of
view, the plane should be wider than the
landing strip: all obvious patterns should
be used for the search.
- The patterns should
be checked for degeneration: use in other
meanings and in other applications. If
they are degenerate, both the searched and
the searcher need not bother to take them
into account. Without knowing each other,
they would know that this limitation is
something they both agree on. It will save
a lot of time. Too many references? If so,
none of them matters!
- If the searcher does
not find the idea on the Web
using these rules, and if the idea can be
announced at all this way, the searcher
may consider the idea to be new. But the
person is expected to announce the idea on
the Web the same way, so that the world
can check whether that is true. The
unarguable presence of an idea on the Web,
exposed this way to worldwide scrutiny, is
a permanent certification of its novelty.
- Last but not least, the
protocol should be generally accepted, but
it can be accepted step by step, like any
other human etiquette.
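The components above can be read as a small
decision procedure. The following is a
sketch only, under stated assumptions: the
function name check_idea, the cutoff of
1,000 hits for "too many references", and
the hit counts are all hypothetical, and
count_hits stands in for whatever search
service is actually used.

```python
def check_idea(patterns, count_hits, degenerate_above=1000):
    """Classify each candidate word pattern and give an overall verdict.

    patterns: tuples of words, the most obvious ways to name the idea.
    count_hits: callable returning how many documents match a pattern
                (a hypothetical stand-in for a real search engine).
    degenerate_above: assumed cutoff above which a pattern is ignored.
    """
    usable, degenerate, found = [], [], []
    for pattern in patterns:
        hits = count_hits(pattern)
        if hits > degenerate_above:
            degenerate.append(pattern)   # too many references: none of them matters
        elif hits > 0:
            found.append(pattern)        # somebody was probably here first
        else:
            usable.append(pattern)       # free to announce under this pattern
    if found:
        verdict = "already announced"
    elif usable:
        verdict = "apparently new: announce it"
    else:
        verdict = "not searchable this way"
    return verdict, {"usable": usable, "degenerate": degenerate, "found": found}

# Invented hit counts for two candidate patterns.
fake_counts = {("my", "wife", "life", "banner"): 0, ("wife", "life"): 50000}
verdict, detail = check_idea(list(fake_counts), lambda p: fake_counts[p])
```

Both sides run the same check: the searched
before announcing, to choose patterns worth
using, and the searcher before concluding
that an idea is new.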
Items 1-2 imply that the word patterns
used in the protocol should be in
Nostradamus' style. That is one reason why
I would call it the Nostradamus protocol.
The second reason is that this term
satisfies the Nostradamus protocol's own
requirements: Altavista counted about
15,000 Web documents containing the term
"Nostradamus", about 660,000 containing
the word "protocol", but none containing
them consecutively. Several other possible
terms fail the requirements: they are
already used!
Not every idea can be announced and
effectively searched using this protocol.
But there are such ideas, and there are a
lot of them. For any idea, anybody can
easily judge whether it fits
the protocol (whether it is searchable
this way). If an idea cannot be found on
the Web in a minute using the Nostradamus
protocol, let's use traditional ways. But
in many cases, it can. The point is just
to save search time, where possible.
The Web reality is that in many cases you
would need just a few minutes to find
where your idea was already used, if there
were such a protocol. The other Web reality
is that no trace of such a protocol has
been accepted yet. I see three main ways
to implement the idea.
- By Appointment
It would be something like a lost-and-found
desk, or some
remarkable spot in the central square
where you go if you have lost your friend
in an unfamiliar town. A lot of word
patterns are degenerate in the Web as a
whole: it's too huge for our limited
vocabulary. The degeneration is spreading
inevitably like orbital debris, but a
thousand times faster. Unlike the Earth's
orbit, however, the Web could have some
sub space guarded from this accidental
degeneration. Many patterns that are no
longer of Nostradamus' type in the Web as
a whole could be made Nostradamus patterns
in a Web sub space specially intended for
idea search. All that we need to create
such a spot of lost and found ideas is:
- A famous site that everybody visits.
- Search etiquette, expecting that everybody
announcing an idea adds to the site
several Nostradamus-type word patterns
identifying the idea, with links to more
detailed information, and tries not to
degenerate the space.
A similar etiquette is already in effect
for announcing sites to worldwide
search utilities. As is well known,
most of the Web information available in
Altavista, Yahoo, Excite, etc., is not
"found" by these utilities on the Web, but
declared to them by authors. Nobody is
obliged to do that, but everybody does.
That is just an implicit collective
agreement. The same practice is possible
for targeted idea management.
The term "idea management" is a little
idea itself, as well as a unique word
pattern that became searchable once the
idea of the term appeared. I tried
Altavista to check whether I was the first
to find it, and alas, I found that "Idea
Management is a term that was coined ... by
Dave Beckwith". The check took me just a
minute. That is the way it should be for
a huge number of little and large ideas
identifiable by word patterns.
The first idea banks and idea searching
tools have already appeared on the Web.
They are easily searchable, so I am not
going to describe them here just to show
my competence. As a starting point for
surfing, I could recommend
Idea Management.
A site that fits my idea the best among
existing sites is the
Global Ideas Bank.
That is a
place where a lot of ideas are announced,
systematized, and equipped with search
utilities. However, the point of my
proposal is not just a Web site, but a
common agreement on how to formulate and
identify ideas by a set of the most obvious
word patterns. And, another important
feature, everybody should be aware of
the site.
- By Hunting
In spite of all the indexes and targeted
databases, I am personally fascinated by
searching the total content of the Web.
There is something challenging in refining
a few needed references from everything. I
know that the information now available
for total Web search is only a small part
of everything there is. However, it is
the closest thing to it.
Size
The main problem of Web search is Web
trash: good documents that are of no use
for this particular search but contain the
same keywords as the document sought.
Almost every component of this search
trash can be stripped away by a powerful
combination of searchable word patterns,
Boolean and proximity operators, and
Web-specific ones like the splendid
altavista.digital collection: anchor,
title, url, host, link, etc.
But one waste component of the usual Web
search catch cannot yet be disposed of
with these operators. I mean huge Web
documents containing everything. We get
these documents in every search. For
example, what do fly and
elephant, heaven and soil, dog and cat
have in common?
All of the words can now be found in more
than 500 Web documents,
like
this one.
Size of this document? 1006K.
I wonder how many Web searchers have
landed in the file since its creation, and
how much they found there.
The larger a document is, the more chances
we have to get it, whatever we are
searching for. These are the real black
holes of Web space: they capture every
unsuspecting searcher! I think we need
another search operator: SIZE, to filter
out documents that are too large. Maybe it
should look something like SIZE < n, or
just < n, or just <, with the verbal form
maybe LARGEOFF or even HUGEOFF, and a
default limit of about 500 KB. Never mind
how the parameter should look: my concern
is that it should exist.
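As a rough illustration of what a SIZE
filter would do, here is a minimal sketch.
It assumes each search result already
carries its document size in kilobytes. The
field name size_kb, the function name, and
the example URLs are invented, and the
500 KB default follows the suggestion above.

```python
def filter_by_size(results, max_kb=500):
    """Keep only results whose document size is below max_kb kilobytes."""
    return [r for r in results if r["size_kb"] < max_kb]

# Invented results: one ordinary page and one huge everything-document.
results = [
    {"url": "http://example.org/elephants.html", "size_kb": 12},
    {"url": "http://example.org/everything.txt", "size_kb": 1006},
]
kept = filter_by_size(results)
```

The 1006K document drops out of the catch,
whatever keywords it happens to contain.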
Near
Another effective search parameter
would be a logical extension of the NEAR
operator, with a variable distance between
words. Now it is a fixed parameter, 10
words in Digital's Alta Vista. Sometimes
it should be just 3, sometimes 30, and
sometimes 100. It does not matter to a
search program whether it operates with
fixed or variable parameters. It does
matter a lot to me personally. I suppose
the usual NEAR operator without a variable
range should still exist, but as the
default. We should have the freedom to
choose the range.
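A NEAR operator with a selectable range
could behave roughly like the sketch below.
The function near and its way of counting
distance (the difference between word
positions) are assumptions made for
illustration. The default of 10 mirrors the
fixed value mentioned above, but how a real
engine counts the distance is not known
here.

```python
def near(text, word_a, word_b, distance=10):
    """True if word_a and word_b occur within `distance` word positions of each other."""
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Invented sentence in which the two keywords are 15 positions apart.
text = ("The pyramids, some now argue, were not carved but cast "
        "on site from a kind of concrete.")
default_hit = near(text, "pyramids", "concrete")
wide_hit = near(text, "pyramids", "concrete", distance=30)
```

With the fixed default the sentence is
missed, while a range of 30 catches it.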
Surface
A real disaster for Web search is any page
with a lot of links to sub pages not
declared separately to search services.
The result is that you find a page which
contains none of the keywords searched for.
You have no idea which of the sub links
leads to the right place! We should have
the option to switch off the sub page
search. I would call this first-layer
search the SURFACE parameter, but
certainly the name does not matter.
But supporting such a parameter is just
the cheapest way. The most effective one
is to treat every sub link automatically
as a separate searchable unit. That allows
applying the SURFACE parameter to all the
Web pages. In this case, any search will
get exactly the sub page containing the
keywords in question, and no main pages
will trash the catch.
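Both variants can be sketched on a toy site
tree. The data layout (a dict with url,
text, and subpages) and the function
search_units are invented for illustration.
With surface=True only the top page is
examined. Otherwise every sub page is
matched as its own unit, so a parent page
is never returned on behalf of its children.

```python
def search_units(site, keywords, surface=False):
    """Return URLs of pages that themselves contain all the keywords."""
    def matches(page):
        words = set(page["text"].lower().split())
        return all(k.lower() in words for k in keywords)

    hits = [site["url"]] if matches(site) else []
    if not surface:
        # Treat every sub page as a separate searchable unit.
        for sub in site.get("subpages", []):
            hits.extend(search_units(sub, keywords))
    return hits

# Invented site: an index page with two sub pages.
site = {
    "url": "http://example.org/",
    "text": "index of articles",
    "subpages": [
        {"url": "http://example.org/a.html",
         "text": "pyramids made of concrete", "subpages": []},
        {"url": "http://example.org/b.html",
         "text": "elephants and flies", "subpages": []},
    ],
}
deep = search_units(site, ["pyramids", "concrete"])
shallow = search_units(site, ["pyramids", "concrete"], surface=True)
```

The deep search returns the exact sub page.
The surface search returns nothing, because
the index page itself contains none of the
keywords.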
- By Robot
I think that there is a promising and
missing step in the evolution from the
automatic syntactic Web search of our time
to the semantic search of the future. The
step is dynamic syntactic Web search: in
other words, automatic tracking and
analysis of Web history on a syntactic
basis. What do I mean?
The overlap of about 200,000 main words in
some multi gigabit information sub space
is something calculable now
(Bruce Schartz). Almost
inevitably, the appearance and development
of a new idea is mirrored in some
information space by a growing overlap
between word sub spaces that did not
overlap before. The dynamic analysis of
scientific terminology is an obvious idea,
but it has been applied to texts that were
not intended by their authors for such
analysis. Less obvious is the idea of
automatic dynamic analysis as a search
tool in the hands of people who know that
they are searching in a space of
Nostradamus patterns specially created for
the analysis. This information sub space
would be an ideal test bed for Web robots
tracking new ideas.
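A tracking robot of this kind could start
from something as simple as counting, year
by year, which word pairs appear together
and reporting pairs never seen together
before. The sketch below assumes a tiny
invented corpus of (year, text) pairs and
hypothetical function names. It ignores
scale entirely, which is the hard part for
200,000 words.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_by_year(docs):
    """Count, per year, how many documents contain each unordered word pair."""
    counts = {}
    for year, text in docs:
        words = sorted(set(text.lower().split()))
        counts.setdefault(year, Counter()).update(combinations(words, 2))
    return counts

def new_overlaps(counts, year):
    """Word pairs that co-occur in `year` but in no earlier year."""
    seen_before = set()
    for earlier, pairs in counts.items():
        if earlier < year:
            seen_before.update(pairs)
    return set(counts.get(year, ())) - seen_before

# Invented dated documents.
docs = [
    (1996, "pyramids tourists"),
    (1996, "concrete bridges"),
    (1997, "pyramids concrete"),
]
counts = cooccurrence_by_year(docs)
fresh_pairs = new_overlaps(counts, 1997)
```

In this toy history the pair of "concrete"
and "pyramids" first overlaps in 1997, which
is the signal such a robot would look for.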
Another kind of Web robot, armed with
thesauri, could diversify word patterns
proposed by a human and automatically
check the degeneration of the
generated patterns. Such a robot would be
useful both for announcing new ideas and
for searching for existing ones in
Nostradamus style.
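A minimal sketch of such a robot, assuming
a thesaurus given as a plain dict of
synonyms and a count_hits function that
stands in for a real search service. Both
are hypothetical, as are the function names
and the limit of 1,000 hits used as the
degeneration cutoff.

```python
from itertools import product

def vary_pattern(pattern, thesaurus):
    """All patterns obtained by keeping each word or swapping in a synonym."""
    choices = [[word] + thesaurus.get(word, []) for word in pattern]
    return [tuple(combo) for combo in product(*choices)]

def non_degenerate(patterns, count_hits, limit=1000):
    """Keep only the patterns whose hit count stays at or below `limit`."""
    return [p for p in patterns if count_hits(p) <= limit]

# Invented thesaurus and invented hit counts.
thesaurus = {"concrete": ["cement"], "pyramids": ["pyramid"]}
variants = vary_pattern(("pyramids", "concrete"), thesaurus)
good = non_degenerate(variants, lambda p: 5000 if "cement" in p else 10)
```

The human proposes one pattern, the robot
expands it to four, and the degeneration
check discards the variants that return too
many references.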
A third kind of Web robot could use
this sub space as an initial trail to
detailed descriptions of ideas, thesaurus
enrichment, and semantic analysis.
"... And Finishing Them Off"
Let's suppose I invented a banner for a
home page: "my wife", in which animation
turns "w" to "l" and back, switching to
"my life". Nice. But what to do with it? I
have no wife yet. And no crazy desire to
try selling the idea or to promote it for
free. I cannot even present it to a
married friend of mine: it would take a
computer animator to create the banner!
All I can do with the idea is
forget about it completely. I am busy.
Every one of us sometimes bumps into ideas
having nothing in common with the jobs we
are paid for. Sometimes you tell a joke
that you feel millions of people would
laugh at. Sometimes a scene of a movie
comes to your mind, and you know it would
be a hit. Sometimes, probably just once in
all your life, the feeling is urgent. But
that's no reason to switch careers and
become a professional humorist, or
screenwriter, or whoever else. All that we
usually do in such a situation is tell the
idea to somebody and then say good-bye to
the idea. If only there were a right common
Web spot for such ideas! To send and
forget. And then maybe to see them picked
up in Terminator IV, or at altavista, or
on the Pope's home page. Publicity
is the best payment for occasional
ideas: the opportunity to show where your
ideas were used, the opportunity to see
links to you from that high.
I am not going to say that every idea
should be shared for free. That question is
outside the subject altogether. The subject
of the article is how to let others know
about an idea and how to find out about
one, no matter whether it is copyrighted
or not.
A problem and advantage of our time is
that almost any final product is made by a
team, and sometimes by a huge one, whether
we like that or not. Even the remaining
lonely symbols of individuality, such as
pop singers, are usually collective
products of composers, poets,
image-makers, and thousands of advisors.
With the Web, the whole world is going to
be the creative team. Everybody should
know exactly what to do with any kind of
idea, spending just minutes on Web
verification and announcement. That is
like regulating the global idea traffic:
idea management.
Our ideas come to us from nowhere and to
nowhere as a rule they go. I think gifts
addressed to us are not something to be
trashed. Got an idea and do not know what
to do with it? Web it. The question is the
right place to Web and to search.
Altavista! Excite! Lycos! HotBot! Yahoo!
Any megaspot owner! Do you hear me?
* Note:
Headings for this article are from Stephen
King's The Library Policeman.
Alexei Falaleev, Ph.D.,
(falaleev@hotmail.com)
is an Associate of
the Mortenson Center, University of
Illinois at Urbana-Champaign. His
permanent position is at the Vladivostok
State University of Economics, Russia,
where he is responsible for planning
projects on the university's information
development.
Copyright © 1998 by Alexei Falaleev. All Rights Reserved.