Analysis is the process of gathering and comparing information about the web and its operation and use in order to improve the web's overall quality and to identify problem areas.
A web analyst checks to make sure the web works:
|Point|Evaluate whether the web...|
|---|---|
|A|Attempts to reach an audience that has, and will use, Web access|
|B|Contributes new information (pursues goals that others haven't already accomplished)|
|C|Is self-consistent (its purpose matches its objectives and specifications)|
|D|Is correct (the domain information it presents is accurate, up-to-date, and complete)|
|E|Is accessed in a balanced manner, both in terms of its own files and in terms of outside links into it|
|F|Is accomplishing objectives that meet the needs of the users|
An analyst weighs alternatives and gathers information to help with the other processes of web development, including planning, design, implementation, promotion, and innovation.
If you have just planned a web, a big question on your mind should be, "Will the web accomplish its purpose?" Even when a web is already deployed and operating, you should frequently investigate whether it is accomplishing its planned objectives. The web analysis techniques presented in this article are intended to help you check web elements in a planned or operating web. This analysis process covers the technical validation of a web's HTML implementation as well as analysis of the web's planned or existing content and design. This process also touches on usability and style issues. Because of the dynamic information environment in which a web operates, these ongoing efforts to evaluate web quality and usability may be the key to increasing the effectiveness of an organization's Web communication.
The figure shows the key information needs of a web analyst for all six of the web's elements: purpose and objective statements, audience and domain information, and specification and presentation. The overall goals of a web analyst are
The web analyst thus acts as a reviewer, evaluator, and auditor for the web-development process. When practical, therefore, the web analyst should be as independent as possible from the duties of web implementation, design, and planning.
Based on the characteristics and qualities of the Web, web analysis should pay close attention to evaluating how the web is consistent with the following principles:
In summary, a web analyst is concerned with principles for the technical and rhetorical integrity of a web. The goal is to create a web that works with the characteristics and qualities of networked hypermedia to best accomplish the web's purpose for its audience.
A web analyst can evaluate many of the web's technical and rhetorical aspects by analyzing the web's elements (audience information, purpose and objective statements, domain information, web specification, and web presentation) and performance (information about how users have used or are expected to use the web). This information analysis process also involves gathering information about other competitor webs that may be accomplishing a similar purpose or reaching a similar audience. When performed with the other people involved in web development processes, web information analysis serves as a check of the web's overall quality and effectiveness. Web information analysis seeks to uncover the answers to the following general questions:
Although a definitive answer to these questions might not always be obtainable, web analysis can serve as a check on the other development processes. This section looks at information analysis checkpoints that can be examined during a web's planning or after it is implemented. This analysis process involves gathering information about a web's elements and comparing it to feedback from users and to server statistics.
The figure at the right shows an overview of information useful in analysis. In the figure, the web's elements are in rectangles, and supporting or derived information is in ovals. Key checkpoints for analysis are shown in small circles, labeled A through F. At each checkpoint, the web analyst compares information about the elements or information derived from the web elements to see whether the web is working or will work effectively.
The completeness of the information about the web elements, and of the derived information, varies depending on how far the developers have progressed in implementing the web. A web analyst can obtain information about the web elements from the results of the planning, design, implementation, or development process. If the developers have just started the planning process, web analysts can analyze the checkpoints for which they have information. A web analyst can obtain the derived information by examining web statistics. Ideally, a web analyst will be able to observe representatives from the intended audience as they use the web. If web analysts don't have a working web ready, these audience representatives may give feedback on a mock-up of the web, its purpose statement, or a diagram of its preliminary design.
The key to the analysis process is that it is meant to check the overall integrity of the web. Results from the analysis process are used in other processes to improve the web's performance. If analysis shows that the web's domain information is often out of date, for example, the planning process needs to be changed to shorten the interval between updates of that information. Analysis of the web's elements thus helps all the processes of web weaving work correctly and efficiently. The following sections go through each of the analysis checkpoints shown in the diagram.
A number of research organizations study and present information on Internet demographics. Their reports give a fascinating picture of the evolving nature of the online user.
Some Pew research has overturned stereotypes. For example, one project found that Internet users are far from antisocial nerds. In fact, they found that Internet users have more extensive social lives than non-users. Another project challenged the notion that everyone will eventually be online. Research found that 57% of people who did not already have Internet access did not plan to get access. In the spring of 2001, a Pew study confirmed again that the gender gap in Internet access has narrowed: of the 104 million American adults with Internet access, 50.6% are women.
The links to the statistics in each case identify the research organization(s) behind the numbers. Fascinating nuggets among these statistics abound. For example, an Arbitron/Edison Media Research study in 2001 found that one-third of Americans with Internet access at home would give up television if forced to choose between television and the Internet.
A September 2000 Nielsen/NetRatings study found that the top three cities in terms of the percentage of households accessing the Internet from home with a personal computer were: San Francisco (66%), Seattle (64%) and San Diego (62%). Milwaukee came in 30th at 46%, right between 29th place Chicago (46%) and 31st place Minneapolis (45%).
The free content in the eMarketer news sections, however, provides a great service for tracking the shape of online business and marketing. For example, in the news section, a link to an American Demographics (http://www.demographics.com) article characterizes the online population as shifting away from a majority of "geeky white guys" to a far more diverse population, with the highest growth being among the "Walmart crowd": Americans over 55 years old with working-class incomes and middlebrow tastes.
Zook also tracks the number of Internet domains by state. A January 2001 count of Internet domain names ending in "dot com" showed California and New York on top with 1,843,900 and 721,725 domain names respectively. Wisconsin came in at number 25 with 105,600 domain names. In terms of the growth of total domain names registered from July 1998 to January 2001, Wisconsin came in 49th, ahead only of Kansas.
The Stanford study also found that 25% of respondents work at home on the Internet without a corresponding reduction in their office work. Stanford also found shifts in media use: 60% of regular Internet users have reduced their television viewing time to spend time online instead.
Without demographic statistics, the other way to see whether the audience is on the Web (or the Net) is to check for subject-oriented information resources and forums that are of interest to the audience. If the target audience consists of botanists, for example, what on-line information already exists that shows botanists as active on the Web and the Net? A web analyst can find out by:
Web analysts can interpret the results of the check of demographic statistics or subject-related Net resources in two ways. Suppose they find nothing: no newsgroups, no mailing lists, and no on-line collections of resources at major institutions. On one hand, this might mean that the audience simply has made no forays into the Net, and web analysts could decide that the web would fill a great need for this audience. On the other hand, they might conclude that this particular audience is not interested in on-line communication at all.
To decide which of these two alternatives is more accurate, web analysts should consult representative audience members. Analysts can check with people in the field and ask them, "What if you had an on-line system for information and communication?" Because on-line electronic mail discussion lists have been around longer than many network communications forums, an on-line mailing list that the target audience uses can be a good source of information about that audience's interests. Another aspect of this analysis of audience information is to make sure that the purpose for the web is one that meets the audience's patterns of communication, or at least the patterns in which the audience is willing to engage.
Web analysts might find that certain audiences are not willing to have a publicly available forum for discussion and information because of the nature of their subject matter, for example. Computer security systems administrators might not want to make detailed knowledge of their security techniques or discussions publicly available on a web server.
Certainly, private businesses or people who work with proprietary information might not want to support a web server that shares everything they know. These same people might be interested in sharing information for other purposes, however. Computer security administrators might want to support a site that gives users advice about how to increase data security on computer systems. Thus, the web's purpose statement must match the audience's (or information provider's) preferred restrictions on the information. Current technology can support password protection or restricted access to Web information so that specific needs for access can be met.
Through a check of the audience, purpose, and communication patterns for that audience, web analysts quickly can detect logical problems that might make a web's success impossible. If the web's purpose is to teach new users about the Web, for example, web analysts might have a problem if the audience definition includes only new users. How can new users access the web in the first place? In this case, the audience should be redefined to include web trainers as well as the new users they are helping. This more accurate audience statement reflects the dual purpose of such a training web: getting the attention, approval, understanding, and cooperation of trainers as well as meeting the needs of the new users. If web analysts have an accurate audience statement, all the other processes in web weaving, such as design and development, can work more efficiently because they take the right audience into account.
Just as web analysts don't want to reach an audience that doesn't exist or target an audience for a purpose they don't want to achieve, they also don't want to duplicate what is being done successfully by another web. Checkpoint B is the "web literature search" part of the analysis: "Is some other web doing the same thing as what the web analyst wants to do? What webs out there are doing close to the same thing?" These questions should be asked at the start of web development as well as continuously during the web's use. New webs and information are developed all the time, and someone else might develop a web to accomplish the same purpose for the same audience.
To find out whether someone has built a web for a specific audience and purpose, use subject-oriented and keyword-oriented searching methods. Web analysts also might try surfing for a web like this or for information related to the audience and purpose. During this process, save the links found; if they are relevant to the audience and purpose, they can become part of the domain information on which the web's developers and users can draw.
The other benefit of this web literature search is that web analysts can find webs that might be accomplishing the same purpose for a different audience. These webs might give web analysts ideas about the kinds of information they can provide for the audience. Also, they might find webs that reach the same audience but for a different purpose. These webs can give useful background or related information that web analysts can include as links in the web. If they find a web that reaches the same audience for the same purpose, they can consider collaborating with the developers to further improve the information.
The check of the purpose-objective-specification triad is to make sure that something wasn't lost in the translation from the purpose (an overall statement of why the web exists) to the objective statement (a more specific statement of what the web will do) to the web specification (a detailed enumeration of the information on the web and constraints on its presentation).
During the development of the specifications, the analyst might find that a piece of information was added that has no relation to the stated purpose. Or some aspects of the stated purpose might not be reflected in the specification at all.
One way to do this check is to make a diagram that traces the links from the purpose statement to the objective statement to the specification, both top-down and bottom-up. From the top down, each objective should give rise to specifications for the web. From the bottom up, every URL and component of the specification should be traced back to an objective, and each objective should be traced back to some aspect of the purpose statement. If there is a mismatch, more planning must be done to restate the purpose, objectives, or specification so that they all match.
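Once the purpose aspects, objectives, and specification items are written down as short labels, the trace lends itself to a small script. Here is a minimal Python sketch, assuming a hypothetical representation in which each objective names the purpose aspect it serves and each specification item (for example, a URL) names the objective it implements; the function name trace and all labels in the example are invented for illustration.

```python
def trace(purpose_aspects, objectives, specs):
    """Return a list of traceability problems among purpose, objectives, and specs.

    purpose_aspects: set of labels for aspects of the purpose statement
    objectives: dict mapping each objective label -> purpose aspect it serves
    specs: dict mapping each specification item (e.g. a URL) -> objective label
    """
    problems = []
    # Bottom-up: every specification item must trace to a known objective.
    for item, objective in specs.items():
        if objective not in objectives:
            problems.append(f"spec {item!r} traces to unknown objective {objective!r}")
    # Bottom-up: every objective must trace to some aspect of the purpose.
    for objective, aspect in objectives.items():
        if aspect not in purpose_aspects:
            problems.append(f"objective {objective!r} traces to unknown purpose aspect {aspect!r}")
    # Top-down: every objective should give rise to at least one specification.
    implemented = set(specs.values())
    for objective in objectives:
        if objective not in implemented:
            problems.append(f"objective {objective!r} has no specification")
    # Top-down: every aspect of the purpose should be served by some objective.
    served = set(objectives.values())
    for aspect in sorted(purpose_aspects):
        if aspect not in served:
            problems.append(f"purpose aspect {aspect!r} has no objective")
    return problems


if __name__ == "__main__":
    purpose = {"inform", "foster research"}
    objectives = {"publish articles": "inform", "host forum": "foster research"}
    specs = {"/mag/index.html": "publish articles", "/games/comic.html": "entertain"}
    for problem in trace(purpose, objectives, specs):
        print(problem)
    # spec '/games/comic.html' traces to unknown objective 'entertain'
    # objective 'host forum' has no specification
```

The script only finds missing or dangling links in the diagram; judging whether an objective really serves the purpose it names is still the analyst's call.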
The quality of the domain information affects users' perceptions of the web's overall quality. Inaccurate or incomplete information hinders web developers and leads to dissatisfaction among the web's users. The domain information must be checked to make sure that it is accurate, up-to-date, and complete. How often to repeat these checks depends on the nature of the domain: information that changes quickly needs to be checked more frequently.
There are two kinds of domain information: the information that web developers need in order to understand the subject well enough to plan, analyze, design, implement, and develop the web; and the domain information that the web provides to its users. Remember also that domain information of the first type does not need to be located on the Net at all; it might include textbooks or courses the web developers use to get up to speed in the area of knowledge the web covers. This kind of domain information also can serve as reference information throughout the course of web weaving.
Verifying the accuracy, currency, and completeness of the domain information is a difficult task because the web analyst must know enough about the subject matter to judge the veracity of all the domain information. Off-Net resources, such as books and courses, can be evaluated with the same judgment the analyst uses for similar off-line materials. The Net-accessible part of the first type of domain information, and all of the second type, can be checked by accessing and retrieving it over the Net.
The process for checking Net-accessible domain information follows. For domain information provided to developers but not users of the web (the first type of domain information, which is Net-accessible), check the web page provided to developers in the same manner as described in the following paragraphs.
Verify the freshness of links. If the web is operational, use the links provided in the web itself to ensure that the links are not stale and that the resource has not moved. (The section "Implementation Analysis," discusses checking links in more detail.)
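Checking links by hand becomes tedious as a web grows. The following sketch, which uses only the Python standard library, shows the mechanical part of the check: pulling the href targets out of a page and requesting each one. It assumes that a redirect to a different URL signals a moved resource and that an HTTP error status such as 404 signals a stale link. Some servers reject HEAD requests, so a practical tool would retry those with GET; that fallback is left out here.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def check_link(url, timeout=10):
    """Return (status, final_url) for a reachable link, or (None, reason) on failure.

    urlopen follows redirects, so a final_url that differs from url
    suggests the resource has moved and the link should be updated.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except HTTPError as err:  # e.g. 404 Not Found: a stale link
        return err.code, url
    except URLError as err:   # DNS failure, refused connection, unknown scheme
        return None, str(err.reason)
    except OSError as err:    # e.g. a timeout while reading the response
        return None, str(err)
```

A run over one page would loop over extract_links(html, page_url) and call check_link on each result. Non-HTTP links such as mailto: come back as errors, so it may be worth filtering them out first.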
Check the accuracy of the information. If the web claims to produce the correct solution to a problem from a set of inputs (for example, answering a physics problem submitted through a forms interface), prepare a set of inputs that lead to known results. Test the web to verify that it yields those same answers, and vary the test cases used.
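A set of known-result cases can be rerun automatically each time the web is analyzed. The sketch below assumes a hypothetical web that computes the distance an object falls from rest in a given time (d = 0.5 * g * t^2, with g taken as 9.81 m/s^2). The function web_free_fall_distance is a stand-in for submitting the form and reading the answer, which this sketch does not do.

```python
import math


def web_free_fall_distance(seconds):
    """Hypothetical stand-in for the web's answer; a real check would query the form."""
    g = 9.81  # m/s^2, assumed value of gravitational acceleration
    return 0.5 * g * seconds ** 2


# Cases with independently known results: (input in seconds, expected distance in metres).
KNOWN_CASES = [
    (0.0, 0.0),
    (1.0, 4.905),
    (2.0, 19.62),
    (3.0, 44.145),
]


def check_known_results(solver, cases, rel_tol=1e-6):
    """Return (input, expected, got) for every case where the solver disagrees."""
    failures = []
    for given, expected in cases:
        got = solver(given)
        if not math.isclose(got, expected, rel_tol=rel_tol, abs_tol=1e-9):
            failures.append((given, expected, got))
    return failures
```

Comparing with a tolerance rather than exact equality avoids false alarms from floating-point rounding. If the web rounds the answers it displays, rel_tol would need to be loosened to match, and new cases can be added to KNOWN_CASES over time to vary the tests.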
Use reliable and authoritative sources. Use these sources, when available, to verify the new information added in the web since the last analysis. If necessary, contact the developer of that information and discuss his or her opinions of the information's accuracy.
In the case of databases, make sure that they are as current as possible. This is crucial, for example, if the web serves time-dependent data, such as earthquake reports. If the web is not getting a direct feed from an information provider who supplies the most current information, check that the most current reports or data have been downloaded into the database the web uses.
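For time-dependent data, one simple check is to compare the timestamp of the newest record with the current time. This sketch assumes the database can report a timestamp for each record and that the analyst chooses a maximum acceptable age; the one-hour threshold and the earthquake-report timestamps in the example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_stale(record_times, now, max_age):
    """Return True if the newest record is older than max_age, or if there are no records.

    record_times: timezone-aware datetimes, one per record
    now:          the current time, passed in so the check is easy to test
    max_age:      a timedelta giving the oldest acceptable age of the newest record
    """
    newest = max(record_times, default=None)
    if newest is None:
        return True
    return now - newest > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical earthquake-report timestamps: 5 hours and 2 hours ago.
    reports = [now - timedelta(hours=5), now - timedelta(hours=2)]
    print(is_stale(reports, now, max_age=timedelta(hours=1)))  # True
```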
Compare all specifications to items in the database. Are there any specifications calling for information that currently is missing?
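If the specification's required items and the database's contents can both be listed by a common key, this comparison is a set difference. The sketch below assumes hypothetical item keys. It also reports the reverse case, items in the database that no specification calls for, which relates to the self-consistency check at checkpoint C.

```python
def compare_spec_to_database(spec_items, database_keys):
    """Compare the items a specification calls for with what the database holds."""
    spec, db = set(spec_items), set(database_keys)
    return {
        "missing": sorted(spec - db),      # called for, but not yet in the database
        "unspecified": sorted(db - spec),  # in the database, but not called for
    }


if __name__ == "__main__":
    spec = ["quake-1999-09", "quake-2000-11", "quake-2001-02"]  # hypothetical keys
    db = ["quake-1999-09", "quake-2001-02", "volcano-2000-05"]
    print(compare_spec_to_database(spec, db))
    # {'missing': ['quake-2000-11'], 'unspecified': ['volcano-2000-05']}
```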
Check locations on the Net. Use the methods of navigation described in Part III, "Web Implementation and Tools," to locate more current or reliable domain information.
Check locations on the Net to find other domain information that might be helpful as background to developers. Also look for information that could be part of the objective statement of the web.
Is the information at the right level of detail? Are the web weavers getting the right level of information for their work? Are the web's users given the right amount of information, or is what is offered excessive or oversimplified?
Is any of the information inappropriate for the users or the Web community at large? Is any of it unethical, illegal, obscene, or otherwise inappropriate? Check links to outside information to verify that users will not encounter inappropriate material. Clearly, web analysts have limited ability to control what appears in outside sources of information, so include this check in the analysis process when deciding which outside links the web should use.
The goal of this checkpoint is to determine whether the web, based on server statistics or feedback from users, is being accessed consistently with how the web analyst wants it to be used. One part of checking this consistency is to find out whether the web server's access statistics show any unusual patterns. A web server administrator should be able to provide the web analyst with a listing of the web's files and how many times they have been accessed over a given period of time. Although this file-access count is a simple measure of web usage, using it might reveal some interesting access patterns. A check of the web's files, for example, might show the following access pattern over the past 30 days:
File Number of Accesses
These counts show a fairly uneven distribution of accesses, in which a single file, comic.html, is accessed many times (5800). Compared to the small number of accesses to the web's "front door" (top.html), this pattern indicates a problem unless the imbalance was intended. The statistics also show that the newsletter isn't being read very much, whereas the resources are being accessed quite a bit. To interpret the web's access statistics, the analyst should ask the following questions:
Does the pattern of access indicate a balanced presentation, or are some pages getting disproportionate access? Does this indicate design problems?
If the web's "front door" page isn't getting very many accesses, this could indicate problems with the publicity about the web.
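If the server writes its access log in the Common Log Format, the per-file counts and the "front door" share can be computed directly from the log. The sketch below is one way to do that. The threshold of three times the mean for calling a page disproportionate, and the file names and counts in any example, are assumptions for illustration rather than figures from the statistics discussed above.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3}) \S+')


def count_accesses(log_lines):
    """Count successful (2xx) requests per requested path."""
    counts = Counter()
    for line in log_lines:
        m = CLF.match(line)
        if m and m.group(3).startswith("2"):
            counts[m.group(2)] += 1
    return counts


def disproportionate(counts, factor=3.0):
    """Return paths accessed more than factor times the mean count, busiest first."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return [path for path, n in counts.most_common() if n > factor * mean]


def share(counts, path):
    """Fraction of all counted accesses that went to one path (e.g. the front door)."""
    total = sum(counts.values())
    return counts[path] / total if total else 0.0
```

A low share for the front door page, alongside one or two disproportionate pages, is exactly the kind of pattern the questions above ask the analyst to explain.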
Another aspect of verifying the web's consistency of design and purpose is to see that it is listed and used in appropriate subject indexes related to its subject. Does the web analyst find links to the web on home pages of people working in the field? Is the general reputation of the web good? A web analyst can find answers to these questions by using spider-based search engines to find which pages on the Web reference the web's pages. Check major subject trees to see whether the web is represented in the appropriate categories. Much of this analysis of the web's reputation is useful in the development and process.
It is very important that web analysts determine whether the audience's needs are being met by the web. To do this, they must compare the audience information (the audience's needs and interests) with the objective statement and the intended and actual benefits and results from the web. Information about the actual benefits and results of the web's use is the most difficult to come by. Web analysts can use several methods, however, to get a view of the effects of the web:
Ask users. Design and distribute a survey. This could be done using the forms feature of HTML if web analysts are willing to use features not found on all web browsers. They could distribute the survey by e-mail to a random sample of users (if such a sample can be constructed from a listing of registered users or derived from web-access logs). Include in this survey questions about user satisfaction. Are the users satisfied that the web meets their needs? What else would the users like to see on the web? How much do users feel they need each of the features the web offers?
Survey the field. Is the web used as a standard reference resource in the field of study? This is similar to the analysis performed at checkpoint E, but instead of just focusing on the occurrence of links in indexes and other web pages, web analysts need to analyze the web's reputation in the field of study or business as a whole. Do practitioners generally recommend the web as a good source of information?
Is the web accomplishing its purpose? Are the outcomes specifically stated in the purpose actually occurring? If one phrase of the purpose is to "foster research in the field," for example, is there any evidence to support this? Has any research been published that was sparked by the interactions the web fostered? For a commercial web, how many sales can be attributed to the web? Determine some measure of the purpose's success and apply it during the analysis process.
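When a list of registered users is available, drawing the random sample for the survey described under "Ask users" is straightforward. This minimal sketch assumes the list is a collection of e-mail addresses; the function name survey_sample is hypothetical, and the optional seed lets the analyst reproduce the same draw later.

```python
import random


def survey_sample(registered_emails, k, seed=None):
    """Draw a simple random sample of k distinct addresses (all of them if fewer exist)."""
    # De-duplicate, then sort so that a given seed always yields the same sample.
    population = sorted(set(registered_emails))
    return random.Random(seed).sample(population, min(k, len(population)))
```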
Another way to look at checkpoint F is to ask the broader question, "Is the web doing some good?" Even though the web might be under development and its objectives not yet fully met, does it have at least some redeeming value? What benefits is it offering to the specific audience or even to the general public? A commercial site that also provides some valuable domain information, for example, is performing a public service by providing education about that topic.
Another approach is to conduct research using theory and methods from fields such as Computer-Mediated Communication, Computer-Supported Cooperative Work, Human-Computer Interaction, or other disciplines that can shed light on the dynamics of networked communication. These fields might yield theories that web analysts can use to form testable hypotheses about how well the web meets users' needs, fosters communication, or conveys information.
The key to checkpoint F is to make sure that the other checkpoints, A through E, are working together to produce the desired results. A web analyst will notice that checkpoints A through E each touch on groups of the web's elements. Only checkpoint F spans the big-picture questions: Are the people who use the web (audience information) getting what they need (purpose, objective, benefits/results) from it?
Not only should the information in a web be analyzed for its rhetorical and technical integrity, but the overall design of a web also should be evaluated for how well it works as a user interface and for its intended purpose and audience.
One of the most important impressions a web gives to users is how much it costs them to retrieve the information in it. One aspect of user cost related to the technical composition of a web is retrieval time. Many inline images and extremely large pages can cause long retrieval times. Performance for users varies widely, based on the browsers they use, the type of Internet connections they have, and the amount of traffic on the network and the Web server.
Analysis can be done, however, in general terms, to get some ideas of retrieval times. Here is a possible (not necessarily definitive) checklist for web-performance analysis:
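One item that could belong on such a checklist is a back-of-the-envelope estimate: the total bytes of a page and its inline images, converted to bits and divided by the connection speed. The sketch below does only that arithmetic. The page sizes and connection speeds are hypothetical, and because it ignores latency, protocol overhead, and server load, it gives a lower bound rather than a prediction.

```python
def retrieval_seconds(page_bytes, image_bytes, bits_per_second):
    """Rough lower bound on retrieval time: total bits divided by line speed.

    Ignores latency, protocol overhead, server load, and caching, so real
    retrieval times will usually be longer.
    """
    total_bits = (page_bytes + sum(image_bytes)) * 8
    return total_bits / bits_per_second


# Assumed connection speeds, for illustration only.
SPEEDS = {"28.8 kbit/s modem": 28_800, "56 kbit/s modem": 56_000, "1.5 Mbit/s T1": 1_544_000}

if __name__ == "__main__":
    page = 20_000                     # hypothetical HTML page size in bytes
    images = [30_000, 15_000, 5_000]  # hypothetical inline image sizes in bytes
    for name, bps in SPEEDS.items():
        print(f"{name}: {retrieval_seconds(page, images, bps):.1f} s")
    # 28.8 kbit/s modem: 19.4 s
    # 56 kbit/s modem: 10.0 s
    # 1.5 Mbit/s T1: 0.4 s
```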
Does the web exhibit a coherent, balanced design that helps the user focus on its content? One design problem associated with a lack of aesthetic focus is the clown pants design method: The web consists of pages containing patches of information haphazardly organized. A related (poor) design technique is the K00L page design method: The web designer apparently attempts to use every HTML extension possible, including blinking text, centered text, multiple font sizes, and blaring, gaudy colors. An analyst should try to identify page designs that fall outside the purpose of the web or the audience's needs.
Do the web's pages exhibit repeated patterns and cues for consistency, with variation in these patterns for expressiveness? Repetition with expressive variation is a design principle used in many areas, such as graphic design, architecture, painting, textile design, and poetry. Which graphic elements are repeated on many pages for consistency? What content is varied to convey informational or expressive content?
How is color used? Color can be used effectively to code information or to focus user attention. Randomly used color can confuse the user, and some users have impaired perception of color. Complementary colors used on top of each other often give a jarring, shimmering effect.
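The article doesn't prescribe a numeric test for color, but one widely used check that can be swapped in is the contrast ratio defined in the W3C's WCAG 2 guidelines, which compares the relative luminance of a text color and its background; WCAG asks for at least 4.5:1 for normal body text. The sketch below implements that formula for 8-bit RGB colors. It measures luminance contrast only: it flags low-contrast pairs such as pure red on pure green, but it does not detect the shimmering effect as such, nor does it simulate color-deficient vision.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel value to linear light (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG 2 contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0
    print(round(contrast_ratio((255, 0, 0), (0, 255, 0)), 2))    # 2.91
```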
Analysts can test a web for usability in a variety of ways. The quick ways of usability testing can give inexpensive, rough ideas of how well the web is working. More elaborate methods of usability testing can involve controlled experiments that might be prohibitively expensive. Here's a checklist to analyze the usability of a web, starting with the quick, simple, and inexpensive methods:
Perform a simple web walkthrough. With the web's purpose and audience definition in mind, analysts can perform a simple check of the pages, looking to see whether the major objectives are met.
Check sample user tasks. Based on the purpose statement and audience information for the web, analysts should be able to devise a set of tasks that the user is expected to accomplish. They then can use the web to accomplish these tasks, noting any problems along the way.
Test tasks on representative users. Based on the list defined in the preceding check, analysts can find several representative users and observe them as they complete the tasks. They might ask the users to say aloud what they are thinking while trying to accomplish the tasks. They might record this narrative, gather recordings from several audience members, and then analyze the transcripts. This might help not only in web analysis but also in generating ideas for redesign.
Perform field testing with actual users. This method attempts to get a true sense of how the web actually is used. Analysts need to be able to select random users of the web and observe them in the settings in which they use the web. The users of a web might not be located in a single geographic area, so, obviously, this type of testing can be very difficult and expensive. Alternatively, extensive interviews of actual users or focus groups of users might give better insight into how the web is being used.
Semantics refers to the meaning conveyed by the pages of the web. Through many of the information-analysis steps outlined previously, the analyst would have addressed many aspects of how the web conveys meaning. But a separate check of the web, focusing only on semantics, might reveal problems not detected in other ways:
Besides analyzing a web's information and design, web analysts also should take a look at a web's implementation. The HTML that makes up a web should be correct, and, to the extent possible, the links that lead out of a web should not be stale or broken. Validating that a web conforms to current HTML specifications is key to making sure that it is usable by many different browsers.
This analysis of implementation is not content analysis. Validation and link-checking tools can help improve the quality of the HTML code, but not the meaning of what that code conveys. Analysts should be careful not to focus entirely on the technical validation of a web; that is analogous to treating spelling and grammar as the single most important factor in quality writing. When they find problems with internal or external links, web analysts should inform the web implementer.
Because you will use the URL of your web in a variety of contexts, you should check to see whether the directory structure and naming conventions used are simple, consistent, and extendible.
First, if you are analyzing a planned web, what will its URL be? In the early days of the Web, many companies' webs were "hosted" on the sites of Web presence providers. This led to situations in which the URL for a company (for example, evergreen) included a reference to its Web presence provider (for example, globalweb.com), producing a URL such as http://www.globalweb.com/evergreen/. This URL doesn't clearly convey the ownership or brand of the web. Instead, if you are preparing a web for a company or major brand, consider registering a domain name for it.
Next, take a look at the planned structure of the directories on the web. Check to see whether the resulting path names make sense, are as simple as possible, and yet allow for growth in the directory tree. One common error is to place all files at a site at the highest level, leaving no room for organizing the files into a structure for easier maintainability and usability.
At the highest level, the URL that identifies only your server, such as http://www.example.com/, is the identifier you will most commonly use in advertising and promotion, particularly in non-Web media. This page therefore should load quickly and contain information to guide users efficiently to the information content of the site.
For other files at your web site, the directory structure and the file and directory names should identify the resource named by the URL. When I created a directory structure for my on-line periodical, CMC Magazine at http://www.december.com/cmc/mag/, I collected files about editorial policies into a single directory called editorial. This led to URLs to these files, such as the following:
These URLs are quite specialized, so I wouldn't expect to list them in a print advertisement. Therefore, their length is not as important as the meaning they convey. The benefit of the directory structure is that the URL can be read as a phrase. The URL http://www.december.com/cmc/mag/editorial/plan.html, for example, is for the CMC Magazine editorial plan.
Avoid redundancy in directory or file naming. For example, the URL to the home page of the following site doesn't need to be so complicated:
There's often little reason to create a directory for files of a special format (html), to use names like home, or to repeat the site name in a URL. A cleaner solution is http://www.example.com/index.html as the home page of the site. Most Web server software treats index.html as the default page for a directory, so you can even leave off the index.html when publicizing your site, as in http://www.example.com/.
Avoid mixed case in your directory names. A convention of initial-uppercase directory names and all-lowercase file names is a good one in principle, but in practice it often leads to confusion. For example,
conveys a good structure for the documents of the STAR project, but its mix of upper- and lowercase might make it cumbersome to reference elsewhere. The mix of upper- and lowercase does convey meaning, but that meaning is redundant when encoded into a URL: clearly, Projects is a directory because it has a subdirectory, and index.html is a file because it is in the last position of the URL. STAR is clearly an acronym. The URL
enables the user to concentrate on the logical organization of the files on the server rather than the syntax of this organization.
Look for ways to make the directory structure of your site meaningful and stable, but as simple and extendible as possible.
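Some of these naming guidelines can be checked mechanically across a list of a site's URLs. The sketch below flags mixed case, directory or file names such as html and home, path segments that repeat the site name, and a trailing index.html that could be dropped in publicity. The set of redundant names and the site_name parameter are assumptions drawn from the examples above; deciding whether a structure is meaningful and extendible still takes human judgment.

```python
from urllib.parse import urlsplit

# Assumed list of names that rarely add meaning to a URL, based on the examples above.
REDUNDANT_NAMES = {"html", "home"}


def lint_url(url, site_name):
    """Return a list of naming problems found in the path of a URL."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    problems = []
    if path != path.lower():
        problems.append("mixed case in path")
    for seg in segments:
        stem = seg.lower().rsplit(".", 1)[0]  # "home.html" -> "home"
        if stem in REDUNDANT_NAMES:
            problems.append(f"redundant segment {seg!r}")
        if site_name.lower() in stem:
            problems.append(f"segment {seg!r} repeats the site name")
    if segments and segments[-1] == "index.html":
        problems.append("index.html can usually be omitted in publicity")
    return problems


if __name__ == "__main__":
    print(lint_url("http://www.example.com/Projects/STAR/index.html", "example"))
    # ['mixed case in path', 'index.html can usually be omitted in publicity']
```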
The first step in implementation is to check to make sure that the HTML implementing the web is correct. See The HTML Toolbox.
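A real validator checks a document against a specific version of HTML, and that is the tool to use for this step. As a rough illustration of one thing such a check catches, the sketch below, built on Python's standard html.parser module, reports start and end tags that do not nest properly. It is deliberately cruder and stricter than HTML itself: HTML lets some end tags (such as </p> and </li>) be omitted, and this sketch reports every such omission as a problem.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Record start and end tags that do not nest properly."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, line) for each element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # XHTML-style tags such as <br/> open nothing

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag not in (t for t, _ in self.stack):
            self.problems.append(f"line {line}: stray </{tag}>")
            return
        # Pop back to the matching start tag; anything above it was left open.
        while self.stack:
            open_tag, opened_at = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {opened_at}: <{open_tag}> not closed before </{tag}>")

    def report(self):
        """All problems found, plus elements that were never closed."""
        unclosed = [f"line {ln}: <{t}> never closed" for t, ln in self.stack]
        return self.problems + unclosed


def check_html(text):
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.report()
```

For example, check_html("<p>Hello <b>world</p>") reports that the <b> element on line 1 was not closed before </p>.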
The sections in this article so far have approached web analysis from a very formal set of checklist items intended to exhaustively analyze the integrity of any web. In looking at many web sites, I've also come up with an informal list for a web critique. These questions address some of the most common problems I see. In special cases, there might be a very good reason why a web designer or implementer has used a technique or effect mentioned here, so all these questions should be taken in the spirit that they might have a reasonable affirmative answer, but that answer had better be good.