GCN Home > 11/04/02 issue
Have a problem with information overload now? Just wait a while
By Martin Herman, Special to GCN
Many knowledge workers feel as if they are drowning in information.

It's not hard to understand why. The Web, which two years ago had an estimated 21 terabytes' worth of static HTML pages, is doubling in size each year. We are inundated with e-mails, online documents, photographs, images and videos.
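
To see what annual doubling implies, the short Python sketch below projects the Web-size estimate above forward. It assumes a clean doubling once per year, which is a simplification; the function name is illustrative and not taken from any study.

```python
def projected_size_tb(base_tb: int, years: int) -> int:
    """Return the size in terabytes after `years` of doubling once per year."""
    return base_tb * 2 ** years


# 21 terabytes is the two-year-old estimate quoted in the article.
WEB_SIZE_TWO_YEARS_AGO_TB = 21

# Two doublings bring the estimate to the present day.
print(projected_size_tb(WEB_SIZE_TWO_YEARS_AGO_TB, 2))  # prints 84
```

Under that assumption, static HTML alone would now come to about 84 terabytes, and would double again within a year.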

According to a study by the University of California at Berkeley, at www.sims.berkeley.edu/how-much-info, the world produces between 1 exabyte and 2 exabytes of information each year, or about 250 megabytes for every man, woman and child on earth. An exabyte is a billion gigabytes.
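
The per-person figure can be checked with simple arithmetic. The Python sketch below assumes decimal units and a world population of roughly 6 billion; the population figure is an assumption added here for illustration, not a number from the study.

```python
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18  # a billion gigabytes

# Assumption for illustration only: roughly 6 billion people.
ASSUMED_WORLD_POPULATION = 6 * 10**9


def megabytes_per_person(exabytes_per_year, population=ASSUMED_WORLD_POPULATION):
    """Convert a yearly worldwide total in exabytes to megabytes per person."""
    total_bytes = exabytes_per_year * BYTES_PER_EXABYTE
    return total_bytes / population / BYTES_PER_MEGABYTE


# Low end, midpoint and high end of the range quoted in the study.
for exabytes in (1, 1.5, 2):
    print(exabytes, "exabytes ->", round(megabytes_per_person(exabytes)), "megabytes per person")
```

With these assumptions, the midpoint of the range, 1.5 exabytes, works out to about 250 megabytes per person.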

The study also said photographs are accumulating at an annual rate of 410 petabytes (410 million gigabytes), while video files add up to 300 petabytes annually.

It's impossible to browse so much information, but can't it be summarized better? Web search engines return too many irrelevant hits and put too much of a burden on the user, who typically has to know key words, browse returned documents and then enter new words to refine the search.

Cross-language document searches (for example, using English key words to find relevant documents in Chinese) are very difficult. Finding a certain piece of nontext material, such as a speech or video file, can turn into a major research project.

The Information Access Division of the National Institute of Standards and Technology's IT Laboratory is looking for better ways to access unstructured multimedia and other complex information: text, Web pages, images, video, voice, audio, and 2-D and 3-D graphics. NIST doesn't create such technologies, but we do contribute to them by developing performance metrics, evaluation methods, test suites and standards. We also work with industry to speed up commercial transition.

Our human language technology program concentrates on three areas: the huge growth of non-English content, the retrieval of focused information as distinct from lists of documents, and human-computer interfaces.

Evaluation program

Since 1992, NIST has held annual text-retrieval conferences to evaluate systems and algorithms developed by participating organizations. The conferences have three goals: to improve the state of the art in Web retrieval, to locate answers rather than merely documents and to bridge language barriers.

NIST is working with the Defense Advanced Research Projects Agency on several human language projects. One involves summarization, or reducing the number of words that must be read to understand the main points. Our goal is a reliable, comprehensive evaluation program for summarization tools.
