Have a problem with information overload now? Just wait a while

By Martin Herman, Special to GCN

Many knowledge workers feel as if they are drowning in information.

It’s not hard to understand why. The Web, which two years ago held an estimated 21 terabytes of static HTML pages, is doubling in size each year. We are inundated with e-mails, online documents, photographs, images and videos.
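To make the doubling claim concrete, here is a rough sketch of the arithmetic, starting from the 21-terabyte (21T) estimate and assuming the size doubles exactly once per year. The function name and the year range are illustrative choices, not figures from the study.

```python
def projected_size_tb(initial_tb: float, years: int) -> float:
    """Size in terabytes after `years` of doubling once per year.

    Assumes clean annual doubling, which is a simplification of
    "is doubling in size each year".
    """
    return initial_tb * 2 ** years


if __name__ == "__main__":
    # Year 0 is the 21-terabyte estimate from two years before the article.
    for year in range(6):
        print(f"year {year}: {projected_size_tb(21, year):,.0f} TB")
```

Under that assumption, the two years since the estimate would already put static HTML at about 84 terabytes, and three more years would push it past 670 terabytes.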

According to a study by the University of California at Berkeley, at www.sims.berkeley.edu/how-much-info, the world “produces between 1 exabyte and 2 exabytes of information each year, about 250 megabytes for every man, woman and child on earth.” An exabyte is a billion gigabytes.
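The per-person figure can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 exabyte = 10^18 bytes, 1 megabyte = 10^6 bytes) and a world population of roughly 6 billion; the population figure is an assumption for illustration and is not stated in the article.

```python
EXABYTE = 10 ** 18   # bytes, decimal definition (a billion gigabytes)
MEGABYTE = 10 ** 6   # bytes, decimal definition
WORLD_POPULATION = 6_000_000_000  # assumption: roughly 6 billion people


def per_person_mb(total_exabytes: float,
                  population: int = WORLD_POPULATION) -> float:
    """Megabytes per person for a given yearly total in exabytes."""
    return total_exabytes * EXABYTE / population / MEGABYTE


if __name__ == "__main__":
    for total in (1, 1.5, 2):
        print(f"{total} EB/year is about {per_person_mb(total):.0f} MB per person")
```

With those assumptions, the midpoint of the quoted range, 1.5 exabytes, works out to about 250 megabytes per person, which matches the study's figure.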

The study also said photographs are accumulating at an annual rate of 410 petabytes (410 million gigabytes), while video files add up to 300 petabytes annually.

It’s impossible to browse so much information, but can’t it be summarized better? Web search engines return too many irrelevant hits and put too much of a burden on the user, who typically has to know key words, browse returned documents and then enter new words to refine the search.

Cross-language document searches—for example, using English key words to find relevant documents in Chinese—are very difficult. Finding a certain piece of nontext material, such as a speech or video file, can turn into a major research project.

The Information Access Division of the National Institute of Standards and Technology’s IT Laboratory is looking for better ways to access unstructured multimedia and other complex information: text, Web pages, images, video, voice, audio, and 2-D and 3-D graphics. NIST doesn’t create such technologies, but we do contribute to them by developing performance metrics, evaluation methods, test suites and standards. We also work with industry to speed up commercial transition.

Our human language technology program addresses three areas: the huge growth of non-English content, the retrieval of focused information as distinct from lists of documents, and human-computer interfaces.

Evaluation program

Since 1992, NIST has held annual text-retrieval conferences to evaluate systems and algorithms developed by participating organizations. The conferences have three goals: to improve the state of the art in Web retrieval, to locate answers rather than merely documents, and to bridge language barriers.
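The article does not say which measures the conferences use. As a hedged illustration of what evaluating a retrieval system can look like, the sketch below computes precision and recall, two widely used retrieval measures, for a single query. The document IDs and the function name are made up for the example.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical data: five hits returned, three documents truly relevant.
    returned = ["d1", "d2", "d3", "d4", "d5"]
    judged_relevant = ["d2", "d5", "d9"]
    p, r = precision_recall(returned, judged_relevant)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case only two of the five returned documents are relevant, which is the "too many irrelevant hits" problem expressed as a number.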

NIST is working with the Defense Advanced Research Projects Agency on several human language projects. One involves summarization, or reducing the number of words that must be read to understand the main points. Our goal is a reliable, comprehensive evaluation program for summarization tools.