GCN Home > 07/07/08 issue
Another View | A strategy for massive archives
By Cray Henry, Special to GCN
Through the Defense Department's High Performance Computing Modernization Program (HPCMP), nearly 4,000 scientists and engineers have stored 6 petabytes of information, and they are expected to add 2 petabytes during the next year. A decade ago, data storage was a sideline activity in supercomputing; today, it is an essential part of the business.

Each year, the program now generates one-third the amount of information it has produced in its entire 15-year history.

Meeting the next five years' storage requirements will involve increasing the number of machines devoted to storage, improving mechanisms for predicting future storage needs, and possibly integrating algorithms into applications that allow users to catalog new data and define its storage period.

Since its establishment in the mid-1990s, HPCMP has been able to procure and deploy major high-performance computing systems annually at its four shared-resources centers.

In most years, the newly acquired HPC systems have provided an additional computational capability equal to 70 percent of all HPC systems in the program at the time.
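To give a sense of how quickly that rate compounds, here is a minimal sketch in Python. It assumes, purely for illustration, that the 70 percent addition happens every year and that no older systems are retired; the column says only "in most years," and the function name is hypothetical.

```python
def relative_capability(years: int, annual_addition: float = 0.70) -> float:
    """Total capability after `years` acquisitions, relative to the start.

    Assumption: each year's new systems add `annual_addition` times the
    capability already installed, and nothing is decommissioned.
    """
    return (1.0 + annual_addition) ** years


if __name__ == "__main__":
    for year in range(1, 6):
        print(f"After year {year}: {relative_capability(year):.2f}x the starting capability")
```

Under those assumptions, capability roughly doubles in less than a year and a half, which helps explain why the volume of data generated keeps climbing.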

Scientists and engineers generate an ever-increasing amount of data as they solve more complex problems. The program places no restrictions on the amount of data users generate and store or how long it is to be retained. Such decisions are left to the programs that sponsor the work.

About a year ago, we formed a storage working group to survey storage management tools and hardware options.

During the next year, we will institute a number of strategies, such as a revised retention policy, greater reliance on users to manage their data proactively, and an upgrade of storage systems that includes new storage-density technologies.

To budget for storage, we need an annual storage target. Based on past growth and a management decision to try to slow that growth, we plan to add archive storage annually equal to 140 percent of the new data generated the previous year.
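The budgeting rule can be written as a one-line calculation. The sketch below is illustrative only: the function name is hypothetical, and the 2-petabyte input borrows the growth figure quoted earlier in this column as a stand-in for "new data generated the previous year."

```python
def archive_target_pb(new_data_last_year_pb: float, ratio: float = 1.40) -> float:
    """Archive capacity to add this year, in petabytes.

    `ratio` is the 140 percent planning factor; the input is the amount of
    new data generated in the previous year (an assumed figure here).
    """
    return new_data_last_year_pb * ratio


if __name__ == "__main__":
    # Assumed example input: 2 PB of new data in the previous year.
    print(f"Archive storage to add: {archive_target_pb(2.0):.1f} PB")
```

With 2 petabytes of new data, the rule calls for about 2.8 petabytes of added archive capacity.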
