Subscribe to the Free Print Edition!
Celebrating 25 Years

The database of databases

Evolution in warehousing means better information sharing

By Drew Robb, Special to GCN

The Internet and IP-based networks have gone a long way toward removing barriers between agencies. To further a spirit of cooperation, agencies are also beginning to share information from disparate databases by sending some of that data to common data warehouses, where it can be merged, queried and analyzed.

The Transportation Department’s Bureau of Transportation Statistics, for example, has created the TranStats portal (www.transtats.bts.gov).

It lets researchers, analysts or citizens view, search, run reports and generate graphics from 400 transportation-related tables in over 100 databases (one of them with more than 250 million rows) kept by different departments and agencies.

“The Bureau of Transportation Statistics has three primary goals for TranStats,” said TranStats data team manager Cheryl Young. “The first is to assist public users in obtaining data easily and efficiently, the second is to support public and internal users who perform online analysis and the third is to offer data to internal users to perform advanced statistical analysis using third-party tools.”

But setting up a data warehouse is more involved than building a traditional database. “We learned that data collection and presentation involves much more than technology,” Young said. “It is difficult to change the concept of data stewardship. We would like to have focused much earlier on this issue so that the technical progress can follow.”

Common characteristics
While they share many common characteristics, a data warehouse is different in design and function from a production database. The primary difference is whether the system will be used for online transaction processing or online analytic processing. Most production databases are primarily for OTLP and require high-speed access to individual data records. A motor vehicles department, for example, would need an OLTP database to let clerks quickly access the record of an individual car, determine if a registration is current and update data.

The TranStats data warehouse, on the other hand, requires a database management system designed for OLAP. This affects the hardware needed to support the data warehouse, as well as the DBMS itself. TranStats runs on four Sun Solaris servers and a 2.5TB Sybase IQ Database Management System from Sybase Inc. Young said query speed, not transaction speed, led to selecting Sybase for this application.



GCN Popup