The Internet and IP-based networks have gone a long way toward removing barriers between agencies. To further a spirit of cooperation, agencies are also beginning
to share information from disparate databases by sending some of that data to common data warehouses, where it can be merged, queried and analyzed.The Transportation Departments Bureau
of Transportation Statistics, for example,
has created the TranStats portal
(www.transtats.bts.gov).
It lets researchers, analysts or citizens
view, search, run reports and generate
graphics from 400 transportation-related
tables in over 100 databases (one of them
with more than 250 million rows) kept by
different departments and agencies.
The Bureau of Transportation Statistics
has three primary goals for
TranStats, said TranStats data team
manager Cheryl Young. The first is to assist
public users in obtaining data easily
and efficiently, the second is to support
public and internal users who perform
online analysis and the third is to offer
data to internal users to perform advanced
statistical analysis using third-party tools.
But setting up a data warehouse is more
involved than building a traditional database.
We learned that data collection and
presentation involves much more than
technology, Young said. It is difficult to
change the concept of data stewardship.
We would like to have focused much earlier
on this issue so that the technical
progress can follow.
Common characteristics
While they share many common characteristics,
a data warehouse is different in design
and function from a production database.
The primary difference is whether the
system will be used for online transaction
processing or online analytic processing.
Most production databases are primarily
for OTLP and require high-speed access to
individual data records. A motor vehicles
department, for example, would need an
OLTP database to let clerks quickly access
the record of an individual car, determine if
a registration is current and update data.
The TranStats data warehouse, on the
other hand, requires a database management
system designed for OLAP. This affects
the hardware needed to support the
data warehouse, as well as the DBMS itself.
TranStats runs on four Sun Solaris
servers and a 2.5TB Sybase IQ Database
Management System from Sybase Inc.
Young said query speed, not transaction
speed, led to selecting Sybase for this application.