GCN Home > 03/05/07 issue
Backup to the future
Innovations help close the window on potential data losses
By Edmund X. DeJesus, Special to GCN
The volume of data agencies accumulate each day can push backup processes to the wall, expanding nightly backup windows to the point where they could overlap the daytime hours of productive work.

But faster chips, bigger hard drives and speedier networks are spawning strategies for backup and restoration that can reduce your backup window to zero, minimize the size of backups and let you choose practically any point to restore, even continuously.

"Even agencies that aren't 24/7 shops can benefit from strategies to reduce or eliminate this backup window," said Agnes Lamont, co-chair of the Data Protection Initiative for the Storage Networking Industry Association and vice president of marketing for backup software vendor TimeSpring Software Corp. of Newport Beach, Calif.

Besides the growing backup window, there is the question of whether once-daily backups are sufficient. After all, a lot goes on in a single day. If you're missing all the data people generated since last night, such as e-mail discussions, that gap represents a potentially significant loss.

Also, some administrators are under the impression that backup is solely for large operations that handle heavy transaction loads. But it's just as important for small offices to pay close attention to backup, because they might not realize the value of their data or even have a backup strategy in place.

Backup and recovery are of interest to commercial enterprises, too, but government agencies have special needs, many legally mandated, for information assurance, continuity and accessibility.

For example, preservation of material also could entail establishing a chain of custody (metadata) for that material. Can you prove where information came from and that it's absolutely unchanged?

New backup strategies are becoming possible because of technology improvements in several areas. Disk space is increasing, networks are speedier and chips are faster. For many enterprises, the hardware cost of moving to these new backup strategies is negligible.

Logically, your recovery needs are what influence your backup strategy the most. Agencies must decide how long they can afford to wait to recover, and how much data they can afford to lose, advised Greg Schulz, founder and senior analyst of the StorageIO Group of Stillwater, Minn., and author of Resilient Storage Networks (Digital Press Books).

Some agencies might be able to wait a few hours to recover, while others might need to be up and running again in seconds. This wait is the recovery time objective (RTO): The shorter the RTO, the faster your recovery must be, so the faster your backup media must operate and the more expensive your solution will be.

Similarly, you also must decide how frequently to back up data. Some agencies might be able to deal with losing a few hours of data, while others can't miss even a few seconds. This length of time you can afford to lose is the recovery point objective (RPO): The shorter your RPO, the more complete and granular your backup must be and, again, the more expensive your solution.
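
The two objectives can be treated as a simple filter on candidate backup methods. The Python sketch below is only an illustration: the BackupMethod class, the method names and every timing figure are made-up assumptions, not vendor numbers. It checks whether a method's restore time fits the RTO and whether its worst-case gap between saves fits the RPO.

```python
from dataclasses import dataclass


@dataclass
class BackupMethod:
    """Hypothetical description of a backup method (all figures are assumptions)."""
    name: str
    backup_interval_s: float  # worst-case gap between saved versions, in seconds
    restore_time_s: float     # estimated time to bring data back, in seconds


def meets_objectives(method: BackupMethod, rto_s: float, rpo_s: float) -> bool:
    """True if the method restores fast enough (RTO) and saves often enough (RPO)."""
    return method.restore_time_s <= rto_s and method.backup_interval_s <= rpo_s


# Illustrative figures only.
METHODS = [
    BackupMethod("nightly tape", backup_interval_s=24 * 3600, restore_time_s=8 * 3600),
    BackupMethod("hourly snapshot", backup_interval_s=3600, restore_time_s=300),
    BackupMethod("continuous", backup_interval_s=0, restore_time_s=60),
]


def candidates(rto_s: float, rpo_s: float) -> list:
    """Names of the methods that satisfy both objectives."""
    return [m.name for m in METHODS if meets_objectives(m, rto_s, rpo_s)]


# An agency that can wait 10 minutes to recover and lose at most one hour of data:
print(candidates(rto_s=600, rpo_s=3600))
```

Tightening either number shrinks the list, which is the cost trade-off described above.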

Once you estimate your agency's RTO and RPO, you can begin searching for backup and recovery methods to meet your needs. You must decide whether recovery methods, and the backup media that support them, are fast enough, and whether your backup methods are complete and granular enough.

New backup methods

Snapshots are one popular new strategy. A snapshot consists of a collection of time-tagged pointers to data. Some of that data could be ordinary production data in use; some can be saved versions. If you want to recover file X as of noon on Thursday, you simply search through the pointers for the one that indicates the version needed. The result is a near-immediate recovery.
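
The pointer lookup can be sketched in a few lines of Python. This is a minimal illustration, not any product's design: the SnapshotIndex class, its method names and the "blocks/v1" style locations are hypothetical. Each file keeps a sorted list of snapshot times, and a recovery request is a binary search for the latest pointer at or before the requested moment.

```python
import bisect
from datetime import datetime


class SnapshotIndex:
    """Time-tagged pointers to saved versions of files (hypothetical structure)."""

    def __init__(self):
        self._times = {}     # path -> sorted list of snapshot times
        self._pointers = {}  # path -> version locations, in the same order

    def record(self, path, taken_at, location):
        """Add a pointer to the version of `path` captured at `taken_at`."""
        times = self._times.setdefault(path, [])
        pointers = self._pointers.setdefault(path, [])
        i = bisect.bisect_right(times, taken_at)  # keep both lists time-ordered
        times.insert(i, taken_at)
        pointers.insert(i, location)

    def lookup(self, path, as_of):
        """Return the pointer for the latest version taken at or before `as_of`."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, as_of)
        if i == 0:
            return None  # no snapshot exists that early
        return self._pointers[path][i - 1]


index = SnapshotIndex()
# March 1, 2007 was a Thursday; locations are placeholders, not real paths.
index.record("X", datetime(2007, 3, 1, 9, 0), "blocks/v1")
index.record("X", datetime(2007, 3, 1, 11, 30), "blocks/v2")
index.record("X", datetime(2007, 3, 1, 14, 0), "blocks/v3")

noon_thursday = datetime(2007, 3, 1, 12, 0)
print(index.lookup("X", noon_thursday))
```

Because the lookup touches only pointers, no data is copied until the chosen version is actually read back.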

Since the snapshot itself consists mainly of pointers, it often takes much less space than a complete backup. It is also faster to create the snapshot than it is to physically copy data, and with faster chips and speedier networks, there is little performance overhead. This reduces the time gap between backups, lowering your exposure to missing data.

Snapshots also reduce your nightly backup window because so much saving is already being done during the day. Of course, gaps remain, namely the time between snapshots, but they are smaller than before.

The ultimate logical extension of reducing the time between backups is to make backups continuously, which is known as continuous data protection. "Real" or "true" CDP saves everything (every change to data) with no gaps in the backup data at all, and it has the ability to restore to any time necessary. "We call this APIT: any point in time," said Eric Burgener, vice president of marketing for Mendocino Software of Fremont, Calif.

Near-CDP, without the "real" or "true" label, may have gaps of minutes or hours. For example, the latest version of Microsoft's Data Protection Manager offers near-CDP at intervals of 15 minutes.

"The advantage of CDP is that there is no backup window," Lamont said. Also, depending on implementation, there is little impact on production.

How does CDP work? When data changes (a user saves a file, for instance, or an e-mail arrives), that change is automatically made on the backup as well. Faster chips reduce the overhead to nearly nothing, while capacious hard drives provide the room.
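
One way to picture this is as an append-only journal that records every change with its timestamp, so the state at any moment can be rebuilt by replaying the journal up to that moment. The Python sketch below is a simplified illustration under that assumption; the ChangeJournal class and its method names are hypothetical, and real products differ in how they capture and store changes. The key can stand for a block number or a file path.

```python
class ChangeJournal:
    """Append-only log of every change, replayable to any point in time."""

    def __init__(self):
        self._entries = []  # (timestamp, key, value) in arrival order

    def write(self, timestamp, key, value):
        """Record a change; a value of None records a deletion.

        Assumes timestamps arrive in non-decreasing order.
        """
        self._entries.append((timestamp, key, value))

    def state_at(self, timestamp):
        """Rebuild the data as it stood at `timestamp` by replaying the log."""
        state = {}
        for ts, key, value in self._entries:
            if ts > timestamp:
                break  # later changes are not part of the requested moment
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


journal = ChangeJournal()
journal.write(1, "report.doc", "draft")
journal.write(2, "inbox/msg-17", "hello")
journal.write(3, "report.doc", "final")
journal.write(4, "inbox/msg-17", None)  # message deleted

print(journal.state_at(2))
print(journal.state_at(4))
```

Replaying from the start is the simplest approach; a practical system would presumably combine the journal with periodic full images to keep restores fast.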

There are different versions of CDP, each with different implications for your system. One example is block level CDP, which tracks changes on a low level: what is physically changing on a hard drive, regardless of what file or application the data belongs to. Block level CDP is very fast, grabbing a chunk of data in one place and saving it in another, and is amenable to being implemented as a network appliance using special hardware.

"There's a lot of interest in block level CDP, because many enterprises use large databases that benefit from this method," said Chris Stakutis, chief technology officer of emerging storage software for IBM Corp.

File level CDP takes a different approach and works at a higher level, tracking changes to files regardless of what blocks of data those files reside on. This lets you keep your data consistent from the point of view of the application that uses the file.

For example, reconstructing a large Microsoft Word document can be time-consuming with block level CDP: Different parts of the document might reside on different blocks, possibly on different machines, each of which would have to be restored to the right version in time. With file level CDP, it would be a matter of tracking the versions of that one file.

Naturally, this depends on the kinds of data your agency handles, as well as the volume of data. "If you're concerned with recovery time, you need to choose a CDP implementation that is application-aware," said Dan Tanner, founder of consultant ProgresSmart of Westboro, Mass.

"CDP is most useful for valuable data that changes quickly," Burgener said. You must decide what events will trigger a CDP backup, including saving, deleting, opening, editing, or viewing a file or document.

Hardware or software versions of CDP are available. If opting for a software-based version, be sure it can use existing hardware and fit with your current operating systems and applications.

Some products, such as Mendocino Software's InfiniView, let you switch between true CDP and near-CDP. "After data is a few days old, you probably don't need every version," Burgener said. Instead, less frequent, periodic backup could be more appropriate.
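
The idea of keeping every version only for recent data can be expressed as a thinning rule. The Python sketch below is a generic illustration, not how InfiniView or any other product works: the thin_versions function, its parameters and the sample numbers are all assumptions. Versions newer than a cutoff are all kept; older ones are reduced to the latest version in each fixed-size time bucket.

```python
def thin_versions(timestamps, now, keep_all_for_s, bucket_s):
    """Keep every recent version; keep only the latest per bucket for older ones.

    timestamps      -- version times in seconds (any order)
    now             -- current time in seconds
    keep_all_for_s  -- age below which every version is retained
    bucket_s        -- bucket width for older versions (e.g. one hour or one day)
    """
    cutoff = now - keep_all_for_s
    recent = []
    latest_in_bucket = {}
    for ts in sorted(timestamps):
        if ts >= cutoff:
            recent.append(ts)
        else:
            # Sorted order means a later version overwrites an earlier one.
            latest_in_bucket[ts // bucket_s] = ts
    return sorted(latest_in_bucket.values()) + recent


# Small made-up numbers: keep everything from the last 10 seconds,
# and one version per 100-second bucket before that.
versions = [0, 10, 50, 100, 110, 190, 200, 205, 210]
print(thin_versions(versions, now=210, keep_all_for_s=10, bucket_s=100))
```

In practice the cutoff would be a few days and the bucket an hour or a day, chosen to match the agency's RPO for older data.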

New hardware appliances also can ease the backup burden. For example, the Active Archive Appliance from PowerFile offers an optical-disk media library as well as a RAID-enabled front-end cache.

"This removes nonchanging data from the backup challenge," said Jonathan Buckley, vice president of marketing for PowerFile Inc. of Santa Clara, Calif. The smaller amount of current data is easier to manage, while the archived material is still accessible if necessary.

"We find that 85 percent of the data people need is sitting in the cache," said Jim Sherhart, PowerFile's director of product management. Optical-disk access also is faster than conventional tape archive access, permitting new services such as PDF libraries.

Behind the backup

If keeping track of all the pieces that make up all the versions of all files sounds like a tough job, you're right. That's why some backup solutions use a database to manage everything.

"The Tivoli Storage Manager is built around a relational database," said Tricia Jiang, technical attaché for IBM's Tivoli Storage Solutions. The database tracks metadata about each item of data. This metadata also is useful for migrating data to long-term storage media, as well as for auditing and using space more efficiently.

Identifying the data to back up and actually doing the physical copying can be two different things. "You don't necessarily perform the save immediately," Stakutis said. Instead, the backup system tracks the change, doing the save when it's most expeditious.

A process called deduplication can slash the amount of data you need to back up by storing duplicate data only once. For example, if you receive an e-mail that you share with 10 colleagues, most backup systems keep every copy of that e-mail separately.

Deduplication scans data headed for backup and compares it to data already saved. If it finds a match, it saves the data only once and simply points to that one instance for every copy.

"File-based deduplication is preferable to bit-based or block-based," said Diamond Lauffin, founder and CEO of the Lauffin Group of Los Angeles. This is because so many frequently used files are fixed-content, such as PDFs or graphics. File-based deduplication even deals with renamed files, since it works on the content of the file.
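
A minimal Python sketch of file-based deduplication follows. It assumes content is compared by SHA-256 digest, which is a common technique but not one the vendors quoted here have confirmed; the DedupStore class and its method names are hypothetical. Each distinct content is stored once, and every file name simply points to it, so copies and renamed files cost no extra space.

```python
import hashlib


class DedupStore:
    """Stores each distinct file content once, keyed by its digest (a sketch)."""

    def __init__(self):
        self._blobs = {}  # content digest -> bytes, stored only once
        self._names = {}  # file name -> content digest (the "pointer")

    def save(self, name, data: bytes) -> bool:
        """Back up `data` under `name`; return True only if new content was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = data
        self._names[name] = digest  # every copy points at the single instance
        return is_new

    def load(self, name) -> bytes:
        """Follow the pointer for `name` back to the stored content."""
        return self._blobs[self._names[name]]

    def stored_bytes(self) -> int:
        """Total size of content actually kept."""
        return sum(len(blob) for blob in self._blobs.values())


store = DedupStore()
message = b"Quarterly figures attached."

# The same e-mail in the sender's mailbox and in 10 colleagues' mailboxes.
for i in range(11):
    store.save(f"mailbox-{i}/msg.eml", message)

# A renamed copy matches on content, not on name.
wrote_new = store.save("archive/renamed-copy.eml", message)

print(store.stored_bytes(), len(message), wrote_new)
```

Matching on a digest rather than comparing bytes directly keeps lookups fast; a cautious implementation might also verify the bytes on a match to rule out a hash collision.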

Data sprawl

"Each increase in storage only increases the amount of backup necessary," Schulz noted. Files get bigger. New capabilities, such as images, sound or video, also lead to larger backups. E-mail, too, represents an avalanche of sometimes-vital information flowing into an agency. "Most CDP is sold to deal with Microsoft Exchange," Burgener said.

The increasing mobility of the workforce also creates a backup concern. If a notebook PC is lost, how else can the data it contains be recovered? "While the volume of data on laptops is comparatively smaller, it is usually more creative and may represent more value," Stakutis said.

SNIA is evolving standards and best practices to cope with rapidly changing backup technology. As backup windows fade into memory and backup gets to be continuous, the process will become transparent to users. "This is the same philosophy as antivirus software," Stakutis said. "You'll need it, won't even question it and won't notice it."

Edmund X. DeJesus is a freelance technical writer in Norwood, Mass. (dejesus@compuserve.com).
