I recently wrapped up an exchange on a newsgroup thread about disaster recovery and ‘off-site tapes’, and it occurs to me that the great debate between disk and tape will never be over.
The reason is that the debate is as irrelevant as arguing whether everyone should drive pickup trucks or luxury sedans. Author’s note: since I live in Texas, I also have a strong but slightly skewed opinion on this. Most people, except for a friend of mine who just likes to argue, would not get into a truck vs. sedan debate, because we understand that the two types of vehicles serve different purposes. One is great for helping your buddy move but really does not seat a family of four well (even the king-cab version). The other is great for entertaining customers, but it is hard to haul a trailer with.
Before you, the reader, lose interest in this article, ask yourself why it is we understand that “trucks vs. sedans” is a silly discussion (because they serve different purposes) and yet we find many IT professionals adamant that only one type of media (disk or tape), and therefore only one type of approach (replication or backup), should be used for protecting data. In my experience, if you are debating ‘tape vs. disk’, then you are arguing for the sake of arguing because the choices are not mutually exclusive.
To finish my automotive analogy, this is where we get luxury SUVs as a hybrid approach that capitalizes on the merits of each vehicle and provides a solution that meets a wider range of needs while only taking up one space in the garage. ‘Tape and Disk’ (which for the rest of this article will be discussed as backup and replication respectively) also marry together well. A recent Gartner report predicts, with a probability greater than 70 percent, that by 2003 large corporate enterprises will use a combination of replication technology (which feeds disk) and backup software (which feeds tape).
First, let’s agree on how each technology works.
Tape backup and tape as a medium have been around longer than disk, so we’ll discuss them first (remember the twin reels on mainframes). My first experience with a computer was in 1977, when I saved my programs from a TRS-80 onto a standard audiocassette and thought I was in heaven when the 5.25″ floppy disks first came out. Twenty-five years later, tape works the same way. A dedicated application walks the file system using some kind of selection criteria and saves the whole files in sequential order onto a long, thin strip of plastic film with a magnetic coating. Tape drives are bigger and faster, and the software is fancier and schedulable, but the approach is still snapshots of what the whole files looked like at a certain point in time. Other than fancier agents for getting unusual kinds of data, more capacity, and management GUIs, tape backup may look very similar in another 25 years. If your goal when using tape is anything in the disaster recovery or business continuity arena, then your only option is the same as 25 years ago: take the tapes to an off-site location.
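As a rough illustration of that walk-select-save loop, here is a minimal Python sketch. It is not how any particular backup product works: the function names are hypothetical, the selection criterion (modification time) is just one assumed example, and a tar archive on disk stands in for the tape. The archive should live outside the tree being backed up.

```python
import os
import tarfile


def select_files(root, modified_since=0.0):
    """Walk the tree under root and yield files changed at or after modified_since."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Assumed selection criterion: last-modified timestamp.
            if os.path.getmtime(path) >= modified_since:
                yield path


def backup(root, archive_path, modified_since=0.0):
    """Write every selected file, whole, one after another into a tar archive."""
    saved = []
    with tarfile.open(archive_path, "w") as tape:
        for path in select_files(root, modified_since):
            arcname = os.path.relpath(path, root)
            tape.add(path, arcname=arcname)  # the entire file is copied, changed or not
            saved.append(arcname)
    return saved
```

Calling backup() with modified_since=0.0 behaves like a full backup, and passing the time of the previous run behaves like an incremental one. Either way, a file that changed by one byte is written out in full.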
Disk, as a medium, has some fundamental differences that inherently allow it to be more flexible. Disk is linearly scalable, meaning that as one grows (from megabytes to gigabytes and into terabytes), performance and access are predictable across the entire data set. Disk access is significantly more ingrained and robust within operating systems. Case in point: the first operating system on your desktop was probably DOS, the Disk Operating System. Every other function, feature, or benefit is supplemental to the operating system’s primary purpose: accessing the disk. That said, replication is simply capturing the data as it is written to its disk and somehow propagating it elsewhere. For the benefit of this discussion, we will be discussing software-only replication.
Replication is not new, either, and is actually a combination of two data protection techniques. The idea of sending a copy of the data from disk to disk is as old as the nightly batch file that I would use to XCOPY files out to remote offices. The idea of capturing and tracking changes within files can be traced back to ‘file journaling’ on AS/400s and beyond. The best of today’s replication technologies combine these approaches using file-system filter technology. As data is given to the file system (by the O/S), the filter (just like an anti-virus filter) transparently accesses the data stream. Since most applications tend to write only the bytes that have changed, the filter catches only those granular bytes, and therefore only the smallest possible set of data is transmitted (a.k.a. replicated) to another copy of the file. This small series of bytes is then applied to the copy on the target(s), exactly as it was changed on the source.
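To make the byte-level idea concrete, here is a minimal Python sketch. It only imitates the concept in user space: a real file-system filter sits inside the operating system, and the class and method names here (ReplicatedFile, write_at, replicate) are hypothetical, not any product's API. It assumes the target already holds a one-time full copy of the source file.

```python
class ReplicatedFile:
    """Stand-in for a file-system filter: each write goes to the source file
    and is also captured as an (offset, bytes) change for the target."""

    def __init__(self, source_path):
        self.source_path = source_path
        self.pending = []  # changes captured but not yet sent, in write order

    def write_at(self, offset, data):
        # Apply the write to the source, as the application asked.
        with open(self.source_path, "r+b") as f:
            f.seek(offset)
            f.write(data)
        # Capture only the changed bytes and where they belong.
        self.pending.append((offset, data))

    def replicate(self, target_path):
        """Replay the captured changes on the target, in the original order.
        Returns the number of bytes that had to be sent."""
        sent = 0
        with open(target_path, "r+b") as f:
            for offset, data in self.pending:
                f.seek(offset)
                f.write(data)
                sent += len(data)
        self.pending.clear()
        return sent
```

In this sketch, changing 5 bytes in the middle of a large file sends 5 bytes to the target, not the whole file. Replaying the changes in write order is what keeps the target identical to the source.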
Unlike tape backup, the whole file is typically only sent once so a set of data can be ‘protected’ across a much slower connection. This correctly implies that while tape backup of distributed branch offices is not typically viable, replicating the data between those same branch offices is very achievable.
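A back-of-the-envelope calculation shows why this matters on a slow link. The figures below are assumptions picked purely for illustration (a 2 GB data set, a 128 kbit/s branch-office line, 1 percent of the data changing per day); they do not come from any measurement, and protocol overhead is ignored.

```python
def transfer_hours(num_bytes, link_bits_per_second):
    """Hours needed to push num_bytes across a link, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_second / 3600


# All three values are illustrative assumptions.
data_set_bytes = 2 * 1024**3      # 2 GB of branch-office data
link_bps = 128_000                # 128 kbit/s WAN link
daily_change_fraction = 0.01      # 1 percent of the bytes change each day

full_copy_hours = transfer_hours(data_set_bytes, link_bps)
daily_change_hours = transfer_hours(int(data_set_bytes * daily_change_fraction), link_bps)
```

Under these assumptions, moving the whole data set takes roughly 37 hours, so a nightly whole-file backup across the wire can never finish in time, while the day's changed bytes need only about 22 minutes.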
And before you begin to think that I’m trying to talk you into a truck or sedan, recognize that if you delete a file off a production server, a replication engine will very expediently delete the copy of that file off the target server in ‘real time’. That point alone brings us to why tape and disk should never be mutually exclusive.
If you need to see what your files looked like yesterday, or last week, or last month, or last year, use tape. Tape is an inexpensive method of archiving snapshots of whole files over long periods, with the anticipation of restoring periodically.
If you are concerned about ensuring a near-zero loss of data or productivity, you need to look past backup and restore. That kind of data protection calls for replication.
Consider if you back up your data at midnight on Monday and then have an outage at 4 p.m. Tuesday afternoon.
In business continuity terms, the RTO (Recovery Time Objective) of “when will I be running” is the amount of time that the tape(s) need to restore (plus the amount of time to get the tape set back from your offsite provider). Best-case RTO is a day (i.e. you receive the tapes on Wednesday and perhaps the data is restored by end of day, but more likely by Thursday morning). With replication technology, the second copy of data is already accessible (on another remote platform), so RTO is close to zero.
Still using business continuity metrics, the RPO (Recovery Point Objective) of “where will my data be when it is available” is the gap between when the tape backup ran and when the outage occurred. So, at the end of your day of getting the tapes and doing the restore, your data looks like it did Monday night at midnight. Your users, who did not have access to any data on Wednesday, will discover on Thursday morning that the data is from Monday night, with everything from Tuesday lost. With real-time replication technology, the alternate server has a copy of the data that is seconds old.
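The arithmetic behind that scenario can be written out in a few lines of Python. The calendar dates, the 8 a.m. Thursday restore time, and the five-second replication lag are all assumptions chosen only to put numbers on the example; the function names are likewise hypothetical.

```python
from datetime import datetime, timedelta

# Assumed dates: any week works, this one simply has the right weekdays.
last_backup = datetime(2002, 1, 8, 0, 0)     # midnight at the end of Monday
outage = datetime(2002, 1, 8, 16, 0)         # 4 p.m. Tuesday
tape_restored = datetime(2002, 1, 10, 8, 0)  # Thursday morning (assumed 8 a.m.)


def recovery_point(last_good_copy, outage_time):
    """How much work is lost: the age of the newest usable copy at the outage."""
    return outage_time - last_good_copy


def recovery_time(outage_time, back_in_service):
    """How long users are down: from the outage until data is usable again."""
    return back_in_service - outage_time


tape_rpo = recovery_point(last_backup, outage)   # 16 hours of lost work
tape_rto = recovery_time(outage, tape_restored)  # 40 hours of downtime

# Assumed replication lag of five seconds, matching "seconds old" above.
replica_rpo = recovery_point(outage - timedelta(seconds=5), outage)
```

With these assumed times, tape leaves the business 16 hours of work behind and 40 hours out of service, while the replicated copy is five seconds behind.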
I once heard someone define insanity as “repeating the same process over and over, but expecting different results”. Tape is inherently whole-file, scheduled, and non-WAN-friendly, and it tends to require a backup window. If your only approach to disaster recovery is using tape and couriers (the repeated process), then do not expect to get different results (significant RPO and at least day-long RTOs).
All that being said, replication may not be for everyone, but then again, neither is any other tool, brand or technology in our workspace.
So, stop debating and embrace each methodology for what it does best (tape for inexpensive snapshots over long periods, and disk for up-to-the-minute protection of the data). And if you back up the data at the replicated facility, that is an ‘off-site backup’ without paying for courier services. This leads me down a path of ROI and monthly operating costs compared with a capitalized system purchase (but that is a topic for another article, maybe next month).
Another path we could talk about (in another article) is disk-snapshot technology. In those cases, the storage solution captures portions of the file set that have changed over the past X hours. As that technology continues to develop, some clients will eventually consider baselines and disk images to be a replacement for tape and an even better partner for replication (but again, that is for another article, so check back on this).
In closing, when you are ready to take a long look at really protecting your data, consider enhancing your backup solution with replicated data (which may offer the hybrid approach that you need for protecting the productivity of your users and ensuring long term data survival).
And when you next go car shopping, consider an SUV (which may offer the hybrid approach of truck and luxury sedan that meets your needs). But not the 4×4 versions. Your friends know that you aren’t going to take a Lexus or BMW SUV into the muddy hills, and will laugh at you.
About the Author
Jason Buffington has been working in the networking industry since 1989, with a majority of that time being focused on data protection. He is a Business Continuity Planner and an MCSE-MCT. He currently serves as the Director of Technical Marketing for NSI Software, enabling High Availability and Disaster Recovery solutions.
He can be reached at [email protected]