Tape vs. Disk - The Great Debate?
I recently wrapped up some banter on a newsgroup thread about disaster recovery and 'off-site tapes', and it occurs to me that the great debate between disk and tape will never be over.
The reason for this is that the debate is as irrelevant as arguing whether everyone should drive pickup trucks or luxury sedans. (Author's note: since I live in Texas, I also have a strong but slightly skewed opinion on this.) Most people, except for a friend of mine who just likes to argue, would not get into a truck vs. sedan debate, because we understand that the two types of vehicles serve different purposes. One is great for helping your buddy move but really does not seat a family of four well (even the king-cab version). The other is great for entertaining customers, but it is hard to haul a trailer with.
Before you, the reader, lose interest in this article, ask yourself why it is we understand that "trucks vs. sedans" is a silly discussion (because they serve different purposes) and yet we find many IT professionals adamant that only one type of media (disk or tape), and therefore only one type of approach (replication or backup), should be used for protecting data. In my experience, if you are debating 'tape vs. disk', then you are arguing for the sake of arguing because the choices are not mutually exclusive.
To finish my automotive analogy, this is where we get luxury SUVs as a hybrid approach that capitalizes on the merits of each vehicle and provides a solution that meets a wider range of needs while only taking up one space in the garage. 'Tape and Disk' (which for the rest of this article will be discussed as backup and replication respectively) also marry together well. A recent Gartner report predicts, with a probability greater than 70 percent, that by 2003 large corporate enterprises will use a combination of replication technology (which feeds disk) and backup software (which feeds tape).
First, let's agree on how each technology works.
Tape backup and tape as a medium have been around longer than disk, so we'll discuss them first (remember the twin reels on mainframes). My first experience with a computer was in 1977, when I saved my programs from a TRS-80 onto a standard audiocassette, and I thought I was in heaven when the 5.25" floppy disks first came out. Twenty-five years later, tape works the same way. A dedicated application walks the file system using some kind of selection criteria and saves whole files in sequential order onto a long, thin strip of plastic coated with magnetic material. Tape drives are bigger and faster, and the software is fancier and schedulable, but the approach is still a snapshot of what the whole files looked like at a certain point in time. Other than fancier agents for capturing unusual kinds of data, greater capacity, and management GUIs, tape backup may look very similar in another 25 years. If your goal when using tape is anything in the disaster recovery or business continuity arena, then your only option is the same as it was 25 years ago: take the tapes to an off-site location.
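The walk-select-and-save approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular backup product works: the function name backup_whole_files and its selected parameter are hypothetical, and an ordinary tar archive on disk stands in for the tape.

```python
import os
import tarfile


def backup_whole_files(root, archive_path, selected=lambda path: True):
    """Walk `root` and append every selected file, whole, to a sequential archive.

    `archive_path` should live outside `root` so the archive does not
    try to back itself up. Returns the relative paths that were saved.
    """
    saved = []
    # A tar archive is written front to back, much like a tape.
    with tarfile.open(archive_path, "w") as tape:
        for folder, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subfolders in a predictable order
            for name in sorted(filenames):
                path = os.path.join(folder, name)
                if selected(path):  # the "selection criteria"
                    arcname = os.path.relpath(path, root)
                    tape.add(path, arcname=arcname)  # the whole file, every time
                    saved.append(arcname)
    return saved
```

Note that every run copies each selected file in full, whether one byte changed or all of them did. That is the point-in-time, whole-file behavior the rest of this article contrasts with replication.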
Disk, as a medium, has some fundamental differences that inherently allow it to be more flexible. Disk is linearly scalable, meaning that as one grows (from megabytes to gigabytes and into terabytes), performance and access remain predictable across the entire data set. Disk access is also significantly more ingrained and robust within operating systems. Case in point: the first operating system on your desktop was probably DOS, the Disk Operating System. Every other function, feature, or benefit is supplemental to the operating system's primary purpose: accessing the disk. That said, replication is simply capturing the data as it is written to its disk and somehow propagating it elsewhere. For the purposes of this article, we will be discussing software-only replication.
Replication is not new, either, and is actually a combination of two data protection techniques. The idea of sending a copy of the data from disk to disk is as old as the nightly batch file that I would use to XCOPY files out to remote offices. The idea of capturing and tracking changes within files can be traced back to 'file journaling' on AS/400s and beyond. The best of today's replication technologies combine these approaches using file-system filter technology. As data is given to the file system (by the O/S), the filter (just like an anti-virus filter) transparently accesses the data stream. Since most applications tend to write only the bytes that have changed, the filter catches only those granular bytes, and therefore only the smallest possible set of data is transmitted (a.k.a. replicated) to another copy of the file. This small series of bytes is then applied to the copy on the target(s), exactly as it was changed on the source.
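To make the byte-level idea concrete, here is a minimal sketch in Python. It is a toy model, not a real file-system filter: the class ReplicatedFile and the helper apply_write are hypothetical names, in-memory byte arrays stand in for the source and target files, and the "network" is just a method call.

```python
def apply_write(buf, offset, data):
    """Write `data` into `buf` at `offset`, growing the buffer if needed."""
    if offset > len(buf):
        buf.extend(b"\x00" * (offset - len(buf)))  # pad any gap with zeros
    buf[offset:offset + len(data)] = data


class ReplicatedFile:
    """Toy stand-in for a file watched by a replication filter."""

    def __init__(self, initial=b""):
        self.source = bytearray(initial)
        self.target = bytearray(initial)   # the one-time whole-file mirror
        self.mirror_bytes = len(initial)   # bytes sent for the initial copy
        self.change_bytes = 0              # bytes sent for later changes

    def write(self, offset, data):
        # The application writes to the source as usual...
        apply_write(self.source, offset, data)
        # ...and the "filter" forwards only the offset and the changed bytes.
        self._replicate(offset, bytes(data))

    def _replicate(self, offset, data):
        self.change_bytes += len(data)
        apply_write(self.target, offset, data)


record = ReplicatedFile(b"balance=0100;name=acme")
record.write(8, b"0250")                   # the application updates four bytes
assert record.target == record.source      # the target now matches the source
assert record.change_bytes == 4            # and only four bytes were sent
```

The whole file crosses the wire once, for the initial mirror. After that, each change costs only as many bytes as the application actually wrote.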
Unlike tape backup, the whole file is typically sent only once, so a set of data can be 'protected' across a much slower connection. This correctly implies that while tape backup of distributed branch offices is not typically viable, replicating the data between those same branch offices is very achievable.
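A little arithmetic shows why the slower connection matters. The figures below are purely hypothetical, chosen only for illustration: a 2 GB branch-office data set, 1 percent of it changing per day, and a 128 kbit/s link with protocol overhead ignored.

```python
def transfer_seconds(num_bytes, link_bits_per_second):
    """Idealized time to push `num_bytes` across a link, ignoring overhead."""
    return num_bytes * 8 / link_bits_per_second


# Hypothetical figures, chosen only for illustration.
DATA_SET_BYTES = 2_000_000_000      # 2 GB at the branch office
DAILY_CHANGE_FRACTION = 0.01        # 1 percent of the data changes per day
LINK_BPS = 128_000                  # a 128 kbit/s branch-office link

whole_files = transfer_seconds(DATA_SET_BYTES, LINK_BPS)
changes_only = transfer_seconds(DATA_SET_BYTES * DAILY_CHANGE_FRACTION, LINK_BPS)

print(f"Sending every whole file: {whole_files / 3600:.1f} hours")
print(f"Sending one day's changes: {changes_only / 60:.1f} minutes")
```

Under these assumptions, sending every whole file takes longer than a day, so a nightly full copy could never finish before the next one was due, while a day's worth of changed bytes fits in well under an hour.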