Disaster recovery (DR) is often discussed in broad terms throughout the storage industry, but in this article I will explore a specific segment of the overall market: DR planning for large archives.
This is the first article in a two-part series on archives. The next article will cover architectural planning for large archives.
First, what are my definitions of an archive and of a large archive? An archive is a repository of saved information, most of which is infrequently accessed.
The definitions of archives have changed recently. Just three or four years ago, archives were always on tape, with only a small disk cache (usually less than 5% of the total capacity). The software to manage data on tape and/or disk is called hierarchical storage management (HSM) and was developed for mainframes more than 35 years ago.
Today we have large disk-based archives that back up data over networks. For example, both my work PC and home PCs are backed up via the internet, and large cloud-based archives are common today. There is of course a question of reliability (see “Cloud Storage Will Be Limited By Drive Reliability, Bandwidth”), but that is a different topic.
My definition of a large archive is fairly simple: anything over 2,000 SATA disk drives. Today, that is about 4 PB, and next year it will likely be 8 PB when drive capacities increase. I use 2,000 drives as the threshold because of the expected failure rate across that many drives. Even in a RAID-6 configuration, which would require 2,400 drives, managing that many drives for a single application will be challenging given the rebuild times.
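The sizing arithmetic behind that threshold can be sketched roughly as follows. The 2 TB drive capacity and the 10+2 RAID-6 group width are assumptions inferred from the 4 PB and 2,400-drive figures; neither is stated explicitly.

```python
# Rough sizing for the 2,000-drive threshold.
# Assumptions (not stated in the article): 2 TB per drive and RAID-6 groups
# of 10 data drives plus 2 parity drives.
import math

DATA_DRIVES = 2000
DRIVE_TB = 2        # assumed capacity per drive, in TB
GROUP_DATA = 10     # assumed data drives per RAID-6 group
GROUP_PARITY = 2    # RAID-6 carries two parity drives per group


def usable_pb(data_drives: int, drive_tb: float) -> float:
    """Usable capacity in PB (decimal, 1 PB = 1,000 TB)."""
    return data_drives * drive_tb / 1000


def raid6_total_drives(data_drives: int, group_data: int, group_parity: int) -> int:
    """Total drives once every RAID-6 group's parity drives are included."""
    groups = math.ceil(data_drives / group_data)
    return groups * (group_data + group_parity)


print(usable_pb(DATA_DRIVES, DRIVE_TB))                             # 4.0
print(raid6_total_drives(DATA_DRIVES, GROUP_DATA, GROUP_PARITY))    # 2400
```

Doubling the assumed drive capacity to 4 TB gives the 8 PB figure for the same drive count.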
Three Types of Disaster
There are three types of disasters to be considered: failure of a single file or group of files, metadata corruption, and what I often call the “sprinkler error.”
The failure of a single file or group of files is a completely different problem from a sprinkler going off in a computer room and destroying all of the equipment. File-level failures are far more common than a complete disaster (earthquake, hurricane, lightning strike, power surge, sprinklers going off, etc.). Either way, when I architect systems I ensure that there are always at least two copies of the data. In large archives, two copies might not be enough, given the time needed to re-replicate the data after a disaster and the data integrity of the storage system.
Metadata corruption is also unlikely, but it does happen, and it happens more often than many believe. It could be corruption of the file system metadata or, if data deduplication is used, corruption of one of the deduplicated blocks, which will be a disaster if that block is not well protected.
Of course, cost plays a big part in how much data protection a site will have. Many vendors talk about four 9s, five 9s, or even eight 9s of availability and reliability. However, when you have many petabytes of data this concept needs to be re-thought.
The chart below shows expected data loss based on the number of 9s of reliability.
[Chart: Data Loss in Bytes versus Data Reliability %]
So for ten 9s of data reliability and just a single petabyte of data, a loss of 900,720 bytes can be expected. Therefore, data reliability in terms of a count of 9s must be reconsidered in the context of large archives. In some data preservation environments, data loss is just not acceptable for any reason. I often find in these environments, when an organization is moving from analog to digital, that some managers do not understand that data is not 100% reliable on digital media. Keeping multiple copies on digital media also costs more than keeping books on a shelf, because the data must be migrated to new media over time, and even then it is not 100% reliable without many copies.
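As a rough sketch of the arithmetic involved: with N nines of reliability, the expected loss is the amount of data multiplied by 10^-N. The helper below is illustrative only and is not the method used to produce the original chart. Assuming a binary petabyte (2^50 bytes), the 900,720 figure is reproduced when that petabyte is counted in bits (2^53); counted in bytes, the same formula gives roughly 112,590.

```python
# Expected data loss for a given number of nines of reliability.
# Assumption: "N nines" means a loss fraction of 10**-N, and a petabyte is
# taken as 2**50 bytes. Neither convention is spelled out in the article.
from fractions import Fraction

PIB_BYTES = 2 ** 50
PIB_BITS = PIB_BYTES * 8  # 2**53


def expected_loss(total_units: int, nines: int) -> Fraction:
    """Expected units lost (bits or bytes, matching `total_units`)."""
    return Fraction(total_units, 10 ** nines)


print(round(expected_loss(PIB_BITS, 10)))   # 900720
print(round(expected_loss(PIB_BYTES, 10)))  # 112590
```

Either way, the point stands: at petabyte scale even ten 9s leaves a measurable amount of data at risk, and the expected loss grows linearly with archive size.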
Recommendations for Disk- and Tape-based Archives
I recommend the following data protection policies and procedures for large archives. Except where noted, these recommendations apply to both disk-based archives and tape-based archives.
Data should be synchronously replicated, and validated, to another location that is outside the potential disaster area. For example, if you are in an area that has tornadoes, the replica should be at least 100 miles, or better yet 500 miles, north or south of the base location, as most tornadoes travel east/west.
Have additional ECC or checksums available to validate the data. Most HSM systems have per-file checksums available on tape, but most do not have them on disk. Technologies such as T10 DIF/PI for tape and disk will become available this year, and many vendors are working on end-to-end data integrity techniques. Per-file checksums are starting to become part of a common discussion in the file system community, but a checksum does not correct the data; it only tells you the file has gone bad. If you want to know where it went bad within the file you need to have ECC within the file to detect, and hopefully correct, the failure. (For more details on my opinions regarding per-file ECC, see “Error Correction: An Urgent Need for Files”).
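A minimal sketch of a per-file checksum, assuming SHA-256 as the digest (no particular algorithm is named here, and HSM products may use something else). The function names are hypothetical. As noted above, this only detects that a file has changed; it cannot locate or repair the damage.

```python
# Per-file checksum: compute once at ingest, compare on later reads.
# SHA-256 and the function names are illustrative assumptions.
import hashlib


def file_checksum(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, recorded_checksum: str) -> bool:
    """True if the file still matches the checksum recorded at ingest."""
    return file_checksum(path) == recorded_checksum
```

Reading in chunks keeps memory use flat regardless of file size, which matters when archive files run to many gigabytes.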
In the case of disk-based archives, all RAID devices should have “parity check on read” enabled. Some RAID controllers support this, but others do not, and on some RAID arrays that do support it the feature causes significant performance degradation. Parity check on read adds a level of integrity beyond per-file checksums when the checksum failure is caused by a fault within the storage system: the bad block is caught at the RAID controller before it becomes the failure of an entire file.
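The idea behind parity check on read can be illustrated with a simplified single-parity (XOR) stripe. This is a conceptual sketch only: real controllers implement this in firmware, RAID-6 uses a second, non-XOR parity as well, and the function names here are hypothetical.

```python
# Simplified "parity check on read" over one stripe with XOR parity.
# Illustrative only; not how any specific RAID controller is implemented.
from functools import reduce
from typing import Sequence


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(data_blocks: Sequence[bytes], stored_parity: bytes) -> bytes:
    """Return the stripe's data only if recomputed parity matches what was stored."""
    if xor_parity(data_blocks) != stored_parity:
        raise IOError("parity mismatch on read: stripe needs repair")
    return b"".join(data_blocks)
```

The extra parity computation on every read is where the performance cost comes from, since without this feature parity is normally only read during a rebuild.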
In the case of tape-based archives, it's important to note that data does not move directly to tape, but to disk and then to tape via HSM. Again, RAID devices should have parity check on read enabled.
Ensure that both soft and hard errors are monitored on every hardware component. Soft errors eventually turn into hard errors and, potentially, data failure, so they should be addressed quickly before that happens. This is a significant problem for tape, as there is no standard for Self-Monitoring, Analysis and Reporting Technology (SMART). For more information, see "Solving the Storage Error Management Dilemma."
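One way to act on soft errors early is to compare cumulative error counters between polling intervals and flag any device whose soft-error count is climbing. The sketch below is hypothetical throughout: the `ErrorSample` record, the counter names, and the threshold of 10 are all assumptions, and collecting the counters from real hardware is out of scope.

```python
# Flag devices whose soft errors are rising, or that report any new hard error.
# The record layout and the default threshold are illustrative assumptions.
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ErrorSample:
    device: str
    soft_errors: int  # cumulative recovered (soft) errors
    hard_errors: int  # cumulative unrecovered (hard) errors


def devices_needing_attention(
    previous: Iterable[ErrorSample],
    current: Iterable[ErrorSample],
    soft_delta_limit: int = 10,
) -> List[str]:
    """Return devices with a new hard error or a soft-error rise above the limit."""
    baseline = {sample.device: sample for sample in previous}
    flagged = []
    for sample in current:
        old = baseline.get(sample.device)
        if old is None:
            continue  # no baseline yet, so no trend to judge
        new_soft = sample.soft_errors - old.soft_errors
        new_hard = sample.hard_errors - old.hard_errors
        if new_hard > 0 or new_soft > soft_delta_limit:
            flagged.append(sample.device)
    return flagged
```

Tracking the change between samples rather than the absolute count keeps long-lived devices with an old, stable error history from being flagged repeatedly.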
If possible, regularly protect and back up the metadata for the file system, and HSM metadata for data on tape, because metadata can be restored without restoring all of the data in case of a failure. This works far better, and is far easier, if metadata and data are separated in the file system.
Validate per-file checksums regularly. For large archives this becomes a significant architectural issue given the CPU, memory and I/O bandwidth required.
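To get a feel for why this is an architectural issue, a back-of-the-envelope estimate of how long one full validation pass takes is useful. The sustained read bandwidth of 10 GB/s below is purely an assumed figure for illustration, and the estimate ignores CPU and memory limits and any competing production I/O.

```python
# Time for one full checksum-validation pass over an archive.
# Assumes decimal units (1 PB = 1e15 bytes, 1 GB/s = 1e9 bytes/s) and a
# sustained aggregate read rate; the 10 GB/s figure is hypothetical.
SECONDS_PER_DAY = 86_400


def validation_pass_days(archive_pb: float, read_gb_per_s: float) -> float:
    """Days needed to read every byte of the archive once."""
    archive_bytes = archive_pb * 1e15
    seconds = archive_bytes / (read_gb_per_s * 1e9)
    return seconds / SECONDS_PER_DAY


print(f"{validation_pass_days(4, 10):.1f} days")   # 4.6 days
print(f"{validation_pass_days(50, 10):.1f} days")  # 57.9 days
```

Under these assumptions a 4 PB archive needs most of a week of dedicated bandwidth per pass, so the validation load has to be designed in rather than added later.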
DR planning for disk- and tape-based archives is similar. Some of the technologies are different, but the key is regular validation and preparation for the disaster that might come. Far too many organizations do not properly fund large archives and yet expect no data loss. If you have a 50PB archive and a single replicated site and you lose the archive due to a disaster, you will almost certainly lose data when you re-replicate the site. There is no way to get around the hard error rates in the media.
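The hard error claim can be checked with simple arithmetic. The sketch below assumes an unrecoverable read error rate of one per 10^14 or 10^15 bits read, which are commonly quoted drive specification values rather than figures from this article, and it assumes decimal petabytes.

```python
# Expected unrecoverable (hard) read errors when re-reading a whole archive.
# Assumptions: 1 PB = 1e15 bytes, and media hard error rates of one error per
# 1e14 or 1e15 bits read (typical vendor-quoted values, not article figures).
def expected_hard_errors(archive_pb: int, bits_per_error: int) -> float:
    """Expected number of unrecoverable read errors for one full read."""
    bits_read = archive_pb * 10 ** 15 * 8
    return bits_read / bits_per_error


print(expected_hard_errors(50, 10 ** 15))  # 400.0
print(expected_hard_errors(50, 10 ** 14))  # 4000.0
```

With hundreds of expected hard errors for a single full read of 50 PB, some of them will land on data for which no other good copy remains once the primary site is gone.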
In my next article I will address architectural planning for large archives.
Henry Newman, CEO and CTO of Instrumental, Inc., and a regular Enterprise Storage Forum contributor, is an industry consultant with 29 years of experience in high-performance computing and storage.