Tape Migration: Ignore It at Your Peril
Tape used for computers has been around for about 60 years. That makes the technology older than most and far older than most of the people working in the industry today. Since the backup market for tape is now a far smaller percentage of the market than the archive market, tape migration is one of the biggest issues in the tape community, if not the biggest.
Data migration for large archives is often a continual process. Two key reasons for this are that:
- Tape densities increase every 18 to 24 months at about a 2x rate, but tape performance increases only about 20 percent over the same period. This means migrating an archive takes longer and longer unless the number of tape drives increases.
- Tape drive interfaces, such as 1 Gb and 2 Gb Fibre Channel, are no longer supported. Without migration, more tape libraries will be required, at significant cost.
Given these reasons and others, migration to new tape technologies is an ongoing process and migration must include other hardware such as servers, RAID storage and switches.
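The gap between density growth and performance growth can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes capacity doubles and streaming speed rises 20 percent with each tape generation, the rough figures given above, and the function name `relative_read_time` is hypothetical.

```python
def relative_read_time(generations, density_growth=2.0, speed_growth=1.2):
    """Time to stream one full cartridge, relative to the current generation.

    Assumes capacity grows by density_growth and drive speed grows by
    speed_growth with every generation (rough figures, not vendor specs).
    """
    return (density_growth / speed_growth) ** generations


if __name__ == "__main__":
    for gen in range(4):
        factor = relative_read_time(gen)
        print(f"generation +{gen}: {factor:.2f}x the time to read one full cartridge")
```

Under these assumptions, each cartridge takes about two-thirds longer to read end to end than its predecessor, so an archive that grows as fast as density needs more drives just to keep migration time from growing.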
Migration used to be pretty straightforward, but there are many reasons why this is no longer the case. I am aware of very few sites that have migrated backup data, but everyone I work with knows they must migrate archive data. Backup data for the most part is transient. Yes, Sarbanes-Oxley requires saving some data, but the amount most companies have that falls under its jurisdiction is minuscule. Moreover, tapes can be read for five years, so you can just put them in cold storage.
The size of this type of data compared to the size of archive data, such as medical records, the Hubble space telescope or things like NOAA satellite images, is tiny. Today, every car and airplane built must have all of its structure model data and design information archived in case of a problem. In the not too distant future, our genome will be stored. Large archives are growing faster than tape density and much faster than tape performance. Currently, density for LTO nearly doubles with each generation, roughly every 18 to 24 months, while performance increases 20 percent or so. Many sites take more than a year to migrate archives from Old Tape Drive X to New Tape Drive Y. Archive software now available makes migration from archive software Vendor A to Vendor B easier, and for vendors that do not use proprietary tape formats, this is a smoother process. I am not even going to cover this, as it is so vendor dependent. Tape media migration is difficult enough, and it takes careful planning to ensure everything goes smoothly.
Here are factors to consider:
Depending on the archive software, as many as four types of hardware must be evaluated as part of migration planning from old tape drives to new:
- Tape drives, both old and new
- Fibre Channel switch ports for connectivity to the drives
- RAID storage
- Servers
1. Tape Drives
The number of old and new tape drives needed depends on two factors:
- How fast you want to complete the migration
- What the current user load is and how well you want and need to satisfy that load during the migration
Determining the number of needed tape drives is very difficult. How many available new tape drives do you need for user requests? How many old tape drives do you need for user requests, and how many new tape drives will be needed as more data is migrated? Does this vary by time of day? What is the impact to the center if a job waits to retrieve archived data? Clearly, it is very difficult to answer all of these questions, although part of the problem could be modeled if you had enough of the correct data. Other parts, such as the impact on the operation if a job waits, cannot be modeled. This is all complicated by how long it takes to read all of the data off the old tapes and write it onto the new ones.
Simple math for an 800 GB LTO-4 cartridge at 120 MB/sec says it takes almost two hours of wall time to read the whole tape. The tape drive, therefore, could be in use for two hours, given that once you start reading data from a cartridge, the typical policy is to read all of the data on it. This, of course, is a policy question, along with a potential policy issue for the HSM software. It is just one of the things that must be considered as part of migration.
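The simple math above can be written out as follows. This is a rough sketch using the LTO-4 figures from the text and decimal units (1 GB = 1,000 MB); the `overhead_minutes` parameter for mount, load and rewind time is an assumption added for illustration, and its value here is hypothetical.

```python
def cartridge_read_hours(capacity_gb, rate_mb_s):
    """Wall-clock hours to stream a full cartridge at a sustained rate."""
    seconds = capacity_gb * 1000 / rate_mb_s
    return seconds / 3600


def cartridges_per_drive_per_day(capacity_gb, rate_mb_s, overhead_minutes=0.0):
    """Full cartridges one drive can read in 24 hours.

    overhead_minutes is a hypothetical per-cartridge allowance for
    mount, positioning and rewind; it is not a measured value.
    """
    hours_each = cartridge_read_hours(capacity_gb, rate_mb_s) + overhead_minutes / 60
    return 24 / hours_each


if __name__ == "__main__":
    hours = cartridge_read_hours(800, 120)
    print(f"LTO-4 full read: {hours:.2f} hours")
    per_day = cartridges_per_drive_per_day(800, 120, overhead_minutes=5)
    print(f"Cartridges per drive per day: {per_day:.1f}")
```

The result is a best case, since it assumes the drive streams at its full native rate for the entire cartridge.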
It is generally difficult to judge what the archive load will be, as it is often variable. It depends on the data usage model of the archive (e.g., is the data used as input to a computational job, such as a weather model?), how many time zones the archive supports and the typical work hours of the users. As a result, you must overprovision the number of tape drives, as drives will often be in use longer than expected.
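A first-pass drive count can be sketched as below. It is only a starting point for engineering judgment, not a sizing method: the archive size, deadline, duty cycle, user peak and headroom are all hypothetical inputs, and the parts of the problem that cannot be modeled (such as the cost of a job waiting) are ignored entirely.

```python
import math


def migration_drives(archive_tb, read_mb_s, deadline_days, duty_cycle):
    """Old-generation drives needed to read the whole archive by the deadline.

    duty_cycle is the assumed fraction of time a drive actually streams
    migration data (the rest goes to mounts, user requests, failures).
    """
    archive_mb = archive_tb * 1_000_000
    mb_per_drive = read_mb_s * deadline_days * 86_400 * duty_cycle
    return math.ceil(archive_mb / mb_per_drive)


def provisioned_drives(migration, user_peak, headroom=0.25):
    """Migration drives plus drives held for users, with overprovisioning."""
    return math.ceil((migration + user_peak) * (1 + headroom))


if __name__ == "__main__":
    # Hypothetical site: 10 PB archive, 120 MB/sec drives, one-year deadline.
    readers = migration_drives(10_000, 120, 365, duty_cycle=0.7)
    total = provisioned_drives(readers, user_peak=6)
    print(f"Drives for migration reads: {readers}")
    print(f"Old drives to keep in service: {total}")
```

The same calculation has to be repeated for the new drives that write the data, and rerun as the user load shifts from old media to new.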
This often becomes a budget-balancing issue -- the cost of the old tape drives is usually pretty low, but the cost of the new drives is much higher. The longer you wait to migrate, the less expensive the new tape drives and tapes become, but the more time the migration takes, given the amount of data on the old tapes. Hence, it is difficult to determine the optimal cost model.
2. Fibre Channel Switch Ports
Given that you will have more tape drives and potentially more storage, you will need more switch ports. You might have enough spare ports to add the new hardware, or you might need to upgrade your switch if the new storage or tape drives require next-generation Fibre Channel.
3. RAID Storage
Most archive systems must read the data from the old tapes and write it to disk, then read it from disk and write it to the new tapes. If the archive software requires you to use disk as part of the migration process, you will need additional storage space along with bandwidth to support the migration. The extra amount depends on how much spare bandwidth and storage space you have. Reading, for example, an LTO-4 tape end to end and writing out LTO-5 uses significant space and bandwidth. If you want to write a whole tape, you must stage 1.5 TB of data on disk and sustain 140 MB/sec to write the tape, or up to 240 MB/sec if the data is compressible. This can be a significant percentage of the bandwidth of your RAID controller, and it is about 30 percent of the sustained bandwidth of an 8 Gb Fibre Channel link.
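The staging figures above can be checked with a short calculation. This sketch assumes an 8 Gb Fibre Channel link sustains roughly 800 MB/sec in one direction, which is an assumption rather than a number from this article, and the function names and drive counts are hypothetical.

```python
FC_8GB_MB_S = 800.0  # assumed sustained one-way rate of an 8 Gb Fibre Channel link


def link_fraction(rate_mb_s, link_mb_s=FC_8GB_MB_S):
    """Share of one link consumed by a single stream at rate_mb_s."""
    return rate_mb_s / link_mb_s


def staging_needs(new_drives, old_drives, tape_tb=1.5,
                  write_mb_s=140.0, read_mb_s=120.0):
    """Disk space and bandwidth to keep every drive streaming at once.

    Assumes one full new cartridge's worth of data is staged per new
    drive, and that all drives run at their native rates.
    """
    return {
        "space_tb": new_drives * tape_tb,
        "disk_read_mb_s": new_drives * write_mb_s,   # disk feeds the new drives
        "disk_write_mb_s": old_drives * read_mb_s,   # old drives fill the disk
    }


if __name__ == "__main__":
    print(f"One compressed LTO-5 stream: {link_fraction(240):.0%} of the link")
    print(staging_needs(new_drives=4, old_drives=6))
```

Because the disk is read and written at the same time, the RAID controller sees the sum of both bandwidth figures, not just the larger one.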
4. Servers
Archive systems that require the data to be read to disk (as opposed to tape-to-tape migration) are going to need more CPU power to read the data and validate checksums and write checksums. They will also need more memory bandwidth and PCIe bus bandwidth to move the data in and out of the system. Having enough of the right servers to match the storage and tape bandwidth increases is all part of this difficult architectural design problem.
Clearly, every archive software vendor has a set of tools to support migration of data to new tape hardware. These tools generally have a number of tunable parameters to enable the administrator to control the migration speed and therefore the impact on the system. Often, they are tuned based on workload, but given the length of time it takes to read a tape -- and the amount of time is increasing dramatically, not decreasing -- the whole issue of tuning must be considered well in advance of the requirements.
Users typically have expectations on the availability of resources. Oftentimes, these are codified as SLAs. If you are migrating tapes, more resources are going to be used, and you must determine what the impact will be on the user and the agreed-upon SLA. Often, this becomes a tradeoff between the time to do the migration and the response time the user will see.
It All Comes Down to Money
Petabytes of data could be read and written out to new tapes in weeks if you want to spend enough money and time architecting and installing all of the new hardware. The reality is that it never happens that way. Migrations also never take 10 years to complete, and very rarely take even five years. It becomes a major challenge balancing user requirements, what hardware is needed, and the ever-present fiscal pressures when developing a migration plan.
One area often not considered that sometimes becomes costly is maintenance of older hardware. Another issue that must be considered is space and the cost of slots in the library. Having to buy another library because you ran out of space, when migrating sooner to newer, higher-density tapes would have avoided that cost, is part of the equation. A simple spreadsheet cannot solve this, given the number of variables and the complexity of things like user requirements, current hardware configuration, maintenance costs, and the cost of running out of space and buying a new tape library. Tape migration is not easy, and it cannot be planned with precision, given that things often change in ways that cannot be known in advance.
For the most part, I rely on what a former mentor referred to as EJ (engineering judgment). I generally try to complete the migration within one year of starting, 18 months at the latest. I try to ramp up the number of tape drives and storage, starting with just a few and buying more every few months following the usual price drop of both tapes and drives, trying to time the price drops based on historical data. Tapes are generally the highest-priced item in a large archive (e.g., 50,000 tapes at $85 average price is more than $4.2 million), so waiting as long as possible can provide a large cost savings.
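The media cost arithmetic, and the effect of buying in stages, can be sketched as follows. The 50,000 tapes and $85 price come from the example above; the number of purchases and the 5 percent price drop between purchases are purely hypothetical, and real price curves have to come from historical data.

```python
def upfront_cost(tapes, price):
    """Cost of buying every tape at today's price."""
    return tapes * price


def staged_cost(tapes, start_price, purchases, drop_per_purchase):
    """Cost of buying tapes in equal batches while the price falls.

    drop_per_purchase is an assumed fractional price decline between
    consecutive purchases (0.05 means 5 percent), not a forecast.
    """
    per_purchase = tapes / purchases
    total = 0.0
    price = start_price
    for _ in range(purchases):
        total += per_purchase * price
        price *= 1 - drop_per_purchase
    return total


if __name__ == "__main__":
    all_now = upfront_cost(50_000, 85)
    staged = staged_cost(50_000, 85, purchases=4, drop_per_purchase=0.05)
    print(f"Buy everything now: ${all_now:,.0f}")
    print(f"Buy in four stages: ${staged:,.0f}")
    print(f"Difference:         ${all_now - staged:,.0f}")
```

The sketch leaves out the costs that pull the other way, such as maintenance on old hardware and library slots consumed while waiting, so it shows only one side of the tradeoff.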
Tape migration is not easy, and it is costly. However, waiting until the tapes and hardware reach end of life is even more costly, and it puts your data at risk. Do not wait until it is too late.
Henry Newman is CEO and CTO of Instrumental Inc. and has worked in HPC and large storage environments for 29 years. The outspoken Mr. Newman initially went to school to become a diplomat, but was firmly told during his first year that he might be better suited for a career that didn't require diplomatic skills. Diplomacy's loss was HPC's gain.