Migrating Data As Technology Changes
Now that you have your data at the remote site, at some point you are going to need to move that data to new technology.
A few years ago, a customer asked us how to store data for 50 years. We investigated magneto-optical (MO) media, optical glass platters, tape, and all types of other media. What we determined seemed pretty obvious once we finally thought of it: It's not the media, stupid; it's the tape drive, the hardware interface, the HSM software, and even the application software.
A good example is seven-track tapes, which were supposed to last more than 30 years. The drives are more than 30 years old now, and finding one is really hard. Add to that the hardware interface (I have no idea what interface they used), software drivers, and the application needed to read and process the data, and your seven-track data tapes may still be in fine shape, but they may be of little use to you.
Another example is MS Word 1.0. Remember that product from 1983? Do you think that MS Word XP or even MS Word 2000 can read and convert MS Word 1.0 files created in 1983? Not likely. The same is true today to a lesser extent. Adobe PDF files are supposedly good for 30 years, maybe longer (http://www.adobe.com/products/acrobat/pdfs/pdfarchiving.pdf). I have seen a number of companies that archived TIFF files instead of PDF, and I expect JPEG2000 to replace TIFF.
The extent of the problem is huge. Obsolescence of applications, file systems, HSM formats, computers, drivers, interfaces, tape drives, tapes, and data formats will require migration long before the life of the tape becomes an issue for enterprise-quality tapes.
One of the things we have done for customers is to develop a migration plan as part of the system architecture. This migration plan is based on the expected obsolescence of every single component of the system, both hardware and software. Without something like this, management usually has budgetary problems when it comes time to upgrade the system, so it pays to document the requirements of migration before the system is installed.
Every piece will need to be upgraded to ensure that critical data is available. This can be very expensive for large HSM systems, and often much of the data will never be used again, so determining what is important and what is not and removing the unimportant data is critical. Over a 30-year period, users come and go, but in our current UNIX world, the data is almost always stored by UID and GID. It is a lot to think about.
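One concrete symptom of that 30-year turnover is archived files whose numeric UID or GID no longer maps to any current user or group. As a sketch of how you might audit for this on a UNIX system (the function name and the use of Python's standard `pwd`/`grp` modules are my own illustration, not a tool from the article):

```python
import os
import pwd
import grp

def orphaned_files(root):
    """Yield (path, uid, gid) for files whose owning UID or GID
    no longer maps to a known user or group -- a common symptom
    after decades of staff turnover on an archive."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            try:
                # Both lookups raise KeyError if the ID is unknown.
                pwd.getpwuid(st.st_uid)
                grp.getgrgid(st.st_gid)
            except KeyError:
                yield path, st.st_uid, st.st_gid
```

Running something like this before a migration at least tells you which data has no living owner left to ask about its importance. (On the command line, `find / -nouser -o -nogroup` does a similar job.)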
Getting the data to an off-site facility requires careful planning and good knowledge of your HSM software and the features and methods supported. Ensuring that the network has enough bandwidth is just part of the problem, given what John Mashey said about bandwidth and latency. You really need to know your applications and hardware to use the network efficiently.
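The bandwidth-versus-latency point matters because a fat pipe sits idle unless enough data is in flight to cover the round-trip time. A back-of-the-envelope calculation (the link speed and RTT below are illustrative assumptions, not figures from the article):

```python
def bandwidth_delay_product(bandwidth_bps, rtt_seconds):
    """Bytes that must be 'in flight' to keep a link fully busy.
    If the TCP window (or the aggregate of your parallel streams)
    is smaller than this, the link idles while waiting for
    acknowledgements, no matter how fast the pipe is."""
    return bandwidth_bps * rtt_seconds / 8

# Example: a 622 Mbit/s (OC-12) link to a site 50 ms away
# needs roughly 3.9 MB in flight to stay saturated.
bdp = bandwidth_delay_product(622e6, 0.050)
```

This is why knowing your HSM software and applications matters: if the transfer tool caps its window or moves one small file at a time, you will never see the bandwidth you are paying for.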
Without a plan to migrate, your data is at risk. How important is the data? Is it the results of a drug study that the FDA requires you to keep for 50 years, or the plans for a new airplane? It could also be the results of today's weather forecast calculation, which could be recalculated from the original input data and the code (maybe). What if that code was Fortran 90; could you compile that code in 2020? As you can see, a great deal of thought and planning needs to go into deciding what needs to be kept, what needs to be removed, and when that takes place. Not easy questions or answers.
Next time, we will cover a topic related to these last two articles — the quality of your data on tape. The subject will be tape wind quality and the tape drive design issues that surround it.