Many people have the perception that archive storage should be really, really cheap. After all, you don’t need much performance, so it shouldn’t cost much, should it?
However, these people tend to ignore important aspects of archival storage that can greatly impact the cost of the solutions.
In just about any storage pyramid diagram, archive storage is at the bottom of the pyramid. Archive storage is all about massive capacities and very little performance. It’s a place you put your data when you are done with it for the immediate future but still may need access to it.
Archive storage is simple in concept, but, as always, the devil is in the details. And don’t forget that Murphy lives there too.
Archive Fundamentals
When thinking about an archive solution, you need to start with some fundamental questions, such as:
- What kind of performance do I want (pushing data into the archive and pulling data out of the archive)?
- How much data will need to be archived?
- What kind of reliability do I want (or how much risk can I tolerate)? In other words, how important is the data?
Answering these questions can tell you a great deal about the design of the archive solution. But many people fail to consider that the answers to these questions have an economic impact on archive storage.
For example, if I want an extremely low risk of losing data, how much will that cost me? How much can I lower my costs if I can accept more data loss risk? How much does data recall performance affect my costs? If I don’t need the data right away, can the reduced performance reduce my costs?
These are all trade-offs that we make in designing archive solutions. It’s only natural that people want to be fully informed about the impact of their decisions. But do they really understand the economic realities of their decisions?
You Get What You Pay For
My wife started me watching a television show called “Love It or List It” where families redo their current home while also looking for possible new homes. At the end of the show, they get to decide which direction they want to go: stay with their current, remodeled home, or sell it and buy a new one.
What is really interesting to me is that the process makes the families develop a list of their home priorities. For example, if they move slightly out of town, can they get a bigger home? Or if they drop the silly idea of blowing out a kitchen wall because it will require major structural modifications, can they maybe get some other things on their wish list for their current home?
The point of my cathartic admission that I watch HGTV is that these people are forced, or coerced, into really thinking about their priorities. That allows them to make trade-offs in a sane manner (though they still get blindsided, because sane, logical decisions do not make good television). What startles them are the economic realities of their decisions and priorities. Some of them actually thought they could get everything on their wish list: top-of-the-line quality for about the price of an evening out.
Lately, I seem to be encountering situations where people want amazingly inexpensive archive storage that is so cheap it is absurd. But at the same time, they are unwilling to accept that this might increase the risk of data loss. They insist that it is possible to have very inexpensive archive solutions in their data center with virtually no risk of data loss.
Many people assume that archive storage is cheap, easy to build, and their data will always be there and will never become corrupt. They have forgotten that just like everything else, there are trade-offs in archive storage.
In general, if you want lots of data and the ability to recall it quickly while having the lowest possible risk of data loss, then you will have to pay for it (i.e., it is expensive).
Economic Realities
To help explain the trade-offs, let’s begin by examining various storage solutions and how often they might encounter an error.
One of the most prevalent risks is hitting what is called the “hard error rate.” This is the average number of bits that can be read before you encounter a sector that cannot be read (an unrecoverable read error). The hard error rate varies a great deal across storage options.
Henry Newman has written a wonderful article with a table that lists the hard error rates for various media and translates them into petabytes (PB), that is, how many petabytes can be read from the storage media before hitting a read error. Henry’s first table is reproduced below:
Device | Hard error rate (bits read per error) | Equivalent in bytes | PB equivalent |
---|---|---|---|
SATA Consumer | 1E+14 | 1.25E+13 | 0.01 |
SATA Enterprise | 1E+15 | 1.25E+14 | 0.11 |
Enterprise SAS/FC | 1E+16 | 1.25E+15 | 1.11 |
LTO and Enterprise SAS SSDs | 1E+17 | 1.25E+16 | 11.10 |
Enterprise Tape | 1E+19 | 1.25E+18 | 1,110.22 |
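The last two columns follow directly from the error rate: divide the bits by 8 to get bytes, then divide by a petabyte. The PB values match binary petabytes (2^50 bytes) rather than decimal ones; that is an inference from the numbers, not something the table states. A minimal Python sketch under that assumption:

```python
# Convert a hard error rate (one unreadable sector per N bits read) into the
# amount of data read, on average, before that error occurs.

BITS_PER_BYTE = 8
PB = 2**50  # binary petabyte; this reproduces the table's PB column


def bytes_before_error(bits_per_error: float) -> float:
    """Bytes read, on average, before one unrecoverable read error."""
    return bits_per_error / BITS_PER_BYTE


def pb_before_error(bits_per_error: float) -> float:
    """The same quantity expressed in (binary) petabytes."""
    return bytes_before_error(bits_per_error) / PB


HARD_ERROR_RATES = {
    "SATA Consumer": 1e14,
    "SATA Enterprise": 1e15,
    "Enterprise SAS/FC": 1e16,
    "LTO and Enterprise SAS SSDs": 1e17,
    "Enterprise Tape": 1e19,
}

if __name__ == "__main__":
    for device, rate in HARD_ERROR_RATES.items():
        print(f"{device:30s} {bytes_before_error(rate):.2e} bytes "
              f"{pb_before_error(rate):,.2f} PB")
```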
An alternative way to think about this data is to consider how long it would take me to encounter an unreadable sector if I read the data at the full rate of the device.
Henry’s second table shows the amount of time (in hours) it takes to encounter a read error when you read from one or more devices at maximum speed (he varied the number of devices from 1 to 200). Table 2 below reproduces his data and adds two more columns for 250 and 300 devices.
Table 2: Hours to reach the hard error rate at the sustained data read rate
Device type | 1 device | 10 devices | 50 devices | 100 devices | 200 devices | 250 devices | 300 devices |
---|---|---|---|---|---|---|---|
Consumer SATA | 50.9 | 5.1 | 1.0 | 0.5 | 0.3 | 0.2 | 0.17 |
Enterprise SATA | 301.0 | 30.1 | 6.0 | 3.0 | 1.5 | 1.2 | 1.0 |
Enterprise SAS/FC 3.5 inch | 2,759.5 | 275.9 | 55.2 | 27.6 | 13.8 | 11.0 | 9.2 |
Enterprise SAS/FC 2.5 inch | 1,965.2 | 196.5 | 39.3 | 19.7 | 9.8 | 7.9 | 6.6 |
LTO-5 | 23,652.6 | 2,365.3 | 473.1 | 236.5 | 118.3 | 94.6 | 78.8 |
Some Enterprise SAS SSDs | 7,884.2 | 788.4 | 157.7 | 78.8 | 39.4 | 31.5 | 26.3 |
Enterprise Tape | 1,379,737.1 | 137,973.7 | 27,594.7 | 13,797.4 | 6,898.7 | 5,518.9 | 4,599.1 |
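Every multi-device column in Table 2 is the one-device figure divided by the number of devices: N devices reading in parallel at full speed consume N times as many bits per hour, so they reach the expected error N times sooner. The sketch below shows that scaling, plus a from-first-principles version; the per-device read rate passed to `hours_from_rate` is an input you must supply (the rates behind Henry’s numbers are not given here), and the function names are illustrative.

```python
# Hours until the expected hard error when N identical devices are read
# in parallel at their full sustained rate.

BITS_PER_BYTE = 8
SECONDS_PER_HOUR = 3600


def hours_to_hard_error(single_device_hours: float, n_devices: int) -> float:
    """Scale a one-device figure (first column of Table 2) to N devices."""
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return single_device_hours / n_devices


def hours_from_rate(bits_per_error: float, bytes_per_second: float,
                    n_devices: int = 1) -> float:
    """Same quantity from the error rate and an assumed per-device
    sustained read rate in bytes per second."""
    bytes_before_error = bits_per_error / BITS_PER_BYTE
    aggregate_rate = bytes_per_second * n_devices
    return bytes_before_error / aggregate_rate / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Recompute the 2.5-inch SAS/FC row from its one-device value.
    for n in (1, 10, 50, 100, 200, 250, 300):
        print(n, round(hours_to_hard_error(1965.2, n), 1))
```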
These two tables illustrate that with enterprise SATA or SAS disks, you can expect to read somewhere between about 110 TB and about 1 PB before encountering an unreadable sector; with consumer SATA disks the figure drops to roughly 11 TB. Even with these numbers, it is very difficult to estimate when you will actually encounter a hard error, because it depends so heavily on the configuration.
Encountering a hard error usually means that the RAID controller flags the drive as bad and then starts a rebuild. To get access to the data you tried to read, you need to wait for the rebuild to finish. A rebuild can take a long time to complete, although this again depends on the configuration.
During this time, all of the remaining drives have to be read, increasing the probability of encountering another hard read error and possibly losing the RAID group. Consequently, you really need to have more than one copy of the data in the archive.
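To put a rough number on that rebuild risk, a common simplified model treats every bit read as failing independently with probability 1/N, where N is the hard error rate. The chance of at least one error while reading B bits is then 1 - (1 - 1/N)^B. This is a sketch under that independence assumption (real drives do not fail this uniformly), and the example RAID group of eight 4 TB drives is hypothetical.

```python
import math


def prob_hard_error(bytes_read: float, bits_per_error: float) -> float:
    """Probability of at least one unrecoverable read error while reading
    bytes_read bytes, assuming each bit fails independently with
    probability 1 / bits_per_error (a simplifying assumption)."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed with log1p/expm1 so a tiny p stays accurate
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


def rebuild_risk(drives_in_group: int, drive_bytes: float,
                 bits_per_error: float) -> float:
    """A single-parity rebuild must read every surviving drive in full."""
    surviving_bytes = (drives_in_group - 1) * drive_bytes
    return prob_hard_error(surviving_bytes, bits_per_error)


if __name__ == "__main__":
    # Hypothetical RAID group: 8 drives of 4 TB (decimal) each, one failed.
    for label, rate in (("consumer SATA", 1e14),
                        ("enterprise SATA", 1e15),
                        ("enterprise SAS/FC", 1e16)):
        print(f"{label:18s} {rebuild_risk(8, 4e12, rate):.1%}")
```

The point of the exercise is the shape of the curve: the more data a rebuild must read relative to the hard error rate, the closer the probability of a second error gets to certainty.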
The number of copies also remains an open question, as Henry’s article points out. But you need at least two copies, and you may need three (I haven’t heard of more than three copies unless the archive is globally distributed). This means you need two to three times the capacity of the archive in storage hardware alone: if you want a 1 PB archive, you will need 2-3 PB of storage capacity.
What happened to the really cheap archive storage we were expecting? Can’t we get archival storage for $0.25/GB?
The answer is that you can get archive storage close to that price, but you run risks in doing so for very large archives. To drastically reduce the risks, you need to make two or three copies. The $0.25/GB archive suddenly becomes a $0.50/GB or $0.75/GB archive. But these are just the hardware costs.
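The arithmetic is simple but worth writing down: every extra copy multiplies both the raw capacity and the effective price per usable gigabyte. This sketch assumes decimal units (1 PB = 1,000,000 GB) and the $0.25/GB raw hardware price used in the text; it covers hardware only, and the function name is illustrative.

```python
GB_PER_PB = 1_000_000  # decimal units, an assumption for pricing


def archive_hardware_cost(usable_pb: float, cost_per_gb: float, copies: int):
    """Return (raw PB, effective $/GB of usable space, total hardware cost)
    when every byte is stored `copies` times. Hardware only: no file system,
    software, power, floor space, or administration."""
    if copies < 1:
        raise ValueError("need at least one copy")
    raw_pb = usable_pb * copies
    effective_cost_per_gb = cost_per_gb * copies
    total_cost = usable_pb * GB_PER_PB * effective_cost_per_gb
    return raw_pb, effective_cost_per_gb, total_cost


if __name__ == "__main__":
    for copies in (1, 2, 3):
        raw, per_gb, total = archive_hardware_cost(1, 0.25, copies)
        print(f"{copies} copies: {raw} PB raw, ${per_gb:.2f}/GB, ${total:,.0f}")
```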
You need a file system of some type to store the data. At the low end of the spectrum, you could just create 2-3 different pools of storage using freely available file systems. Then you could put the archive data on one copy and use rsync to make sure that the data is copied to the other pool(s).
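For the do-it-yourself approach, the replication step can be as small as one rsync call per replica pool. The Python wrapper below is a minimal sketch; the mount points are hypothetical, and it only uses rsync’s standard `-a` (archive) mode.

```python
import subprocess


def build_rsync_commands(primary: str, replicas: list[str]) -> list[list[str]]:
    """One rsync invocation per replica pool.

    -a (archive mode) copies recursively and preserves permissions, times,
    symlinks, and so on. The trailing slash on the source tells rsync to copy
    the directory's contents rather than the directory itself.
    """
    src = primary.rstrip("/") + "/"
    return [["rsync", "-a", src, dest.rstrip("/") + "/"] for dest in replicas]


def replicate(primary: str, replicas: list[str], dry_run: bool = True) -> None:
    """Print the commands (dry_run) or run them, stopping on the first failure."""
    for cmd in build_rsync_commands(primary, replicas):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical mount points for a three-pool archive.
    replicate("/archive/pool0", ["/archive/pool1", "/archive/pool2"])
```

Note that by default rsync decides what to copy by comparing file size and modification time, so it will not notice a file whose contents silently changed on either pool. Detecting that kind of bit rot has to be handled separately.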
But there is a problem with this approach. What happens if you encounter a hard read error (bad sector)? This triggers a RAID rebuild and all of its associated issues. You can use the other copies to restore the data, but that has to be programmed. And you have to be sure the remaining copies are correct (i.e., no bit rot). In other words, you will have to do all of the programming and maintenance yourself. But all of this is free, right?
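Here is a taste of that programming: a minimal “scrub” that hashes every file in every pool, treats the digest held by a strict majority of pools as correct, and copies that version over any missing, unreadable, or disagreeing copy. It is a sketch, not a production tool. The function names are hypothetical, and majority voting is just one possible policy. With only two copies that disagree, there is no majority, so you can’t tell which one is right unless you also recorded a checksum when the data was first archived.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files don't have to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scrub(pools: list[Path]) -> dict[str, dict]:
    """Compare every file across all pools.

    Returns {relative_path: {"good": digest_or_None, "bad": [pool, ...]}} for
    each file whose copies are missing, unreadable, or disagree. "good" is
    the digest held by a strict majority of pools, or None if there is none.
    """
    names = set()
    for pool in pools:
        names.update(p.relative_to(pool) for p in pool.rglob("*") if p.is_file())

    problems = {}
    for rel in sorted(names):
        digests = {}
        for pool in pools:
            try:
                digests[pool] = sha256_of(pool / rel)
            except OSError:  # missing file or unreadable sector
                digests[pool] = None
        votes = Counter(d for d in digests.values() if d is not None)
        good, count = votes.most_common(1)[0] if votes else (None, 0)
        if count * 2 <= len(pools):
            good = None  # no majority (e.g., two copies that disagree)
        bad = [str(pool) for pool, d in digests.items() if d != good]
        if bad:
            problems[str(rel)] = {"good": good, "bad": bad}
    return problems


def repair(pools: list[Path], problems: dict[str, dict]) -> None:
    """Overwrite bad copies with a majority copy; skip files with no majority."""
    for rel, info in problems.items():
        if info["good"] is None:
            continue  # needs a human, or a checksum recorded at ingest
        source = next(p / rel for p in pools if str(p) not in info["bad"])
        for bad_pool in info["bad"]:
            target = Path(bad_pool) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
```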
At the other end of the spectrum are file systems, many of them proprietary or commercially supported, that handle all of this work for you. In these file systems, data is replicated so that two to three copies are distributed across the storage pools. In the event of a hard read error, the file system can read one of the other copies of the data while doing the rebuild in the background. Once the rebuild is done, it then checks the rebuilt data against the other copies. But again, the system is reading data, so we’re increasing our chances of another hard read error.
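The read path described above can be modeled in a few lines. This is a toy, in-memory illustration, not how any particular file system implements it. `Replica`, `read_block`, and `repair_queue` are hypothetical names, and it assumes a checksum is recorded for each block when it is written, which is one common way to decide which copy is good.

```python
import hashlib


class UnreadableSector(OSError):
    """Stand-in for a hard read error on one copy."""


class Replica:
    """Toy in-memory copy of the data: block id -> bytes, some blocks unreadable."""

    def __init__(self, name: str, blocks: dict[int, bytes], bad_blocks=()):
        self.name = name
        self.blocks = dict(blocks)
        self.bad_blocks = set(bad_blocks)

    def read(self, block_id: int) -> bytes:
        if block_id in self.bad_blocks:
            raise UnreadableSector(f"{self.name}: block {block_id} unreadable")
        return self.blocks[block_id]


def read_block(replicas: list[Replica], block_id: int, expected_sha256: str,
               repair_queue: list[tuple[str, int]]) -> bytes:
    """Serve a read from the first copy that reads cleanly and matches the
    checksum recorded at write time. Failed or corrupt copies are queued for
    background repair instead of making the caller wait for a rebuild."""
    for replica in replicas:
        try:
            data = replica.read(block_id)
        except UnreadableSector:
            repair_queue.append((replica.name, block_id))
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        repair_queue.append((replica.name, block_id))  # readable, but bit rot
    raise UnreadableSector(f"every copy of block {block_id} failed")
```

The caller gets its data as long as one good copy survives, while the queue records what the background rebuild still has to fix.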
All of this takes a great deal of work that happens in the background while you aren’t watching. Having one of these file systems can greatly reduce your workload.
The Final Word
Archives are not as simple as they appear. You have to ask yourself about the purpose of the archive and the projected amount of data that will need to be stored. Most importantly, though, you need to ask yourself about the importance of the data in the archive. Answering this simple question can have a very large impact on the economic realities of the archive.
I apologize for bursting any bubbles, but you cannot have large archives on spinning disk with only one copy of the data and expect that data to always be there. The hard error rates in the previous tables illustrate this quite clearly.
If you really don’t care whether some data becomes unreadable, then you can purchase enough hardware for one copy of the data, perhaps at the low price you expected.
But if your data is very important and you are worried about unrecoverable reads (possibly more than one), you will need multiple copies of the data. This also means you will need more hardware than you thought. For example, if you want a 1 PB archive, you will need 2 PB, 3 PB, or more of capacity. If 1 PB of storage hardware costs $0.25/GB, then storing three copies of the data will cost $0.75/GB.
This simple example of the impact of hard read error rates, and of the number of data copies they imply, is an economic reality that many people don’t want to face. The expectation is that archive data has modest performance requirements, so it should be inexpensive. The reality is that if you want large archives on spinning disk, and you want to be able to read them in a timely manner while reducing the risk of losing data, then you will likely need one or more additional copies of the data. This costs more than you expect, but that’s the reality of archive storage.