The archive community is constantly asking the same question -- how many copies are necessary? Let's turn the question around and ask how many copies you can afford. That is where the real debate begins. Most questions about the number of copies of a file are really questions about the reliability of the data. For example, I am often asked whether two copies on low-cost, low-reliability media are better than a single copy on enterprise media.
Many variables go into what you are really trying to calculate, which is the reliability of your data. They range from the obvious, such as the reliability of the media and natural disasters, to the not so obvious, such as a software bug or a deliberate attack that wipes out your data or, even worse, changes it.
I am regularly asked how many copies should be kept, and yet people are unwilling or unable to say realistically what level of reliability they want and what they are trying to protect against. Also, 100 percent reliability for very large amounts of data in every circumstance is virtually impossible, given everything from natural disasters to known failure modes of devices and storage, human error and whatever else might come up. So the question goes back to how many copies you need and what that gets you.
First, consider the basics:
Many of you have seen these charts before, but they bear repeating, given the topic:
[Chart: hard error rate in bits and equivalent in bytes, by device type (including LTO and some Enterprise SAS SSDs); values not recoverable]
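The bits-to-bytes conversion behind a chart like this is simple arithmetic, and a minimal sketch may make it concrete. The function names and the sample rate of one error per 10^15 bits are illustrative assumptions, not figures taken from the chart.

```python
def bits_to_bytes(bits_per_error: float) -> float:
    """Convert a hard error rate stated as 'one error per N bits' into bytes."""
    return bits_per_error / 8


def human_readable(num_bytes: float) -> str:
    """Format a byte count using decimal (power-of-1000) storage units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value under 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


if __name__ == "__main__":
    # Assumed example rate: one hard error per 10^15 bits read.
    assumed_rate_bits = 1e15
    print(human_readable(bits_to_bytes(assumed_rate_bits)))
```

With the assumed rate of one error per 10^15 bits, this works out to one hard error for roughly every 125 TB read.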
Another way to look at the data is to ask how quickly you reach the hard error rate when the device runs at 100 percent of its average data rate (for disk, the average of the inner and outer cylinder rates).
[Chart: hours to reach hard error rate at sustained data rate, by device type (including Enterprise SAS/FC 3.5 inch, Enterprise SAS/FC 2.5 inch and some Enterprise SAS SSDs); values not recoverable]
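The calculation behind the hours figure can be sketched as follows, assuming the device streams continuously at its average rate. The function names, the error rate and the inner and outer transfer rates are hypothetical placeholders chosen for illustration, not values from the chart.

```python
def average_rate_mb_s(inner_mb_s: float, outer_mb_s: float) -> float:
    """Average sustained rate for a disk: mean of inner and outer cylinder rates."""
    return (inner_mb_s + outer_mb_s) / 2


def hours_to_hard_error(bits_per_error: float, rate_mb_s: float) -> float:
    """Hours of continuous transfer before one hard error is expected.

    bits_per_error: hard error rate stated as 'one error per N bits'.
    rate_mb_s: sustained data rate in decimal megabytes per second.
    """
    bytes_per_error = bits_per_error / 8
    bytes_per_second = rate_mb_s * 1_000_000
    seconds = bytes_per_error / bytes_per_second
    return seconds / 3600


if __name__ == "__main__":
    # Assumed values: 80 MB/s inner, 120 MB/s outer, one error per 10^15 bits.
    rate = average_rate_mb_s(80, 120)
    print(f"{hours_to_hard_error(1e15, rate):.1f} hours")
```

Under these assumed numbers, a drive averaging 100 MB/s would reach the hard error rate after roughly 347 hours of sustained transfer, or about two weeks.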
Clearly, looking at the hard error rate, one copy of the data is not going to be acceptable if you want to guarantee that nothing is lost. Enterprise tape is a possible exception. Of course, with one copy of anything, you are susceptible to a whole range of potential failures, such as a bad lot of disks, tapes or other media (and we have all heard of this happening). There are many other factors, such as devices that have been used in RAID groups. Most of these factors add cost, which is always a consideration when discussing archival data.
However, while interesting, this does not answer the question of how many copies you need.
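To see why a single copy struggles against the hard error rate alone, here is a rough sketch of the expected number of hard errors when an archive is read once in full. It assumes errors are independent and uses a Poisson approximation for the chance of at least one error. The archive sizes and the error rate are hypothetical, and the sketch ignores every other failure mode mentioned above.

```python
import math


def expected_hard_errors(archive_bytes: float, bits_per_error: float) -> float:
    """Expected hard errors when reading the whole archive once."""
    return archive_bytes * 8 / bits_per_error


def prob_at_least_one_error(archive_bytes: float, bits_per_error: float) -> float:
    """Chance of at least one hard error on a full read.

    Assumes independent errors (Poisson approximation): 1 - exp(-expected).
    expm1 keeps the result accurate when the expected count is very small.
    """
    return -math.expm1(-expected_hard_errors(archive_bytes, bits_per_error))


if __name__ == "__main__":
    # Assumed values: one error per 10^15 bits; archives of 1 TB and 1 PB.
    for label, size in (("1 TB", 1e12), ("1 PB", 1e15)):
        p = prob_at_least_one_error(size, 1e15)
        print(f"{label}: {expected_hard_errors(size, 1e15):.3f} expected, P>=1 = {p:.4f}")
```

Under these assumptions, a 1 PB archive on media rated at one error per 10^15 bits would expect about eight hard errors on a single full read, so the real question becomes how a second copy, or a more reliable medium, changes the odds.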