Recently I read in disbelief an article by Chris Mellor in The Register about Amazon Glacier. I've seen a lot of outrageous sales and marketing claims over the years, so it takes a lot to rile me up, but Amazon's claims and the title of the article ("Drilling into Amazon's tape-killing Glacier cloud archive") sent me over the edge and into orbit.
Okay, before I address the Glacier claims and some other issues, let me note my background and general views:
• I like tape for archival storage - It is very cost-effective for long-term storage, given the low hard error rates, support for long shelf life, low cost per byte and low power usage.
• I have architected tape solutions for a few decades.
• I believe marketing claims should be backed up with facts, especially outrageous marketing claims.
Let’s start with a few facts.
If you read the article about Amazon, they claim, “The annual average data item durability is 99.999999999 per cent – eleven nines.”
Besides being an interesting claim, it is completely unheard of by anyone in the industry. Does it include the possibility of risk events like tsunamis, solar storms, unhappy employees out to make a point and the like? What does this really mean?
Then of course you have the term durability. Is durability the data integrity of the file, or does durability mean the availability?
Does the claim mean that you get 11 nines if the file does not disappear because the storage fails? Is this a guarantee?
What does average durability mean? And this is not in the FAQs. I don’t know what it is, as it is not defined and I'm not sure if it will ever be defined.
On the other hand, I do know what 11 nines means in terms of data loss.
If you claim 11 nines and you have 1 PiB of data, you are architecting to lose 11,259 bytes of data. So with 11 nines and 1 PiB of data and data loss of 11,259 bytes, is that 11,259 in 1 file or 1 byte in each of 11,259 files? Not an unreasonable question, I think.
Losing 11,259 bytes in, say, JPEG headers might cost you 11,259 files. I go back to my question: how is durability calculated? This is an outrageous marketing claim.
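The 11,259-byte figure can be reproduced with a few lines of Python. This is only a sketch under one assumed reading of the claim: that "eleven nines" is an annual survival probability applied to each stored byte. Amazon does not define it that way (or any other way), and the function name is mine, for illustration only.

```python
from fractions import Fraction

PIB = 2**50  # bytes in one pebibyte (1,125,899,906,842,624)

def expected_loss_bytes(stored_bytes, nines):
    """Expected bytes lost per year, ASSUMING durability is a per-byte
    annual survival probability of 0.99...9 with `nines` nines."""
    loss_fraction = Fraction(1, 10**nines)  # 1 - 0.99999999999 for 11 nines
    return stored_bytes * loss_fraction

loss = expected_loss_bytes(PIB, 11)
print(round(loss))  # 11259
```

Exact fractions are used so that the result is not blurred by floating-point rounding. Note that the arithmetic says nothing about how those bytes are spread across files, which is exactly the open question.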
And how can Amazon do this for just $0.01 per GB per month, and is this GB (1000*1000*1000 bytes) or GiB (1024*1024*1024 bytes)? I am pretty sure it is GB, or maybe it is both: you get charged for storage in GB and transfer in GiB.
Who knows? What I do know is that most file systems display file sizes in bytes, KiB or MiB, not KB or MB, so your costs will not be what you think they are.
Here is a simple example:
So for most home users with TiB of data, the cost goes up about 10 percent from what you likely think, but who knows for sure?
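The roughly 10 percent gap falls straight out of the unit definitions. The sketch below assumes billing is per decimal GB at the $0.01 rate quoted in the article, and that the user reads a binary TiB figure from the file system as if it were decimal TB. The function names are illustrative, not anything Amazon publishes.

```python
# Decimal (SI) and binary (IEC) unit sizes in bytes.
DECIMAL = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

PRICE_PER_GB = 0.01  # dollars per decimal GB, the rate quoted in the article

def overhead_percent(binary_unit, decimal_unit):
    """How much larger the binary unit is than its decimal namesake."""
    return (BINARY[binary_unit] / DECIMAL[decimal_unit] - 1) * 100

def billed_cost(size_bytes, price_per_gb=PRICE_PER_GB):
    """Cost when the provider bills per decimal GB (an assumption)."""
    return size_bytes / DECIMAL["GB"] * price_per_gb

for b, d in zip(BINARY, DECIMAL):
    print(f"{b} vs {d}: +{overhead_percent(b, d):.1f}%")

one_tib = BINARY["TiB"]
print(f"1 TiB billed as decimal GB: ${billed_cost(one_tib):.2f}")
print(f"What you expected for '1 TB': ${billed_cost(DECIMAL['TB']):.2f}")
```

The gap grows with each prefix step, from about 2.4 percent at the kilo level to about 10 percent at the tera level, which is why it matters more as home data sets move into the TiB range.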
Next on my list: how can they do this? If you are a regular reader of this column, you have seen this table before. The hard error rate, also known as the unrecoverable read error rate, says that if you read this many bits from a drive, a sector will not be able to be read.
I got all of the error rate numbers from vendor web sites.
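To get a feel for what a hard error rate implies, here is a rough sketch of the chance of hitting at least one unreadable sector while reading a whole drive. The rates in the loop are illustrative assumptions (one error per 10^14, 10^15 and 10^16 bits read), not the vendor figures from the table, and the model assumes errors are independent and rare, which real drives do not strictly obey.

```python
import math

def p_at_least_one_error(bytes_read, bits_per_error):
    """Probability of one or more unrecoverable read errors, ASSUMING
    independent errors at a rate of 1 per `bits_per_error` bits read
    (a Poisson approximation)."""
    expected_errors = bytes_read * 8 / bits_per_error
    return -math.expm1(-expected_errors)  # 1 - exp(-expected_errors)

DRIVE_BYTES = 3e12  # a 3 TB drive, read end to end once

# Hypothetical error rates, for illustration only.
for exponent in (14, 15, 16):
    p = p_at_least_one_error(DRIVE_BYTES, 10**exponent)
    print(f"1 error per 10^{exponent} bits: {p:.1%} per full read")
```

Under these assumptions, every additional full read of the drive is another roll of the same dice, so frequent integrity scans raise the cumulative odds of seeing a bad sector.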
So what kinds of disk drives are used for Glacier? Should you care?
Absolutely. I assume all of the files on the disks are read regularly to ensure that none of them has been corrupted. This is very typical for archival environments, but the more you read the data and move bits, the higher the likelihood of hitting a bad sector in what you read.
In the RAID world this leads to a LUN rebuild, but what happens on Glacier? Is 1 sector worth a rebuild?
Well, a better question might be: is a bad sector indicative of the drive failing? The RAID vendors I have talked to believe it is, and they put the time to read all the data off of, say, a 3 TB drive at more than 7 hours.
Do you want to wait? Can you wait?
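The 7-hour figure is easy to sanity-check. The sketch below assumes a sustained sequential read rate of 115 MB/s, which is my own illustrative number and not one the RAID vendors supplied. Real rebuilds compete with other I/O and can take considerably longer.

```python
def full_read_hours(drive_bytes, bytes_per_second):
    """Hours needed to read a drive end to end at a constant rate."""
    return drive_bytes / bytes_per_second / 3600

DRIVE_BYTES = 3e12       # 3 TB drive
ASSUMED_RATE = 115e6     # 115 MB/s sustained, a hypothetical figure

hours = full_read_hours(DRIVE_BYTES, ASSUMED_RATE)
print(f"Full read of a 3 TB drive: about {hours:.1f} hours")
```

Halve the assumed throughput and the wait doubles, so the number scales directly with whatever the drive can actually sustain while still serving other requests.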
Next on my list is: what is the cost of retrieval of your data from Glacier?
Let’s read between the lines. You use Amazon as either your second or third backup copy and now you need your data and maybe you need it fast. Time to get out your American Express Centurion Card (aka The Black Card) and max it out.
Let's say you want your data back and you make a mistake in how much you bring back over what time period. Or, you had a slow network and it got upgraded without you knowing it.
Anyway, this could get extremely expensive very fast if you really want a great deal of your files back quickly. Let's say I want to use this at home to back up my NAS box and I have 6 TB. Amazon says I can retrieve 0.17 percent per day for free. Let’s say I want all my data back for free.
So in 10 days I get 1.7 percent of my data back, in 100 days I get 17 percent, and in 589 days I get 100 percent back if I want to do it for free. Since the allowance is a percentage, the amount of data really does not matter.
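The free-retrieval arithmetic can be sketched as follows, assuming a flat daily allowance of 0.17 percent of stored data, as quoted above, with no carry-over or other rules. The function names are illustrative.

```python
import math

FREE_PERCENT_PER_DAY = 0.17  # daily free allowance quoted in the article

def percent_retrieved(days, percent_per_day=FREE_PERCENT_PER_DAY):
    """Share of the archive retrieved for free after `days` days."""
    return min(100.0, round(days * percent_per_day, 2))

def days_to_retrieve_all(percent_per_day=FREE_PERCENT_PER_DAY):
    """Whole days needed to pull back 100 percent within the allowance."""
    return math.ceil(100 / percent_per_day)

print(percent_retrieved(10))    # 1.7
print(percent_retrieved(100))   # 17.0
print(days_to_retrieve_all())   # 589
```

Because the allowance is a percentage, the result is the same 589 days whether the archive holds 6 TB or 6 GB.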
Maybe a bit longer than my wife and I would want, but maybe I am wrong and it is not an issue for most. One thing that is really important when bringing data back is to make sure that I can specify how much I want to bring back to meet the costs I want.
Does Amazon support this type of application? We know the answer.
What were they thinking?
In the 13 years I have been writing I have never gotten this upset at a marketing campaign. This irked me on so many levels:
• They start off the marketing with the usual claim that what we have developed will kill tape. How many times have I heard this over how many decades? Does anyone believe it? Sadly there are some that likely will.
• Where's the beef? There are many technology claims like the nines count, but where is the data to back up the claims?
• Why the unclear language on durability? If there's one thing I hate, it's a fuzzy highly technical marketing campaign that avoids real facts.
Did Amazon really not think that people would run the financial models? Maybe they did think that some would run the numbers, but the market they’re trying to target doesn’t know enough to run them. Perhaps certain customers don’t ask the hard questions they need to ask to make the right decision for their data.
Just give it to Amazon; they are a big company and will surely take care of the photo album of Auntie Em or the contact list for my business. I think that Amazon might be targeting the SMB and consumer online backup vendors like Carbonite, Code42, Mozy, Backblaze and the others.
I hope that these vendors do not succumb to the temptation to start making their own claims. We do not need a “my cloud storage is more reliable than your cloud storage” arms race. None of this is good for cloud storage, as we have had a number of major outages and issues from many of the large players in the cloud vendor community.
Making claims that, in my opinion and at least on the face of them, are totally absurd, unclear and misleading, with no technical backing, does not help anyone. I hope that Amazon backs up its claims and cleans up its act with clear pricing, or lets the Glacier melt.