Cloud Storage Will Be Limited By Drive Reliability, Bandwidth


We’ve all probably heard more than we want to hear about clouds this week, thanks to EMC World, but there are some things you need to think about if you’re considering adopting a cloud model as part of your storage networking architecture.

Clouds have a place in data storage architecture planning, as do applications that might use clouds, such as Hadoop. The standard cloud approach to data reliability is to replicate data across low-cost hardware: if a drive fails, another copy of its data survives elsewhere. Because most of my work is in large storage environments, and given what I know about drive failure rates, I have some huge misgivings about using this method to manage petabytes of data that need to be highly reliable.

So what I want to do is take you through a step-by-step analysis of the low-cost hardware used in most clouds. I did not look at the failure rates of the blades, just the storage. For this analysis, I went to the Web sites of all the major disk manufacturers and used the best values across all vendors, so my analysis is likely a best case, and your mileage may vary.

Hard Errors Per Petabyte of Data Moved

The hard error rate, also known as BER (bit error rate), has a big effect on reliability. All the disk vendors I reviewed specified the BER in terms of non-recoverable read errors per bits read (1 sector per 10EXX bits).

Drive Type	Hard Error Rate (one sector per X bits read)	Bytes Read per Error	PB Read per Error
Consumer SATA	10E14	1.25E+14	0.11
Enterprise SATA	10E15	1.25E+15	1.11
Enterprise SAS	10E16	1.25E+16	11.10
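As a quick check on the units in the table, the following Python sketch converts the byte-equivalent column into petabytes. It assumes that "PB" here means 2^50 bytes; that is an inference, based on the fact that this reproduces the 0.11, 1.11, and 11.10 figures, whereas decimal petabytes would give 0.125, 1.25, and 12.5. The function name is illustrative.

```python
# Convert "bytes read per hard error" into petabytes read per hard error.
# Assumption: PB means 2**50 bytes (binary), which matches the table's
# 0.11 / 1.11 / 11.10 figures.

PB_BINARY = 2 ** 50  # bytes in a binary petabyte

# Byte-equivalent column copied from the hard error rate table.
BYTES_PER_ERROR = {
    "Consumer SATA": 1.25e14,
    "Enterprise SATA": 1.25e15,
    "Enterprise SAS": 1.25e16,
}


def pb_per_error(bytes_per_error: float) -> float:
    """Petabytes (2**50 bytes) read, on average, per unrecoverable sector."""
    return bytes_per_error / PB_BINARY


for drive, nbytes in BYTES_PER_ERROR.items():
    print(f"{drive:16s} {pb_per_error(nbytes):6.2f} PB")
```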

Enterprise SAS drives are not used in any cloud architecture or Hadoop installation that I am aware of, given the huge cost difference between enterprise SAS and SATA drives. Most installations use the cheapest hardware available.

Time to Read a 2TB Drive

You will see why this is important later in the article; for now, just note the time required to read the data on a drive.

Drive Type	Time to read 2 TB drive (in seconds)
Consumer SATA	24390.2
Enterprise SATA	24390.2
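The read time is simply capacity divided by sustained transfer rate. The short Python sketch below assumes a rate of 82 MB/sec and decimal units (2 TB = 2,000,000 MB); the article does not state the rate, but these values reproduce the 24,390.2-second figure.

```python
# Time to stream an entire drive at a constant sustained transfer rate.
# Assumptions: 2 TB = 2,000,000 MB (decimal) and an 82 MB/sec sustained rate.

DRIVE_CAPACITY_MB = 2_000_000  # 2 TB in decimal megabytes
SUSTAINED_RATE_MB_S = 82       # assumed sustained transfer rate


def full_read_seconds(capacity_mb: float, rate_mb_s: float) -> float:
    """Seconds needed to read every byte on a drive at a constant rate."""
    return capacity_mb / rate_mb_s


seconds = full_read_seconds(DRIVE_CAPACITY_MB, SUSTAINED_RATE_MB_S)
print(f"{seconds:.1f} s (~{seconds / 3600:.1f} hours)")
```

At roughly 6.8 hours per drive, even a best-case rebuild ties up a drive's worth of bandwidth for a large part of the day.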

Number of Drives to Saturate a Channel

It is important to understand how many drives it takes to saturate SONET channels of different speeds. I estimated each channel's usable throughput by derating it for TCP/IP and other packetization and retry overhead, conservatively assuming 90 percent of the channel rate, with full-duplex operation at that speed in both directions.

SONET Channel	Estimated MB/sec	Consumer SATA Drives to Saturate	Enterprise SATA Drives to Saturate
OC-48	276	3.37	3.04
OC-192	1106	13.49	12.15
OC-768	4424	53.95	48.61
OC-3072	17695	215.79	194.45
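The drive counts are the channel throughput divided by the per-drive transfer rate. In this Python sketch, the consumer SATA rate of 82 MB/sec and the enterprise SATA rate of about 91 MB/sec are inferred from the ratios in the table (for example, 17695 / 194.45); they are not quoted from a vendor datasheet.

```python
# Express a network channel's throughput as a number of full-speed drives.
# Assumptions: per-drive rates inferred from the table's ratios; channel
# rates are the table's derated MB/sec estimates.

DRIVE_RATE_MB_S = {"Consumer SATA": 82.0, "Enterprise SATA": 91.0}
CHANNEL_MB_S = {"OC-48": 276.0, "OC-192": 1106.0}


def drives_to_saturate(channel_mb_s: float, drive_mb_s: float) -> float:
    """How many drives streaming at full speed fill the channel."""
    return channel_mb_s / drive_mb_s


for channel, bw in CHANNEL_MB_S.items():
    counts = {d: round(drives_to_saturate(bw, r), 2) for d, r in DRIVE_RATE_MB_S.items()}
    print(channel, counts)
```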

Clearly, it does not take many failed drives' worth of rebuild traffic to saturate the network bandwidth.

Disk Drive Failure Per Year

There are two parts to the drive failure formula. The first part is based on the hard error rate. On consumer SATA drives, you can expect one sector that cannot be read back for roughly every 111 TB of data moved; for enterprise SATA, the figure is 1.1 PB. The other component is the annualized failure rate (AFR), the percentage of installed drives expected to fail in a year, as estimated by the drive vendor. It should be noted that very few drive vendors provide AFR for consumer SATA drives. The next table shows, for 2 TB SATA drives, the number of drives needed for various storage capacities and the expected number of failures per year.

Drive Type	AFR	Drives for 1 PB	Failures per Year (1 PB)	Drives for 5 PB	Failures per Year (5 PB)	Drives for 10 PB	Failures per Year (10 PB)	Drives for 25 PB	Failures per Year (25 PB)
Consumer SATA	1.24%	500	6.2	2500	31	5000	62	12500	155
Enterprise SATA	0.73%	500	3.65	2500	18.25	5000	36.5	12500	91.25
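The AFR component is just the drive count multiplied by the AFR. This Python sketch assumes 2 TB drives, decimal units (1 PB = 1,000 TB), no RAID or replica overhead in the drive count, and the AFR percentages from the table.

```python
# Expected whole-drive failures per year from the annualized failure rate.
# Assumptions: 2 TB drives, 1 PB = 1,000 TB, AFR values from the table.

DRIVE_TB = 2
AFR = {"Consumer SATA": 0.0124, "Enterprise SATA": 0.0073}


def drives_needed(capacity_pb: float, drive_tb: float = DRIVE_TB) -> float:
    """Drives required to hold capacity_pb petabytes (no overhead)."""
    return capacity_pb * 1000 / drive_tb


def afr_failures_per_year(capacity_pb: float, afr: float) -> float:
    """Drives expected to fail in a year: drive count times AFR."""
    return drives_needed(capacity_pb) * afr


for pb in (1, 5, 10, 25):
    row = {d: round(afr_failures_per_year(pb, a), 2) for d, a in AFR.items()}
    print(f"{pb:>2} PB: {drives_needed(pb):>6.0f} drives, {row}")
```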

The other component is failure based on the BER. Because it depends on data movement, I will again choose a conservative usage figure and assume that each drive runs at 5 percent of its total bandwidth year-round.

Drive Type	Number of Failures Per Year for 1 PB	5 PB	10 PB	25 PB
Consumer SATA	542	2712	5423	13558
Enterprise SATA	61	304	608	1519
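The BER component is the total data read per year divided by the bytes read per hard error. The Python sketch below reproduces the consumer SATA row under assumptions that the article does not spell out: an 82 MB/sec drive, 1 MB taken as 2^20 bytes, a 365-day year, a 5 percent duty cycle, and 1.25E+14 bytes read per error (the byte-equivalent column of the hard error table).

```python
# Hard (unrecoverable read) errors per year, counted as drive failures.
# Assumptions: 1 MB = 2**20 bytes, 365-day year; inputs as listed below.

SECONDS_PER_YEAR = 365 * 24 * 3600
MB = 2 ** 20  # bytes per MB (assumed binary)


def ber_failures_per_year(capacity_pb: float, drive_tb: float, rate_mb_s: float,
                          duty: float, bytes_per_error: float) -> float:
    """Expected hard errors per year across all drives holding capacity_pb."""
    drives = capacity_pb * 1000 / drive_tb
    bytes_per_drive = rate_mb_s * MB * duty * SECONDS_PER_YEAR
    return drives * bytes_per_drive / bytes_per_error


# Consumer SATA: 2 TB drives at 82 MB/sec, 5 percent duty, 1.25e14 bytes/error.
for pb in (1, 5, 10, 25):
    print(pb, round(ber_failures_per_year(pb, 2, 82, 0.05, 1.25e14)))
```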

To determine total failures, add the BER-based failures to the AFR-based failures, using the 5 percent usage figure.

Drive Type	Number of Failures Per Year for 1 PB	5 PB	10 PB	25 PB
Consumer SATA	549	2743	5485	13713
Enterprise SATA	64	322	644	1611

Dividing the 5 percent totals by 365 gives the number of failures per day:

Drive Type	Number of Daily Failures for 1 PB	5 PB	10 PB	25 PB
Consumer SATA	1.5	7.5	15	37.6
Enterprise SATA	0.2	0.9	1.8	4.4

A small increase to 7.5 percent usage of total bandwidth yields this number of failures per day for each of the storage volumes:

Drive Type	Number of Daily Failures for 1 PB	5 PB	10 PB	25 PB
Consumer SATA	2.2	11.2	22.5	56.1
Enterprise SATA	0.3	1.3	2.6	6.5
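Putting the two components together, the daily figures for consumer SATA can be reproduced with a short Python sketch. It assumes 2 TB drives at 82 MB/sec, 1 MB = 2^20 bytes, 1.25E+14 bytes read per hard error, and the 1.24 percent AFR; only the duty cycle changes between the 5 percent and 7.5 percent cases.

```python
# Total (AFR + BER) consumer SATA failures per day at a given duty cycle.
# Assumptions: 2 TB drives at 82 MB/sec, 1 MB = 2**20 bytes,
# 1.25e14 bytes per hard error, 1.24 percent AFR, 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600
MB = 2 ** 20
DRIVE_TB = 2
RATE_MB_S = 82
AFR = 0.0124
BYTES_PER_ERROR = 1.25e14


def failures_per_day(capacity_pb: float, duty: float) -> float:
    """Average daily failures: (AFR failures + BER failures) per year / 365."""
    drives = capacity_pb * 1000 / DRIVE_TB
    afr_failures = drives * AFR
    bytes_read = drives * RATE_MB_S * MB * duty * SECONDS_PER_YEAR
    ber_failures = bytes_read / BYTES_PER_ERROR
    return (afr_failures + ber_failures) / 365


for duty in (0.05, 0.075):
    print(f"{duty:.1%}", [round(failures_per_day(pb, duty), 1) for pb in (1, 5, 10, 25)])
```

Because BER-driven failures dominate, daily failures scale almost linearly with the duty cycle.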

Total Amount of Data to be Moved for Failures

Now to the meat of the issue. For the 5 percent use case and 10 PB of storage, you will have an average of 15 consumer SATA drives failing per day. In the best case, each failed drive takes approximately 24,390 seconds to be read and written over the network. An OC-48 channel carries at most the bandwidth of 3.37 drives, or 276 MB/sec, so over 24 hours it can move 276 MB/sec × 3,600 × 24 MB in total. On the demand side, each failed drive needs 82 MB/sec for 24,390 seconds, multiplied by 15 drive failures per day. Subtracting demand from channel capacity gives the spare capacity per day. Here is how that math works out for a few scenarios:

Consumer SATA Usage	Channel	Spare MB per Day for 1 PB	5 PB	10 PB	25 PB
5%	OC-48	20,882,319	8,860,106	-6,167,659	-51,250,956
7.5%	OC-48	19,840,727	3,652,149	-16,583,574	-77,290,742
5%	OC-192	92,545,935	80,523,722	65,495,957	20,412,660
7.5%	OC-192	91,504,343	75,315,765	55,080,042	-5,627,126

Any negative number means that the drive replication requirement exceeds the channel bandwidth. For example, with 10 PB, an OC-48 channel, and 5 percent drive usage, replication needs 6,167,659 MB per day more than the channel can carry, an average shortfall of about 71 MB/sec over each 24-hour period. Obviously, this becomes a bigger and bigger problem over time, because you cannot replicate the data as fast as it is lost. Statistically, if you have 10 PB, you are eventually going to lose data, and it will not take long. The only architectural option is a third copy of the data, which is very costly. The crossover point for an OC-48 channel with 5 percent usage of the storage system is between 5 PB and 10 PB, and with 7.5 percent usage you have only 42 MB/sec (3,652,149/(3600*24)) of spare bandwidth at 5 PB of storage. What is needed is much faster networking, which comes at a cost, or more reliable storage, which also isn't cheap.
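The spare-capacity arithmetic and the crossover point can be sketched in Python as follows. The inputs are rounded assumptions: a failed 2 TB drive means copying 2,000,000 MB (82 MB/sec for about 24,390 seconds), the OC-48 channel carries 276 MB/sec, and consumer SATA at 5 percent usage loses about 548.5 drives per PB per year (AFR plus BER, read off the totals table). With these rounded inputs the results land within about 1 percent of the table rather than matching it exactly.

```python
# Spare channel capacity per day after re-copying failed drives, and the
# capacity at which replication traffic consumes the whole channel.
# Assumptions: rounded inputs derived from the article's tables.

SECONDS_PER_DAY = 24 * 3600
MB_PER_FAILED_DRIVE = 2_000_000   # 82 MB/sec x ~24,390 s, i.e. one 2 TB drive
FAILURES_PER_PB_YEAR = 548.5      # consumer SATA, 5 percent usage (rounded)


def spare_mb_per_day(channel_mb_s: float, capacity_pb: float) -> float:
    """Channel MB per day minus MB per day needed to re-copy failed drives."""
    supply = channel_mb_s * SECONDS_PER_DAY
    failures_per_day = capacity_pb * FAILURES_PER_PB_YEAR / 365
    demand = failures_per_day * MB_PER_FAILED_DRIVE
    return supply - demand


def crossover_pb(channel_mb_s: float) -> float:
    """Capacity (PB) at which daily replication demand equals channel supply."""
    supply = channel_mb_s * SECONDS_PER_DAY
    return supply * 365 / (FAILURES_PER_PB_YEAR * MB_PER_FAILED_DRIVE)


for pb in (1, 5, 10, 25):
    spare = spare_mb_per_day(276, pb)
    print(f"{pb:>2} PB: {spare:>14,.0f} MB/day ({spare / SECONDS_PER_DAY:+.0f} MB/sec)")
print(f"OC-48 crossover: {crossover_pb(276):.1f} PB")
```

Solving for the capacity where spare bandwidth reaches zero puts the OC-48 crossover at just under 8 PB under these assumptions, consistent with the "between 5 PB and 10 PB" range above.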

I am sure cloud companies trade these costs off every day to find the best way to optimize them. Is it possible that some of them don't understand the basic hardware issues? I sure hope not. Clearly, cloud storage works just fine below 5 PB with an OC-48 channel and consumer SATA storage. How many clouds have more than that much storage today? I have no idea, but certainly some do, and 10 to 20 PB archives are common for large storage users.

Cloud architecture is far more complex than architecting for local storage. Cloud storage could be designed with a RAID back end, eliminating much of the problem, but most clouds I see do not use RAID because of the cost. The bottom line is that cloud architecture and design are not easy, and for large data volumes I cannot see how clouds can be cheaper than local storage.

Drive reliability and bandwidth will limit cloud adoption, and it's a problem that may never get solved. Bandwidth will continue to get cheaper, but drive reliability hasn't improved much, and data will likely continue to grow faster than bandwidth anyway. Perhaps network-based deduplication could help, assuming the data can be deduplicated. But for now at least, there doesn't seem to be much of an alternative to good old-fashioned data centers for very large data stores.

Henry Newman, CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 28 years of experience in high-performance computing and storage.