Why Cloud Storage Use Could Be Limited in Enterprises

It seems like just about every day brings with it a new cloud storage product announcement from vendors big and small, but the reality is that beyond enterprise firewalls, cloud storage’s potential is limited.

There are two reasons for this: bandwidth limitations and the data integrity issues posed by the commodity drives that are typically used in cloud services. Together those two issues will limit what enterprise data storage users can do with external clouds.

Cloud Challenges

The ideal for cloud storage is to be self-managed, self-balanced and self-replicated, with regular data checksums to account for the undetectable or mis-corrected error rates of various storage technologies. Cloud storage depends on being able to keep multiple copies of files that are managed, checksummed and verified regularly, and distributed across the storage cloud for safekeeping.
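As a rough illustration of what "checksummed and verified regularly" could look like, here is a minimal Python sketch. The function names, the choice of SHA-256 and the idea of storing one expected digest per file are assumptions made for the example, not a description of how any particular cloud does it.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_replicas(replica_paths, expected_digest):
    """Return the replica paths whose current checksum no longer matches.

    expected_digest is the checksum recorded when the file was first stored
    (a hypothetical bookkeeping step, assumed for this sketch).
    """
    return [path for path in replica_paths if sha256_of(path) != expected_digest]
```

Any path returned by `verify_replicas` would be a candidate for re-replication from a copy that still matches.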

It’s a great idea, but it faces more than a few challenges, such as reliability, security, data integrity, power, and replication time and costs. But one of the biggest issues is simply that hardware is going to break. There are two ways disk and tape drives break:

Hitting the hard error rate of the media, which is expressed in average number of bits before an error occurs
Hitting the Annualized Failure Rate (AFR) of a device based on the number of hours used

The most common type of failure is hitting the hard error rate, which vendors publish as a bit error rate: the expected number of failures per number of bits moved. The following is generally what is published by vendors:

Device	Hard Error Rate in Bits
Consumer SATA Drives	1 in 10E¹⁴
Enterprise SATA Drives	1 in 10E¹⁵
Enterprise FC/SAS Drives	1 in 10E¹⁶
LTO Tape	1 in 10E¹⁷
T10000B Tape	1 in 10E¹⁹

These seem like good values, but it is important to note that they haven’t improved much in the last 10 years, maybe by an order of magnitude, while densities have soared and performance has increased moderately. This will begin to cause problems as the gaps get worse in the future (see RAID’s Days May Be Numbered). So using vendors’ best-case number, how many errors will we see from moving data around, which is needed for replication in clouds?

Errors Per Data Moved	1PB	10PB	40PB	100PB
1TB Consumer SATA	9.007	90.07	360.288	900.720
1TB Enterprise SATA	0.901	9.007	36.029	90.072
600GB FC/SAS	0.090	0.901	3.603	9.007
LTO-4/TS1130	0.009	0.090	0.360	0.901
T10000B	0.000	0.001	0.004	0.009

Clearly, moving 100PB on even 1TB enterprise drives can potentially cause significant loss of data, especially as many clouds I am familiar with do not use RAID and maintain data protection via mirroring. Remember, this is a perfect world and does not include channel failures, memory corruptions and all the other types of hardware failures and silent corruption. What happens if the world is not perfect and failure rates are an order of magnitude worse?

Errors Per Data Moved	1PB	10PB	40PB	100PB
1TB Consumer SATA	90.072	900.720	3602.880	9007.199
1TB Enterprise SATA	9.007	90.072	360.288	900.720
600GB FC/SAS	0.901	9.007	36.029	90.072
LTO-4/TS1130	0.090	0.901	3.603	9.007
T10000B	0.001	0.009	0.036	0.090
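The arithmetic behind both error tables can be sketched in a few lines of Python. The figures are reproduced when 1PB is taken as 2^50 bytes and "1 in 10E14" is read as one error per 10 x 10^14 bits; both readings are assumptions inferred from the numbers, as are the function and variable names. The `degradation` parameter models the "order of magnitude worse" case.

```python
BITS_PER_PB = 8 * 2**50  # assumption: 1PB = 2**50 bytes

# Assumption: "1 in 10E14" means one hard error per 10 x 10**14 bits moved.
BITS_PER_ERROR = {
    "1TB Consumer SATA": 10 * 10**14,
    "1TB Enterprise SATA": 10 * 10**15,
    "600GB FC/SAS": 10 * 10**16,
    "LTO-4/TS1130": 10 * 10**17,
    "T10000B": 10 * 10**19,
}


def expected_errors(device, petabytes, degradation=1):
    """Expected hard errors when moving `petabytes` of data on `device`.

    degradation=10 models real-world rates ten times worse than published.
    """
    bits_moved = petabytes * BITS_PER_PB
    return degradation * bits_moved / BITS_PER_ERROR[device]


for device in BITS_PER_ERROR:
    row = [round(expected_errors(device, pb), 3) for pb in (1, 10, 40, 100)]
    print(device, row)
```

Calling `expected_errors("1TB Enterprise SATA", 100, degradation=10)` gives the figure for the imperfect-world case at 100PB.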

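With current technology, moving 100PB on 1TB enterprise SATA drives in that scenario means roughly 900 errors. Assuming the worst case, in which each error costs a full 1TB drive, you could lose 900TB of data, which is not trivial and would take some time to replicate.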

Bandwidth Limits Replication

Now let’s look at the time to replicate with various Internet connection speeds and data volumes.

Network	Data Rate	Days to Replicate 1PB	Days to Replicate 10PB	Days to Replicate 40PB	Days to Replicate 100PB
OC-3	155 Mbits/sec	802	8018	32,071	80,178
OC-12	622 Mbits/sec	200	1998	7992	19,980
OC-48	2.5 Gbits/sec	51	506	2023	5057
OC-192	10 Gbits/sec	13	126	506	1264
OC-384	19.9 Gbits/sec	6	63	253	632
OC-768	39.8 Gbits/sec	3	32	126	316

Clearly, no one has an OC-768 connection, nor are they going to get one anytime soon, and very few have 100PB of data to replicate into a cloud, but the point is that data densities are growing faster than network speeds. There are already people talking about 100PB archives, but they don’t talk about OC-384 networks. It would take about 21 months to replicate 100PB with OC-384 in the event of a disaster, and who can afford OC-384? That’s why, at least for the biggest enterprise storage environments, a centralized disaster recovery site that you can move operations to until everything is restored will be a requirement for the foreseeable future.
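For readers who want to plug in their own numbers, here is a small Python sketch of the replication-time arithmetic. The article does not state its assumptions; the ones used here (binary petabytes and megabytes, and 80 percent of the line rate being usable) are inferred because they appear to reproduce the OC-3 and OC-12 rows, and the names are made up for the example.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_PB = 2**50   # assumption: binary petabytes
BYTES_PER_MB = 2**20   # assumption: binary megabytes
LINK_EFFICIENCY = 0.8  # assumption: 80 percent of the line rate is usable


def days_to_replicate(petabytes, line_rate_mbits, efficiency=LINK_EFFICIENCY):
    """Days needed to move `petabytes` over a link of `line_rate_mbits` Mbit/s."""
    usable_bytes_per_sec = line_rate_mbits / 8 * efficiency * BYTES_PER_MB
    seconds = petabytes * BYTES_PER_PB / usable_bytes_per_sec
    return seconds / SECONDS_PER_DAY


for name, mbits in (("OC-3", 155), ("OC-12", 622)):
    print(name, [round(days_to_replicate(pb, mbits)) for pb in (1, 10, 40, 100)])
```

Changing `efficiency` or the byte conventions shifts every result by a constant factor, so the conclusion about growth in data outpacing growth in bandwidth does not depend on the exact assumptions.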

Consumer Problem Looming?

The bandwidth problem isn’t limited to enterprises. In the next 12 to 24 months, most of us will have 10Gbit/sec network connections at work (see Falling 10GbE Prices Spell Doom for Fibre Channel), while the fastest connection available as the current backbone of the Internet is OC-768. Each of us is going to have an internal connection that is about 25 percent of OC-768. At home, that will be limited, of course, by our DSL and cable connections, but their performance is going to grow and use up the backbone bandwidth. This is pretty shocking when you consider how much data we create and how long it takes to move it around. I have a home Internet backup service and about 1TB of data at home. It took me about three months to get all of the data copied off site via my cable connection, which was the bottleneck. If I had a crash before the off-site copy was created, I would have lost data.
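To put the home backup example in perspective, the average upload rate it implies can be worked out as follows. This is a back-of-the-envelope Python sketch that assumes 1TB means 10^12 bytes and "about three months" means 90 days; the function name is made up for the example.

```python
def average_mbits_per_sec(total_bytes, days):
    """Average throughput, in Mbit/s, needed to move total_bytes in `days`."""
    seconds = days * 86_400
    return total_bytes * 8 / seconds / 1e6


# Assumptions: 1TB = 10**12 bytes, three months = 90 days.
rate = average_mbits_per_sec(10**12, 90)
print(f"{rate:.2f} Mbit/s sustained")
```

Under those assumptions the sustained rate works out to roughly 1 Mbit/sec.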

Efforts like Internet2 may help ease the problem, but I worry that we are creating data faster than we can move it around. The issue becomes critical when data is lost — through human error, natural disaster or something more sinister — and must be re-replicated. You’d have all this data in a cloud, with two copies in two different places, and if one copy goes poof for whatever reason, you’d need to restore it from the other copy or copies, and as you can see that is going to take a very long time. During that time, you might only have one copy, and given hard error rates, you’re at risk. You could have more than two copies — and that’s probably a good idea with mission-critical data — but it starts getting very costly.

Google, Yahoo and other search engines for the most part use the cloud method for their data, but what about all the archival sites that already have 10, 20, 40 or more petabytes of storage data that is not used very often? There are already a lot of these sites, whether it is medical data, medical images or genetic data, or sites that have large images such as climate sites or the seismic data that is used for oil and gas exploration, and, of course, all the Sarbanes-Oxley data that is required to be kept in corporate America. Does it make sense to have all of this data online? Probably not, and the size of the data and the cost of power will likely be the overriding issues.

Henry Newman, CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 28 years of experience in high-performance computing and storage.