Parallel Storage Clouds



The world is enamored with cloud computing, particularly cloud storage. Cloud storage is used for a variety of data tasks: performing IO for applications that are running in the cloud (S3 is an example), using Hadoop/MapReduce or some other analytical computation, storing data for later use but not archiving it, or storing data for archival purposes.

For all four of these use cases, the amount of data is growing very quickly, much faster than you might think. At the same time, the computational resources available for processing that data are also increasing.

For example, on Amazon EC2 you can get the following instance types:

  • "High-Memory Quadruple Extra Large Instance" (m2.4xlarge)
    • 68.4 GiB of memory
    • 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units Each)
    • 1,690 GB of storage
    • GigE
  • "Cluster Compute Eight Extra Large Instance" (cc2.8xlarge)
    • 60.5 GiB memory
    • 88 EC2 Compute Units (2x Intel E5-2670, 8-core each)
    • 3,370 GB storage
    • 10GigE
  • "High Memory Cluster Eight Extra Large Instance" (cr1.8xlarge)
    • 244 GiB memory
    • 88 EC2 Compute Units (2x Intel E5-2670, 8-core each)
    • 240GB SSD instance storage
    • 10GigE
  • "Cluster GPU Quadruple Extra Large Instance" (cg1.4xlarge)
    • 22 GiB memory
    • 33.5 EC2 Compute Units (2x Intel X5560, 4-core each)
    • 2x NVIDIA M2050 Tesla GPU
    • 1,690 GB storage
    • 10GigE
  • "High I/O Quadruple Extra Large Instance" (hi1.4xlarge)
    • 60.5 GiB memory
    • 35 EC2 Compute Units (16 virtual cores)
    • 2 SSD based volumes with 1,024 GB each
    • 10GigE
    • Amazon states that you can achieve 120,000 random read IOPS and 10,000 to 85,000 random write IOPS
  • "High Storage Instance" (hs1.8xlarge)
    • 117 GiB memory
    • 35 EC2 Compute Units (16 virtual cores)
    • 24 hard drives with 2TB each (48TB of instance storage)
    • 10GigE
    • Amazon states that each instance can deliver 2.4 GiB/s of sequential read and 2.6 GiB/s of sequential write performance.

Notice that some instances give you a great deal of compute power. You can also get quite a bit of local storage: a little over 3.3TB on the cc2.8xlarge, and 48TB spread across 24 drives on the hs1.8xlarge. Some instances also offer around 2.5 GiB/s of sequential performance, 120,000 random read IOPS, or up to 85,000 random write IOPS.
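The IOPS and bandwidth figures above are stated in different units. As a rough way to put them on one scale, this Python sketch converts an IOPS figure into MiB/s. It assumes every request is a fixed 4 KiB; that request size is my assumption, since the figures quoted here do not state one.

```python
# Convert an IOPS figure into an approximate bandwidth figure.
# Assumption: every I/O is a fixed-size request (4 KiB by default); the IOPS
# figures quoted in the instance list do not state a request size.

KIB = 1024
MIB = 1024 * KIB


def iops_to_mib_per_s(iops: int, io_size_bytes: int = 4 * KIB) -> float:
    """Return the bandwidth in MiB/s implied by `iops` requests of `io_size_bytes` each."""
    return iops * io_size_bytes / MIB


if __name__ == "__main__":
    for label, iops in [("random read", 120_000), ("random write (high end)", 85_000)]:
        print(f"{label}: {iops:,} IOPS x 4 KiB ~= {iops_to_mib_per_s(iops):.1f} MiB/s")
```

Under that assumption, 120,000 random read IOPS works out to about 469 MiB/s, well below the roughly 2.4 GiB/s of sequential read bandwidth quoted for the hs1.8xlarge. Larger requests would raise the figure proportionally.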

All of these numbers are respectable, but they apply to a single node. Currently, each instance has to have its own copy of the data unless you build a shared storage solution out of the instances themselves, which is not always easy to do.
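To see why a separate copy of the data on every instance gets expensive, here is a back-of-the-envelope Python sketch. It hypothetically assumes that a copy is limited only by the instance's 10GigE link running at full line rate, and it uses a hypothetical cluster of 16 instances; real transfers will be slower because of protocol overhead and limits on the source side.

```python
# Back-of-the-envelope cost of giving every instance its own copy of a dataset.
# Hypothetical assumptions: the copy is limited only by a 10GigE link at full
# line rate (no protocol overhead), and the cluster has 16 instances.


def copy_time_seconds(dataset_bytes: float, link_gbit_per_s: float) -> float:
    """Time to push `dataset_bytes` through a link of `link_gbit_per_s` (decimal Gbit/s)."""
    return dataset_bytes * 8 / (link_gbit_per_s * 1e9)


def replicated_bytes(dataset_bytes: float, n_instances: int) -> float:
    """Total storage consumed when each of `n_instances` holds a full copy."""
    return dataset_bytes * n_instances


if __name__ == "__main__":
    dataset = 3.37e12  # 3.37 TB (decimal), the cc2.8xlarge local storage figure
    n = 16             # hypothetical number of instances
    t = copy_time_seconds(dataset, 10)
    print(f"One copy over 10GigE: {t:,.0f} s (~{t / 60:.0f} min)")
    print(f"Storage used by {n} copies: {replicated_bytes(dataset, n) / 1e12:.1f} TB")
```

With these numbers a single copy takes about 45 minutes even at line rate, and 16 copies consume about 54 TB to hold what is logically one 3.37 TB dataset.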

What happens if each instance needs to access more data than its local storage can hold? What if you need more capacity than the SSD instances provide? How do you share data so that you don't have to copy it to each instance? What if you need more performance than these instances offer?

An equally important question, and one that usually goes unnoticed, is what to do if you need more single-node performance. If you look at the list of current Amazon instances, you will see that the fastest single-node (single-client) performance is about 2.4 GiB/s for sequential reads. Sharing the data between servers only exacerbates this single-node I/O throughput problem. The best way out of this problem that I can see is to start thinking about cloud storage in parallel terms.
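The article does not say how a parallel storage cloud would lay out data. One common approach in parallel file systems such as Lustre and PVFS is round-robin striping: a file is cut into fixed-size stripe units that are spread across several storage servers, so a large read is served by all of them at once. The Python sketch below illustrates only that mapping; the class name, stripe size, and server count are hypothetical choices for illustration, not any particular file system's API.

```python
# Minimal sketch of round-robin striping. The class name, stripe size, and
# server count are hypothetical; real parallel file systems add metadata
# servers, configurable layouts, and failure handling on top of this idea.

from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    stripe_size: int  # bytes per stripe unit
    n_servers: int    # number of storage servers the file is spread over

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a byte offset in the file to (server index, offset on that server)."""
        unit = offset // self.stripe_size   # which stripe unit holds this byte
        server = unit % self.n_servers      # units are dealt out round-robin
        row = unit // self.n_servers        # full rounds completed before this unit
        return server, row * self.stripe_size + offset % self.stripe_size

    def servers_touched(self, offset: int, length: int) -> set[int]:
        """Return the set of servers that must serve a read of `length` bytes at `offset`."""
        first = offset // self.stripe_size
        last = (offset + length - 1) // self.stripe_size
        return {unit % self.n_servers for unit in range(first, last + 1)}


if __name__ == "__main__":
    layout = StripeLayout(stripe_size=1 << 20, n_servers=4)  # 1 MiB stripes over 4 servers
    print(layout.locate(5 * (1 << 20) + 10))                 # a byte in stripe unit 5
    print(sorted(layout.servers_touched(0, 4 << 20)))        # a 4 MiB sequential read
```

With a 1 MiB stripe over four servers, a 4 MiB sequential read touches all four servers, so each one supplies only a quarter of the data. How much of that aggregate bandwidth a single client actually sees is still bounded by its own network link.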

Cloud Data Explosion — Video Example

I'm not sure if you've ever watched any of the television programs that have "World's Worst Driver" in the title, but they are very entertaining. After I stop laughing at the person backing up along the highway because they missed an exit, I realize that the video comes from surveillance cameras. The morning traffic report reinforces this realization: the commentator can cut to a camera that clearly shows an accident blocking traffic virtually anywhere in the city where I live.

Another favorite of mine is that several days after a tornado, new footage pops up on the news showing the destruction. Sometimes this video comes from a parking lot and sometimes from a gas station. Either way, there usually ends up being a remarkable amount of footage of the tornado taken from many different cameras.
