Shared Storage in the Cloud
Technical computing and HPC workloads are moving to the cloud at an ever-increasing pace. The ability to spin up a compute instance quickly and pay only for what you use is a very attractive option. Moving forward, shared storage in the cloud will become an increasing necessity for these workloads. Is this even possible and what options are there?
Public cloud providers have capabilities that allow us to spin up a compute instance that has an operating system and network connectivity very easily. In many cases, you can attach storage block devices to the instance, providing the basis for local storage. (Some providers include a small amount of "instance storage" as well.) Then you can install, build and run your applications in the instance.
If you are not running completely in the cloud, then you will have to upload your data set to the instance. When you're done running the application(s), you can copy the data back or move it to more permanent cloud storage. (In the case of Amazon this could be S3 or Glacier.
But this scenario is limited to storage that is associated with the instance (server). What happens if we need shared storage in the cloud itself as we usually do in the HPC world?
There are many times that people or applications need shared storage. The idea is simple—there is one storage pool that is shared by a number of compute resources. In the case of systems in your data center, the centralized storage is shared by a number of servers. The most classic form of shared storage is NFS (Network File System), and in the Microsoft world there is CIFS. The configuration for both approaches is basically the same: there is some centralized storage with at least one server that is connected to a network (typically Ethernet). Client systems then "mount" the storage and immediately have access to the data on the central storage.
The concept has worked well for many years and is still heavily used today. It allows servers to share files so that several applications can read the same data set. The shared storage approach means that the files are in one location so that you don't have to copy the data to each server that needs it. This saves on space and network bandwidth, and it also makes data management much easier.
On the other hand, if for some reason the centralized storage fails, then every server loses access to the data. But with a good design and processes this problem can be overcome to a great extent.
The issue we should all think about as we move to the cloud is how do we create shared storage in the cloud? Regardless of the answer, the shared storage in the cloud will be constructed from the same components as shared storage not in the cloud: storage, servers, networking and software (always back to the fundamentals). Let's examine these components and solutions that utilize them by focusing on Amazon Web Services (AWS).
Brief Amazon Review
Before jumping into shared storage concepts, I want to quickly review Amazon Web Services (AWS). Let's start with the compute/networking side of the shared storage solutions. Recall that Amazon has several different compute instances (Amazon EC2) which are effectively servers and networking. The compute power, the general performance, the amount of memory and the network connectivity of the instances varies.
AWS Instances (servers and networking)
If you are not familiar with AWS you might be surprised to learn that Amazon has a number of different instance types. You can think of instance as a VM with an operating system and some applications. An easy example is a version of CentOS running in a VM with a number of tools installed.
AWS has a wide array of instances that have features and capabilities that can be exploited for solving problems. The list of instances gets updated frequently but here is an abbreviated list of instances and their major features when I wrote this article:
I listed just a few of the features of an instance so you could get a feel for some of the differences. There are many other aspects to selecting an instance such as selecting "cluster" instances, including Enhanced Networking, which has lower latencies and lower network jitter (uses SR-IOV), and cluster networking which provides high-bandwidth and low-latency networking between all instances in the cluster.