Amazon storage options
Amazon also has several storage options. I won't cover all of them because this article is concerned with performance more than capacity or archiving. Let's start with the one generally referred to as "ephemeral storage." This is "pre-attached" storage that comes with the instance at no extra cost. The word "ephemeral" is used because if you stop or terminate the instance, or if it crashes, the data is gone.
Don't dismiss ephemeral storage, because it can be very useful depending upon the situation. Virtually all of the instances have ephemeral storage; even the smallest instance, the m1.small, has a 160GB local drive. As the instances get larger, the number, capacity and performance of the local devices increase. A quick list is below.
As you can see, all of the instances have their own storage that can be used for anything you want. Many of the instances have local SSD storage. In particular, the i2.8xlarge instance has eight 800GB SSD drives. You can get quite a bit of IO performance from this storage, although Amazon doesn't publish the performance of the drives in ephemeral storage. Perhaps with the right file system, the SSDs could be used as a cache. Or they could be used outright for file system storage.
A large number of instances have internal SSDs, though not always a lot of capacity. At the same time, there are instances that have a number of internal drives with more capacity. For example, the m1.xlarge instance has four 420GB drives, which can be used to create a RAID-5 volume group with 1,260GB. There is also an instance, the hs1.8xlarge, that has 24 2TB drives. Using all of them in RAID-6 gives you about 44TB of usable capacity.
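The capacity figures above follow from simple parity arithmetic: RAID-5 gives up one drive's worth of space to parity and RAID-6 gives up two. A minimal sketch of that calculation is below. The function name and the minimum drive counts are my own illustration, and the result ignores file system and metadata overhead.

```python
# Usable capacity for the parity RAID levels discussed above.
# Overhead from the file system and RAID metadata is ignored.

PARITY_DRIVES = {5: 1, 6: 2}   # drives' worth of space lost to parity
MINIMUM_DRIVES = {5: 3, 6: 4}  # fewest drives each level can be built from


def usable_capacity(drive_count, drive_size, raid_level):
    """Return usable capacity in the same unit as drive_size."""
    if raid_level not in PARITY_DRIVES:
        raise ValueError("only RAID-5 and RAID-6 are handled in this sketch")
    if drive_count < MINIMUM_DRIVES[raid_level]:
        raise ValueError("not enough drives for this RAID level")
    return (drive_count - PARITY_DRIVES[raid_level]) * drive_size


# Four 420GB drives in RAID-5 (the m1.xlarge example).
print(usable_capacity(4, 420, raid_level=5), "GB")

# Twenty-four 2TB drives in RAID-6 (the 24-drive instance).
print(usable_capacity(24, 2, raid_level=6), "TB")
```

The two calls reproduce the 1,260GB and 44TB figures quoted in the text.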
A second storage option is called EBS or Elastic Block Store. The concept is pretty simple: an EBS volume is simply a network block device that you can attach to your instance. You can think of it as a "virtual hard drive" if you like. It's not in the server that contains your instance but rather it is probably a volume from a centralized storage system that is serving out block storage via iSCSI. (Amazon doesn't document the details of their EBS storage.)
After attaching the EBS volumes to your instance, you can use them as you would directly attached devices. You can combine them with LVM. You can use software RAID (md) across them to create a RAID device. Or you can use both LVM and software RAID to create the underlying storage device for your file system. Then you build your file system on top of the device(s).
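To make the software RAID case concrete, here is a minimal sketch that only builds and prints the commands you might run to stripe several attached EBS volumes with md and put a file system on top. Nothing is executed. The device names (/dev/xvdf and so on), the /dev/md0 array name, the ext4 file system and the /mnt/data mount point are all assumptions for illustration; check what your instance actually exposes before running anything like this.

```python
import shlex


def build_raid_commands(devices, md_device="/dev/md0", level=0,
                        fs_type="ext4", mount_point="/mnt/data"):
    """Return the command lists for an md array, a file system, and a mount.

    The commands are returned, not run, so they can be reviewed first.
    """
    if len(devices) < 2:
        raise ValueError("software RAID needs at least two devices")
    create_array = [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(devices)}",
        *devices,
    ]
    make_fs = [f"mkfs.{fs_type}", md_device]
    # The mount point is assumed to exist already.
    mount_fs = ["mount", md_device, mount_point]
    return [create_array, make_fs, mount_fs]


# Hypothetical device names for four attached EBS volumes.
ebs_devices = ["/dev/xvdf", "/dev/xvdg", "/dev/xvdh", "/dev/xvdi"]

for command in build_raid_commands(ebs_devices):
    print(shlex.join(command))
```

I chose RAID-0 as the default here because the EBS volumes are already replicated by Amazon, so striping for performance is a common choice, but the level is a parameter and the trade-off is yours to make.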
Below is a list of basic EBS characteristics (from Amazon):
- The size of the storage volume is variable and under your control up to 1TB in size.
- Volumes are placed in a specific Availability Zone, and can then be attached to instances in that same zone.
- Multiple volumes can be attached to a single EC2 instance.
- There are two volume types: Standard and Provisioned IOPS
- Standard volumes have an IO performance of about 100 IOPS (about the same as a single hard drive).
- Provisioned IOPS volumes have an IO performance up to about 4,000 IOPS. Fundamentally, a Provisioned IOPS volume is designed for up to 30 IOPS per GB.
- EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component.
- If the instance crashes or you stop or terminate the instance, your data is still in the EBS volumes (you have to explicitly delete the EBS volumes to avoid charges).
- EBS volumes have snapshot capabilities:
- Snapshots are stored in Amazon S3.
- You can use the point-in-time snapshots to instantiate new volumes.
- You can copy snapshots across AWS regions, making it easier to use multiple AWS regions for geographical expansion, data center migration, and disaster recovery.
- You can view performance metrics for Amazon EBS volumes using Amazon CloudWatch, giving you insight into the performance.
- If an EBS volume fails, you can recover the volume from the last snapshot.
- After a snapshot is taken, you can immediately access the Amazon EBS volume data. However, this does not mean all of the data is immediately available (EBS snapshots implement lazy loading).
- When you create a volume from a snapshot, you can specify a size larger than that of the original EBS volume.
- You can share the Amazon EBS snapshot by allowing others to create their own EBS volumes based on yours.
- You only pay for the storage and performance that you actually provision.
- EBS volumes are managed with the same IAM (AWS Identity and Access Management) that you use for instance security (role-based access controls). This includes users and groups.
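The Provisioned IOPS figures in the list above (up to 30 IOPS per GB, capped at about 4,000 IOPS per volume) can be turned into a quick sizing calculation. The sketch below is my own illustration of that arithmetic, not an Amazon tool, and the function names are made up for the example.

```python
# Limits taken from the list above; treat them as the figures quoted
# in this article rather than as current Amazon specifications.
IOPS_PER_GB = 30
MAX_IOPS_PER_VOLUME = 4000


def max_provisioned_iops(size_gb):
    """Highest IOPS a Provisioned IOPS volume of this size can request."""
    if size_gb <= 0:
        raise ValueError("volume size must be positive")
    return min(size_gb * IOPS_PER_GB, MAX_IOPS_PER_VOLUME)


def smallest_volume_for(iops):
    """Smallest volume size in GB that can be provisioned with this many IOPS."""
    if not 0 < iops <= MAX_IOPS_PER_VOLUME:
        raise ValueError("a single volume tops out at about 4,000 IOPS")
    return -(-iops // IOPS_PER_GB)  # integer ceiling division


print(max_provisioned_iops(100))   # a 100GB volume is limited by the ratio
print(max_provisioned_iops(500))   # a 500GB volume is limited by the cap
print(smallest_volume_for(4000))   # GB needed to reach the per-volume cap
```

In other words, a volume has to be at least 134GB before the 4,000 IOPS cap, rather than the 30 IOPS per GB ratio, becomes the limit.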
You can use EBS volumes with any Amazon instance, but if performance is truly critical, then Amazon offers Amazon EBS-optimized EC2 instances. These instances cost more, but they provide dedicated throughput between your EC2 instance (server instance) and EBS. The performance ranges from 500 Mb/s to 2,000 Mb/s depending upon the instance type used. These instances can be used with either Standard volumes or Provisioned IOPS EBS volumes.
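It helps to translate the dedicated throughput numbers into something you can compare against IOPS. The sketch below assumes that "Mb/s" means megabits per second (10^6 bits) and picks a 16KB I/O size purely as an example; both are my assumptions, and protocol overhead is ignored, so treat the results as rough upper bounds.

```python
def megabits_to_megabytes(mbits_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbits_per_s / 8


def iops_ceiling(mbits_per_s, io_size_kb):
    """Most I/O operations per second that fit through the link.

    Assumes every operation transfers exactly io_size_kb kilobytes
    and that no bandwidth is lost to protocol overhead.
    """
    bytes_per_s = mbits_per_s * 1_000_000 / 8
    return int(bytes_per_s // (io_size_kb * 1024))


for link_mbits in (500, 2000):
    print(link_mbits, "Mb/s is about",
          megabits_to_megabytes(link_mbits), "MB/s, or at most",
          iops_ceiling(link_mbits, io_size_kb=16), "IOPS at 16KB per I/O")
```

Under these assumptions a 500 Mb/s link tops out at roughly 62.5 MB/s, which is a little under 4,000 IOPS at 16KB per operation, so the link rather than the volume can become the bottleneck for larger I/O sizes.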
I'm not going to cover other Amazon storage options, such as S3 or Glacier, since their focus is something other than performance. S3 can be a very useful storage solution, but for shared storage it is better used in combination with EBS and ephemeral storage.