Enterprise boxes can create a mirror, break the mirror for a backup, and then reattach the mirror and update the box. Most mid-range boxes do not have this feature, although it is becoming more common. Backup is becoming more difficult given the total amount of storage that needs to be kept. More and more often, I am seeing sites move to hierarchical storage management (HSM) techniques rather than backup, given the immense data volumes. See my article on tapes for additional information on this topic. The issues for backup will be addressed in a few months.
You need to determine which RAID level to use. There are many different RAID levels, but the two used most often are RAID-1 and RAID-5. The right RAID level depends on:
- Cost -- For the same amount of data storage, RAID-1 requires far more disks than RAID-5.
- How the data is used -- If you are making small random requests, RAID-1 is faster than RAID-5. In general, if you are making large sequential block requests, RAID-5 is faster, especially when you have a large number of sequential writes.
So, if you are making small requests (especially if they are random), RAID-1 will be much faster than RAID-5. With RAID-1 each device is mirrored, but with RAID-5 you create a LUN (Logical Unit Number) protected by parity, which costs one drive's worth of capacity per RAID group, so that the LUN can be rebuilt if a device fails. With RAID-1 you write far more data to the disks (as each disk is mirrored) than you write with RAID-5, so RAID-1 consumes far more backend bandwidth for sequential I/O.
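The trade-offs above can be put into rough numbers. The following is a minimal sketch, not a sizing tool: the function names, the 4+1 RAID-5 group size, and the example capacities are all assumptions made for illustration, and the model ignores controller cache, hot spares, and vendor-specific layouts. The small-write count for RAID-5 assumes the classic read-modify-write sequence (read old data, read old parity, write new data, write new parity).

```python
import math


def disks_needed(usable: float, disk_size: float, level: str, group_size: int = 5) -> int:
    """Disks required to provide `usable` capacity (same unit as `disk_size`).

    RAID-1: every data disk has a mirror.
    RAID-5: each group of `group_size` disks gives up one disk's worth of
    capacity to parity; only whole groups are counted.
    """
    data_disks = math.ceil(usable / disk_size)
    if level == "raid1":
        return 2 * data_disks
    if level == "raid5":
        groups = math.ceil(data_disks / (group_size - 1))
        return groups * group_size
    raise ValueError(f"unsupported RAID level: {level}")


def backend_write_factor(level: str, group_size: int = 5) -> float:
    """Bytes written to disk per byte written by the host, assuming large
    sequential writes that fill whole stripes."""
    if level == "raid1":
        return 2.0  # data plus its mirror copy
    if level == "raid5":
        return group_size / (group_size - 1)  # data plus one parity block per stripe
    raise ValueError(f"unsupported RAID level: {level}")


def small_write_ios(level: str) -> int:
    """Disk I/Os per small random host write, ignoring controller cache."""
    if level == "raid1":
        return 2  # write both sides of the mirror
    if level == "raid5":
        return 4  # read data, read parity, write data, write parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("raid1", "raid5"):
        print(
            level,
            "disks:", disks_needed(8, 1, level),
            "backend write factor:", backend_write_factor(level),
            "I/Os per small write:", small_write_ios(level),
        )
```

Under these assumptions, RAID-1 needs more disks and more backend bandwidth for sequential writes, while RAID-5 pays for its capacity savings with extra disk operations on every small random write.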
File Sizes and Accesses
In today's world, the likelihood of everything fitting into the RAID cache is very low. If it did, why would you even buy a RAID, when you could simply purchase an SSD (solid-state disk)? The real questions you need to ask yourself are how these files are accessed and whether the data can be reused. This leads to the next important question: would a large cache help, or would it make no real difference?
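One rough way to reason about the cache question is to estimate a hit ratio and weigh cache and disk service times by it. The sketch below is a simplified model, not a measurement: it assumes reads are spread uniformly over a working set (real access patterns are rarely uniform), and the function names and latency figures are hypothetical values chosen for illustration.

```python
def hit_ratio_uniform(cache_gb: float, working_set_gb: float) -> float:
    """Estimated cache hit ratio, assuming uniformly random re-reads
    over the working set (an illustrative simplification)."""
    if working_set_gb <= 0:
        raise ValueError("working set must be positive")
    return min(1.0, cache_gb / working_set_gb)


def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average service time: hits are served from cache, misses from disk."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Hypothetical figures: 0.1 ms for a cache hit, 8 ms for a disk access.
    for cache_gb in (4, 8, 400):
        h = hit_ratio_uniform(cache_gb, working_set_gb=400)
        print(cache_gb, "GB cache:", effective_latency_ms(h, 0.1, 8.0), "ms average")
```

With little data reuse, doubling a small cache barely moves the average in this model; the cache pays off only when it covers a meaningful share of the data that is actually re-read.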
The choice between cache-centric RAID devices and storage-centric RAID devices will likely be driven by budgetary constraints and other issues rather than by the performance of the devices. The operational environments of the two types are often vastly different, with issues to consider including:
- How many LUNs and how much storage do you want under the control of one device
- What RAID levels do you want to run based on the cost per MB
- What type of disk devices do you want
- What RAID levels are going to provide the best performance given the applications
- The application types and request sizes
In the next article, we will dig deeper into RAID and discuss how the layout of RAID and file systems should be architected.