Tuning Your RAID Controller for Maximum Storage Performance
Tuning RAID Cache Block Sizes
The cache block size is the minimum amount of data that can be read into the cache. For example, the RAID allocation on a disk might be 32 KB, and you would think that all I/O to and from that disk is 32 KB. If the cache block size is, say, 4 KB, however, then the minimum read or write to that device is 4 KB, which is eight times today's 512-byte disk sector. If your file system allocations are large and your write requests are large, a small cache block size will likely reduce RAID performance: most RAID controllers I have seen slow down with smaller cache blocks because they do not have the CPU power to manage all of those blocks. This might become less true as the next crop of controllers comes out with much higher-performance CPUs. Small cache blocks are still necessary, however, when data is not aligned to the RAID allocation on a single disk.
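To make the block-size trade-off concrete, here is a minimal sketch that rounds a request out to whole cache blocks, assuming (as described above) that the cache block is the smallest unit the controller stages through cache. The function `cache_footprint` and the `CacheFootprint` record are hypothetical illustrations, not any vendor's firmware logic, and the request sizes are example values.

```python
from dataclasses import dataclass

KB = 1024      # the article's KB, i.e. 1024 bytes
SECTOR = 512   # classic disk sector size in bytes


@dataclass(frozen=True)
class CacheFootprint:
    blocks: int       # cache blocks the controller must track for the request
    bytes_moved: int  # bytes staged through cache, rounded out to whole blocks


def cache_footprint(offset: int, length: int, cache_block: int) -> CacheFootprint:
    """Round the byte range [offset, offset + length) out to whole cache blocks.

    Assumes the cache block is the smallest unit the controller reads into
    or writes out of cache.
    """
    if offset < 0 or length <= 0 or cache_block <= 0:
        raise ValueError("offset must be >= 0; length and cache_block must be > 0")
    first = offset // cache_block                # first cache block touched
    last = (offset + length - 1) // cache_block  # last cache block touched
    blocks = last - first + 1
    return CacheFootprint(blocks=blocks, bytes_moved=blocks * cache_block)


if __name__ == "__main__":
    print(f"4 KB cache block = {4 * KB // SECTOR} sectors of {SECTOR} bytes")
    for cache_block in (4 * KB, 32 * KB):
        aligned = cache_footprint(0, 32 * KB, cache_block)          # one full 32 KB allocation
        unaligned = cache_footprint(30 * KB, 4 * KB, cache_block)   # small write straddling a boundary
        print(
            f"{cache_block // KB:>2} KB blocks: "
            f"aligned 32 KB -> {aligned.blocks} blocks to manage; "
            f"unaligned 4 KB -> {unaligned.bytes_moved // KB} KB moved"
        )
```

With 4 KB cache blocks, the aligned 32 KB request becomes eight blocks for the controller to manage, while with 32 KB blocks the misaligned 4 KB write drags 64 KB through cache. That is the tension the paragraph describes: small blocks cost controller CPU, and large blocks waste work when data is not aligned.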
Take the case where you write in small requests, read in large requests, and the file system allocation is as large as the RAID stripe. In that case, unless the file system is heavily fragmented, there is a good chance that consecutive writes will be allocated sequentially, and read-ahead will likely help. Read-ahead will also help if the writes are bigger than the reads, as all RAID controllers will see the smaller reads as sequential. So when tuning for reads, you need to compare the read request size with the write request size and determine how many files are being written at the same time. If only one file is written at a time, its data will likely be allocated sequentially unless the file system is fragmented, and read-ahead will provide a great benefit. On the other hand, if multiple files are being written and both the write size and the file system allocation are smaller than the stripe size, read-ahead will provide little to no value. It comes down to this: when multiple files are being written, read-ahead works only if the writes and allocations are equal to or larger than the RAID stripe size.
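The decision rules in the paragraph above can be written down as a small rule-of-thumb function. This is a sketch of the reasoning only: `readahead_verdict` and `ReadAhead` are hypothetical names, treating a heavily fragmented file system as "little to no value" is my reading of the text, and the mixed case (one of write size or allocation below the stripe, the other at or above it with several writers) is reported as uncertain because the rule of thumb does not settle it.

```python
from enum import Enum


class ReadAhead(Enum):
    LIKELY_HELPS = "likely helps"
    LITTLE_VALUE = "little to no value"
    UNCERTAIN = "uncertain"  # a combination the rule of thumb does not settle


def readahead_verdict(
    write_size: int,
    fs_allocation: int,
    stripe_size: int,
    concurrent_writers: int,
    heavily_fragmented: bool = False,
) -> ReadAhead:
    """Rule-of-thumb check for whether RAID read-ahead is worth enabling.

    All sizes are in bytes. This encodes the article's reasoning, not a
    vendor tuning algorithm.
    """
    if concurrent_writers <= 1:
        # One file written at a time: its data is likely laid out sequentially
        # unless the file system is fragmented (assumption: fragmented -> little value).
        return ReadAhead.LITTLE_VALUE if heavily_fragmented else ReadAhead.LIKELY_HELPS
    if write_size >= stripe_size and fs_allocation >= stripe_size:
        # Several writers, but each write and allocation covers at least a full stripe.
        return ReadAhead.LIKELY_HELPS
    if write_size < stripe_size and fs_allocation < stripe_size:
        # Several writers interleave small pieces within each stripe.
        return ReadAhead.LITTLE_VALUE
    return ReadAhead.UNCERTAIN


if __name__ == "__main__":
    KB = 1024
    stripe = 256 * KB  # hypothetical RAID stripe size
    print(readahead_verdict(4 * KB, 4 * KB, stripe, concurrent_writers=8).value)
    print(readahead_verdict(1024 * KB, 1024 * KB, stripe, concurrent_writers=8).value)
    print(readahead_verdict(4 * KB, 4 * KB, stripe, concurrent_writers=1).value)
```

The checks are ordered by how much the paragraph depends on them: the number of concurrent writers is tested first, because with a single writer the stripe comparison matters much less.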
Tuning Cache for Mirroring
Write cache mirroring is a common feature in many midrange RAID products, and enterprise controllers mirror all writes. The controller takes the I/O request and also writes it to the cache on the other half of the controller, in case the half being written to fails. Some vendors have techniques for bypassing write cache mirroring when the data is aligned on a full stripe, but in a general-purpose environment with write cache mirroring, each write goes to the local cache and then to the other cache before the I/O request is acknowledged. Write cache mirroring therefore generally slows performance because of the latency and bandwidth required to write to the other cache. And because each cache must mirror the other, you often lose half of the cache space to holding the other cache's data.
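Both costs of mirroring, the extra latency before the acknowledgement and the halved cache, fit in a simple serial model. This is a sketch under stated assumptions: `ControllerHalf`, `ack_latency_us`, and `usable_write_cache` are hypothetical names, every number in the example is made up for illustration, and the model ignores the full-stripe bypass some vendors offer as well as any overlap between the local and remote copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerHalf:
    cache_bytes: int        # cache installed on one half of the controller
    local_write_us: float   # time to place a write in local cache
    mirror_rtt_us: float    # round trip to the partner cache (send plus its ack)
    mirror_gb_per_s: float  # usable bandwidth of the inter-controller link, GB/s


def ack_latency_us(ctl: ControllerHalf, write_bytes: int, mirrored: bool) -> float:
    """Microseconds until the host receives its acknowledgement.

    Serial model: the local cache write happens first, then (if mirrored) the
    copy to the partner cache must finish before the ack is returned.
    """
    latency = ctl.local_write_us
    if mirrored:
        transfer_us = write_bytes / (ctl.mirror_gb_per_s * 1e9) * 1e6
        latency += ctl.mirror_rtt_us + transfer_us
    return latency


def usable_write_cache(ctl: ControllerHalf, mirrored: bool) -> int:
    # With mirroring, about half of each cache holds the partner's copy.
    return ctl.cache_bytes // 2 if mirrored else ctl.cache_bytes


if __name__ == "__main__":
    GiB = 1024 ** 3
    ctl = ControllerHalf(cache_bytes=8 * GiB, local_write_us=10.0,
                         mirror_rtt_us=20.0, mirror_gb_per_s=2.0)  # hypothetical values
    one_mib = 1024 * 1024
    for mirrored in (False, True):
        print(f"mirrored={mirrored}: ack after "
              f"{ack_latency_us(ctl, one_mib, mirrored):.1f} us, "
              f"{usable_write_cache(ctl, mirrored) // GiB} GiB usable cache")
```

With these made-up numbers, a 1 MiB write is acknowledged after 10 us without mirroring and after roughly 554 us with it, and usable cache drops from 8 GiB to 4 GiB. Real figures depend entirely on the controller.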
If the vendor provides tunable parameters for read and write cache, it is worth tuning them based on your workload and reliability requirements. The question I often hear is whether users should enable write cache mirroring or not. The answer depends on how much data reliability you want. Let's say you are writing a file, and the data lands in cache on a system without write cache mirroring. Right after that, the whole back end of the controller (from cache to disk) fails. At this point, your application has been told that the write was successful, but the data never got to disk. Obviously, the chances of this happening are slim, but it is possible, and I have seen it happen. If you then did another write to the same file, one of two things could happen. You might get an I/O error, as most RAIDs return an error once they realize they cannot write from cache to disk. Or the RAID controller might fail over to the side that is still working, and your write would complete normally, but the file would be missing a write and the application would not know it. Missing a write in a file is not a good thing, which is why write cache mirroring is on by default. Tuning for write cache mirroring involves deciding how much cache space you want to set aside for writes, but mirroring itself should stay on: silent data corruption is something you just do not want to deal with, no matter how low the odds. If the controller fails, finding the bad or missing data is next to impossible.
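The failure sequence described above can be replayed with a toy write-back cache. This is a deliberately simplified sketch, not a model of any real controller: `WriteBackController` and `run_scenario` are hypothetical, the "back end failure" simply discards the dirty cache on the failed half, and after failover the surviving half destages whatever mirror copy it holds and stops mirroring because it has no partner left.

```python
class WriteBackController:
    """Toy write-back RAID cache that acknowledges writes before they reach disk."""

    def __init__(self, mirrored: bool) -> None:
        self.mirrored = mirrored
        self.disk: dict[int, bytes] = {}           # data persisted on disk
        self.cache: dict[int, bytes] = {}          # dirty data on the active half
        self.partner_cache: dict[int, bytes] = {}  # mirror copy on the other half
        self.acked: dict[int, bytes] = {}          # what the application was told succeeded

    def write(self, block: int, data: bytes) -> str:
        self.cache[block] = data
        if self.mirrored:
            # The copy must land in the partner cache before the ack is returned.
            self.partner_cache[block] = data
        self.acked[block] = data
        return "ok"  # the application now believes the write is safe

    def destage(self) -> None:
        # Flush dirty cache to disk; the mirror copy is no longer needed.
        self.disk.update(self.cache)
        self.cache.clear()
        self.partner_cache.clear()

    def fail_backend_and_failover(self) -> None:
        # The active half dies before destaging, so its dirty cache is lost.
        self.cache.clear()
        # The surviving half writes out any mirror copy it holds.
        self.disk.update(self.partner_cache)
        self.partner_cache.clear()
        self.mirrored = False  # no partner left to mirror to

    def lost_writes(self) -> list[int]:
        # Blocks the application was told were written but that are not on disk.
        return sorted(b for b, d in self.acked.items() if self.disk.get(b) != d)


def run_scenario(mirrored: bool) -> list[int]:
    ctl = WriteBackController(mirrored)
    ctl.write(0, b"A")
    ctl.destage()                    # block 0 safely reaches disk
    ctl.write(1, b"B")               # acknowledged, but only in cache
    ctl.fail_backend_and_failover()
    ctl.write(2, b"C")               # after failover, this write completes normally
    ctl.destage()
    return ctl.lost_writes()


if __name__ == "__main__":
    print("without mirroring, lost blocks:", run_scenario(mirrored=False))
    print("with mirroring, lost blocks:", run_scenario(mirrored=True))
```

Without mirroring, block 1 is silently missing even though every write returned "ok", including the one issued after failover. With mirroring, the surviving half still holds a copy and nothing is lost.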
Tuning RAID controllers isn't that difficult if you understand a bit about the application workload, that is, what the applications will do with the RAID. Read-ahead is often not useful if multiple files are being written and the file system allocation is small; NTFS on Windows is the best example of this bad situation. For file systems whose allocations are as large as or larger than the RAID stripe, read-ahead will have a significant positive impact.
Henry Newman, CEO and CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 29 years of experience in high-performance computing and storage.