A Trip Down the Data Path: RAID and Data Layout
The first thing to note about index files is that they are often much smaller than the database files themselves. In many databases, index files are 2 GB in size, so if the file system supports large allocations and the file system holds nothing but index files, you could set the allocation size to 2 GB and get one allocation per file. Most file systems do not support allocations that large, but you can make them as large as possible. On the other hand, index files, though cached in memory, are often searched with small random I/O requests, and small random I/O performs much better on RAID-1 than on RAID-5 with 8+1 or even 4+1 parity groups.
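One classic reason RAID-1 wins for small random I/O is the RAID-5 small-write penalty: every write smaller than a full stripe turns into a read-modify-write of data and parity. A back-of-the-envelope sketch of the effect on index updates (the per-drive IOPS figure and drive counts below are illustrative assumptions, not measurements from any particular RAID controller):

```python
# Rough model of small random *write* cost on RAID-1 vs RAID-5.
# The disk-operation counts per write are the classic textbook values;
# drive counts and per-drive IOPS are made-up example numbers.

def array_write_iops(n_disks, disk_iops, ops_per_write):
    """Aggregate small-write IOPS the array can sustain."""
    return n_disks * disk_iops / ops_per_write

disk_iops = 150  # assumed small random IOPS per drive (illustrative)

# RAID-1: each write goes to both mirror copies -> 2 disk ops.
raid1 = array_write_iops(n_disks=6, disk_iops=disk_iops, ops_per_write=2)

# RAID-5 read-modify-write: read old data, read old parity,
# write new data, write new parity -> 4 disk ops.
raid5 = array_write_iops(n_disks=9, disk_iops=disk_iops, ops_per_write=4)  # 8+1

print(f"RAID-1, 6 drives:    ~{raid1:.0f} small-write IOPS")
print(f"RAID-5 8+1, 9 drives: ~{raid5:.0f} small-write IOPS")
```

Even with 50% more spindles, the 8+1 RAID-5 group sustains fewer small writes in this simple model.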
If you can assign a RAID cache to a LUN and the index files fit within that cache, you can significantly reduce search latency. This is often done with enterprise RAID systems, as they support very large caches. How many LUNs do you need? How should they be laid out within the RAID, and how do you tune the volume manager and/or file system? Consider the following example.
Let's say you need 200 GB of index file space and your RAID holds 72 GB disk drives. That works out to six disks (200/72 ≈ 3 data drives, doubled to 6 because RAID-1 mirrors each one). For most RAIDs you have two choices in laying out the LUNs. You can either:
- Create a LUN with 6 drives and let the RAID controller manage the striping across the 6 devices
- Create 3 RAID-1 LUNs and let the volume manager or file system manage the 3 LUNs
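The drive-count arithmetic from the example can be written out as a quick sanity check (the 200 GB and 72 GB figures come from the example above):

```python
import math

# Sizing the RAID-1 index LUNs from the example:
# 200 GB of index space on 72 GB drives.
index_space_gb = 200
drive_size_gb = 72

data_drives = math.ceil(index_space_gb / drive_size_gb)  # 3 drives of data
total_drives = data_drives * 2                           # doubled for RAID-1 mirrors

print(f"data drives:  {data_drives}")
print(f"with mirrors: {total_drives}")
```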
Things are starting to get complex now, as you also have to determine the internal allocation, or RAID block size, for each of the LUNs that will be created. Let's start with the internal allocation, sometimes called the segment size or element size. If your database index access is truly small-block random, the requests are often 8 KB, so you want the internal block size to match that number as closely as possible. This ensures that the RAID is not reading data it does not use, since the RAID reads and writes in these internal block sizes. Additionally, if the I/O is truly random, turning off the RAID controller's readahead cache will improve performance. The catch is that I/O is often somewhat sequential, or sequential with a skip increment, and in that case a readahead cache large enough to cover the skip increment will improve performance.
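To see why the segment size matters, consider how much data the RAID moves for each 8 KB request when the segment is larger. A minimal sketch, assuming aligned requests and a controller that reads whole segments (the segment sizes are illustrative):

```python
import math

def read_amplification(request_kb, segment_kb):
    """Data actually read from disk per host request, assuming the RAID
    reads whole segments and requests are segment-aligned."""
    segments = math.ceil(request_kb / segment_kb)
    return segments * segment_kb

for segment_kb in (8, 64, 256):
    moved = read_amplification(request_kb=8, segment_kb=segment_kb)
    print(f"{segment_kb:>3} KB segment: {moved} KB read for an 8 KB request")
```

With an 8 KB segment the controller reads exactly what the host asked for; with a 256 KB segment it moves 32 times the requested data on every random read.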
OK, we know what the RAID settings should be, but we still need to build the LUNs. I generally suggest doing things in hardware rather than software, so for most RAIDs I would let the RAID controller stripe the data. There is a very important gotcha, however. The data is now striped across 3 disks (6 in total with the mirrors), so a sequential stream to a single index file is no longer physically sequential: the I/O touches the first, second, and third disks before returning to the first one.
Of course, how much this matters depends on the stripe size within the volume. The layout has an advantage if the index files are searched randomly and nothing is sequential, as it statistically spreads the I/O across the three disks.
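The stripe mapping described above can be sketched directly: with the controller striping across 3 data drives, logically consecutive chunks of a file rotate across all three disks before the first disk is touched again. The 64 KB stripe unit below is an assumed value for illustration:

```python
# Map logically sequential chunks of a file onto the physical disks of a
# 3-wide stripe (mirror copies omitted). Stripe unit size is illustrative.
n_disks = 3
stripe_unit_kb = 64

for chunk in range(6):  # first six 64 KB chunks of a sequential stream
    disk = chunk % n_disks
    offset_kb = (chunk // n_disks) * stripe_unit_kb
    print(f"chunk {chunk}: disk {disk}, offset {offset_kb} KB")
```

Chunks 0, 1, and 2 land on disks 0, 1, and 2; only at chunk 3 does disk 0 see its next physically adjacent block.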
On the other hand, you could create three LUNs and use a volume manager or file system to either stripe or round-robin access to the data. One trick I have used with volume managers that can only stripe is to set the stripe size in the volume manager equal to the database file size (2 GB in many cases). This effectively round-robins the index file access, since each index file is allocated on a separate device. It works only because the index files are of a fixed size, which is often the case for databases. Volume managers and file systems that support round-robin access directly will also work. The advantage here is that if the indexes are searched sequentially and the files are allocated sequentially, you can match the readahead cache and cache usage to the index file usage, significantly improving performance.
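The stripe-size-equals-file-size trick can be illustrated with a small placement model (the 2 GB file size matches the example in the text; the file count and LUN count are illustrative):

```python
# With the volume-manager stripe size set equal to the fixed index file
# size, each file falls entirely on one LUN and consecutive files
# round-robin across the LUNs. Counts below are illustrative.
stripe_size_gb = 2   # volume-manager stripe size = index file size
file_size_gb = 2
n_luns = 3

for file_no in range(6):  # six 2 GB index files allocated back to back
    start_gb = file_no * file_size_gb
    lun = (start_gb // stripe_size_gb) % n_luns
    print(f"index file {file_no}: LUN {lun}")
```

Files 0 through 5 land on LUNs 0, 1, 2, 0, 1, 2: each file sits whole on a single mirrored pair, so sequential searches within a file stay physically sequential on its disks.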
I have seen cases where this approach eliminated over 80% of the I/O from cache to disk, which translates into better response times and supports a heavier user load.