This month we are going to put everything together for the entire data path. If you haven’t read the earlier articles in the series, it might be best to review them to get a better understanding of the data path and the equipment/software involved. The data path is the path I/O takes across the:
Application
Library
Operating System
Volume Manager and File System
RAID
The hypothesis I made when this series started was that if you want to tune I/O performance, you first have to understand the entire data path. So this month we are going to cover file system layout with respect to the applications and the RAID hardware used (the data path).
Where to Start
You might think the place to start is building RAID volumes, but as you will see, that is actually the worst place to start. The first thing you need to do is understand how the application(s) will be performing I/O and how that will affect the file system and/or volume manager. Several factors need to be accounted for.
Before we get too far along, you might want to review the first Storage and I/O article, which discusses application I/O.
Let’s say you have a database application and a file system that stripes all of the data (please see “Choosing a File System or Volume Manager” for a discussion of striped file systems and round-robin file systems). There are several things you will need to consider.
Keeping these things in mind, you might want to consider having different file systems for different types of data. It is often beneficial to have several file systems; for example, one for the database index files and another for the data files themselves.
If performance is an issue, then having a number of file systems and tuning each file system and LUN within the RAID will provide the best performance for each type of file and I/O associated with it. Let’s step through the “whys” for each file system type using the database example.
Index Files
The first thing to note about index files is that they are very often much smaller than the database files themselves. Index files in many databases are two gigabytes in size, so if the file system supports large allocations and all you have within the file system are index files, you could set the allocation size equal to two gigabytes and have one allocation per file. Most file systems do not support allocations that large, but you can make them as large as possible. On the other hand, index files, though cached in memory, are often searched with small random I/O requests, and small random I/O requests perform much better on RAID-1 than on RAID-5 with 8+1 or even 4+1.
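To make the RAID-1 versus RAID-5 point concrete, here is a minimal back-of-the-envelope sketch, purely illustrative and not specific to any controller, that counts the physical disk operations a single small random write generates under the classic RAID-5 read-modify-write path:

```python
# Rough sketch: physical disk operations generated by one small random write
# that is smaller than the RAID segment.  Illustrative only; real controllers
# optimize with write-back caches and full-stripe writes.

def raid1_small_write_ops() -> int:
    # The write simply goes to both members of the mirror pair.
    return 2

def raid5_small_write_ops() -> int:
    # Classic read-modify-write: read old data, read old parity,
    # write new data, write new parity.
    return 4

print("RAID-1 small random write:", raid1_small_write_ops(), "disk ops")
print("RAID-5 small random write:", raid5_small_write_ops(), "disk ops")
```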
If you can assign a RAID cache to a LUN and the index files are able to fit within that cache, you can significantly reduce the latency to complete the searches. This is often done with enterprise RAID systems, as they support very large caches. How many LUNs do you need? How should they be laid out within the RAID, and how do you tune the volume manager and/or file system? Consider the following example.
Let’s say you need 200 GB of index file space and you have 72 GB disk drives in your RAID. Since 200/72 works out to roughly three disk drives, and RAID-1 mirrors each of them, you are going to use six drives. For most RAIDs you then have two choices in laying out the LUNs: you can either have the RAID controller build a single LUN that stripes across the three mirrored pairs, or build three separate RAID-1 LUNs and let the volume manager or file system stripe or round-robin across them.
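The six-drive figure comes from simple arithmetic; a quick sketch, assuming nothing beyond the 200 GB requirement and 72 GB drives from the example, makes it explicit:

```python
import math

def raid1_drives_needed(capacity_gb: float, drive_gb: float) -> int:
    # Drives needed to hold the data, then doubled for the mirror copies.
    data_drives = math.ceil(capacity_gb / drive_gb)
    return data_drives * 2

# 200 GB of index files on 72 GB drives: ceil(200/72) = 3 data drives,
# which becomes 6 drives once each one is mirrored.
print(raid1_drives_needed(200, 72))  # -> 6
```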
Things are starting to get complex now, as you also have to determine the internal allocation, or RAID block size, for each of the LUNs that will be created. Let’s start with the internal allocation, sometimes called the segment size or element size. If your database index accesses really are small-block random, the requests are often 8 KB, so you want to make the internal block size match that number as closely as possible. This ensures that the RAID is not reading data it does not use, since the RAID reads and writes in these internal block sizes. Additionally, if the I/O is truly random, turning off the RAID controller readahead cache will improve performance. The problem is that I/O is often somewhat sequential, or sequential with a skip increment, so having a readahead cache that encompasses the skip increment will improve performance.
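To see how much the segment size matters for small random requests, here is a rough sketch; the 256 KB alternative segment size is just an assumed value for comparison, not a recommendation from any vendor:

```python
import math

def bytes_read_from_disk(request_kb: int, segment_kb: int, num_requests: int) -> int:
    # The RAID reads and writes whole internal blocks (segments), so a
    # request smaller than the segment still pulls the full segment off disk.
    segments_per_request = math.ceil(request_kb / segment_kb)
    return segments_per_request * segment_kb * num_requests

# 10,000 random 8 KB index lookups:
print(bytes_read_from_disk(8, 8, 10_000))    # 80,000 KB: segment matches the request
print(bytes_read_from_disk(8, 256, 10_000))  # 2,560,000 KB: 32x more data pulled from disk
```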
OK, we know what the RAID settings should be, but we still need to build the LUNs. I generally suggest doing things in hardware rather than software, so for most RAIDs I would let the RAID controller stripe the data. But there is a very important gotcha: the data is now striped across three disks (six in total with the mirror), so if you are doing sequential I/O to a single index file, your I/O is not physically sequential on any one disk; you touch the first, second, and third disks in turn before coming back to the first one.
So if you were doing sequential I/O, it is no longer physically sequential on the physical disks. Of course, this depends on the stripe size within the volume. This is an advantage if the index files are searched randomly and nothing is sequential, as it statistically spreads the I/O across the three disks.
On the other hand, you could have three LUNs and use a volume manager or file system to either stripe or round-robin the access to the data. One trick that I have used for volume managers that only stripe data is to set the stripe size in the volume manager equal to the database file size (2 Gigabytes in many cases). You can effectively round-robin the index file access, as each index file will be allocated on a separate device. This works only because the index files are of a fixed size, which is often the case for databases. Volume managers and file systems that support round-robin access will also work. The advantage here is that if the indexes are searched sequentially and the files are allocated sequentially, you can match your readahead cache and cache usage to the index file usage, significantly improving the performance.
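The reason the stripe-size trick works falls out of the standard striping address math. The sketch below models a simplified striped volume (three LUNs, 2 GB stripes, files assumed to be allocated back to back), not any particular volume manager:

```python
def device_for_offset(offset_bytes: int, stripe_bytes: int, num_devices: int) -> int:
    # Standard striping: consecutive stripe-sized chunks rotate across devices.
    return (offset_bytes // stripe_bytes) % num_devices

GB = 1024 ** 3
stripe = 2 * GB   # volume manager stripe size set equal to the index file size
devices = 3       # three RAID-1 LUNs presented to the volume manager

# If 2 GB index files are allocated back to back, file N occupies volume
# offsets [N*2GB, (N+1)*2GB), so every byte of one file lands on one device.
for file_number in range(4):
    start = file_number * 2 * GB
    end = start + 2 * GB - 1
    print(file_number,
          device_for_offset(start, stripe, devices),
          device_for_offset(end, stripe, devices))
# file 0 -> device 0, file 1 -> device 1, file 2 -> device 2, file 3 -> device 0
```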
I have seen cases where using this process eliminated over 80% of the I/O from cache to disk, which translates into better response time and allows the system to support more users.
Data Files
Going through the same set of steps is necessary for the actual data itself, but there are often numerous differences between how the data is accessed and how the index files are accessed. The process, however, is the same: determine the access pattern and typical request size, choose the RAID level, size and lay out the LUNs, set the internal (segment) size and readahead behavior, and then configure the volume manager and/or file system to match.
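As a summary of those decision points, here is a small, hypothetical helper that encodes the reasoning from the index file example; the 64 KB threshold and the structure of the function are my own illustrative assumptions, not rules from any vendor or database:

```python
def plan_lun(avg_request_kb: int, mostly_random: bool) -> dict:
    plan = {}
    # Small random requests favor mirroring; large or sequential I/O
    # amortizes the parity cost of RAID-5 much better.
    plan["raid_level"] = "RAID-1" if mostly_random and avg_request_kb <= 64 else "RAID-5"
    # Match the internal (segment) size to the typical request size so the
    # RAID does not read data it will never use.
    plan["segment_kb"] = avg_request_kb
    # Controller readahead only pays off when there is a sequential component.
    plan["controller_readahead"] = not mostly_random
    return plan

print(plan_lun(avg_request_kb=8, mostly_random=True))      # index-file style I/O
print(plan_lun(avg_request_kb=1024, mostly_random=False))  # large sequential data I/O
```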
If you understand the above information, you will be able to create the LUNs and set up the file system and/or volume manager based on the LUN creation. Of course, you will still have to tune the database internals, but in terms of I/O on the storage, it will be as efficient as it can possibly be. I define efficiency as the highest possible cache utilization and the lowest possible data latency.
This process can be used for any other application type and is not restricted to databases.
Conclusion
When I started this column late last year, I stated that the key to I/O performance is understanding the data path from end-to-end. Through the series of articles culminating with this column, I believe that we have completely covered this end-to-end understanding. RAID controllers have no knowledge of how the data will be accessed or how the files are mapped to the physical devices, yet they have built-in algorithms to cache the data, improve the I/O latency, and reduce the amount of I/O from disk to cache.
It is up to the architects, storage team, and/or administrators with knowledge of the volume manager and file system to assist the RAID controller with its caching algorithms. The RAID controller, volume manager, and file system do not play well together, as they have no real way to communicate. Maybe that will change in the future, but not for some time.
Now that we have completed the data path, next month we will start reviewing the “hows and whys” of benchmarking. If anyone has any suggestions, please let me know.
Henry Newman has been a contributor to TechnologyAdvice websites for more than 20 years. His career in high-performance computing, storage and security dates to the early 1980s, when Cray was the name of a supercomputing company rather than an entry in Urban Dictionary. After nearly four decades of architecting IT systems, he recently retired as CTO of a storage company’s Federal group, but he rather quickly lost a bet that he wouldn't be able to stay retired by taking a consulting gig in his first month of retirement.