A Trip Down the Data Path: RAID and Data Layout
This month we are going to put everything together for the entire data path. If you haven't read the earlier articles in the series, it might be best to review them to get a better understanding of the data path and the equipment/software involved. The data path is the path I/O takes across the:
- Application
- Library
- Operating System
- Volume Manager and File System
- RAID
The hypothesis I made when this series started was that if you want to tune I/O performance, you first have to understand the entire data path. This month we are going to cover file system layout with respect to the applications and the RAID hardware used, which together make up that data path.
Where to Start
You might think the place to start is building RAID volumes, but as you will see, that is actually the worst place to start. The first thing you need to do is understand how the application(s) will perform I/O and how that will affect the file system and/or volume manager. Factors that need to be accounted for include:
- The number of files within the file system
- The size of the files within the file system
- Request size(s) to the files
- Ratio of read to write operations
- Access patterns of files that will be used the most (for databases, these are often the index files)
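As a rough illustration of how some of the factors above could be measured, the Python sketch below summarizes a list of I/O requests into request sizes, a read/write ratio, and the most heavily accessed files. The IORequest record and the summarize function are hypothetical names invented for this example. In practice the requests would come from whatever tracing facility your operating system provides, and the file counts and file sizes would come from the file system itself rather than from a trace.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IORequest:
    """One observed I/O request (hypothetical record format)."""
    path: str   # file the request was issued against
    op: str     # "read" or "write"
    size: int   # request size in bytes


def summarize(requests):
    """Summarize request sizes, read/write mix, and the busiest files."""
    reads = sum(1 for r in requests if r.op == "read")
    writes = sum(1 for r in requests if r.op == "write")
    per_file = Counter(r.path for r in requests)
    return {
        "files_touched": len(per_file),
        "reads": reads,
        "writes": writes,
        # Fraction of all requests that were reads (0.0 if nothing was traced)
        "read_fraction": reads / len(requests) if requests else 0.0,
        # How many requests were issued at each request size
        "request_sizes": dict(Counter(r.size for r in requests)),
        # The three files with the most requests, busiest first
        "hottest_files": per_file.most_common(3),
    }
```

A summary like this is only a starting point, but it shows quickly whether, for example, a handful of index files receive most of the requests.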
Before we get too far along, you might want to review the first Storage and I/O article, which discusses application I/O.
Let's say you have a database application and a file system that stripes all of the data (please see "Choosing a File System or Volume Manager" for a discussion of striped file systems and round-robin file systems). Some of the things you will need to consider are:
- How often indexes are created and/or rebuilt
- The size and number of the index file(s)
- The amount of data in the indexes versus the data within the database
Keeping these things in mind, you might want to consider having different file systems for different types of data, since each can then be configured for the files it holds. For example, you might have:
- One file system for the index files
- One file system for the data within the database
- One file system for the installation of the database (I suggest this because the database application itself usually has many small files; creating a file system with small allocations reduces the space needed for the installation and allows the other file systems to use a larger allocation)
- One or more file systems or raw devices for database logs (these are often not under file system control, as raw devices are used to ensure the log data is not sitting only in the memory cache if the system crashes)
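To make the point about small allocations concrete, here is a minimal Python sketch of the arithmetic, assuming a simplified file system in which every file is rounded up to a whole number of allocation units. The function name allocated_bytes and the file counts and sizes are hypothetical, and real file systems add metadata overhead and may pack small files differently.

```python
def allocated_bytes(file_sizes, allocation_unit):
    """Space consumed when each file is rounded up to whole allocation units.

    Simplifying assumption: no metadata overhead and no packing of small
    files, so every non-empty file occupies at least one allocation unit.
    """
    total = 0
    for size in file_sizes:
        units = -(-size // allocation_unit)  # integer ceiling division
        total += units * allocation_unit
    return total


# Hypothetical installation tree: 10,000 files of 2 KiB each.
install_files = [2048] * 10_000

small = allocated_bytes(install_files, 4 * 1024)      # 4 KiB allocations
large = allocated_bytes(install_files, 1024 * 1024)   # 1 MiB allocations

print(f"4 KiB allocation: {small:,} bytes")
print(f"1 MiB allocation: {large:,} bytes")
```

Under these assumptions the same 20 MB or so of installation files occupies about 41 MB with 4 KiB allocations and more than 10 GB with 1 MiB allocations, which is why keeping the installation on its own small-allocation file system frees the data file systems to use a large allocation.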
If performance is an issue, then having a number of file systems and tuning each file system and LUN within the RAID will provide the best performance for each type of file and I/O associated with it. Let's step through the "whys" for each file system type using the database example.