A Trip Down the Data Path: RAID and Data Layout Page 3


Data Files

You need to go through the same set of steps for the data files themselves. How the data is accessed often differs in many ways from how the index files are accessed, but the process is the same. Here are the steps:

  1. Determine how to lay out the LUNs based on:
    1. The file system and volume manager that you plan to use. Think about allocation sizes, round-robin, and striping
    2. The size of the files. Think about how the files will be allocated on the physical disk devices and how the RAID cache will be used

  2. Determine the I/O request size that will be used to read and write the data
    1. If it is large, then RAID-5 might be a good choice, because with RAID-1 you:
      1. Have to write out more data, using more cache-to-disk bandwidth than with RAID-5
      2. Use more disks for the same amount of data space
    2. How much new data will be created (the read/write ratio matters for setting up the cache)

If you understand the above information, you will be able to create the LUNs and then set up the file system and/or volume manager to match them. Of course, you will still have to tune the database internals, but in terms of I/O on the storage, it will be about as efficient as it can be. I define efficiency as the highest possible cache utilization and the lowest possible data latency.
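
To make the RAID-1 versus RAID-5 trade-off in step 2 concrete, here is a minimal Python sketch of the arithmetic. The function names, the 4+1 RAID-5 group, and the 256 KiB chunk size are illustrative assumptions, not values from this column or from any particular controller. The RAID-5 figure assumes the request covers whole stripes. Small or unaligned writes force the controller to read old data and parity first, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteCost:
    bytes_to_disk: int  # bytes moved from cache to disk for one request
    disks_needed: int   # disks required to provide the data capacity


def raid1_cost(request_bytes: int, data_disks: int) -> WriteCost:
    # Mirroring writes every byte twice and doubles the disk count.
    return WriteCost(2 * request_bytes, 2 * data_disks)


def raid5_full_stripe_cost(request_bytes: int, data_disks: int) -> WriteCost:
    # Assumption: the request is a whole number of full stripes, so parity
    # adds one chunk per stripe, i.e. 1/data_disks of the data written.
    parity_bytes = request_bytes // data_disks
    return WriteCost(request_bytes + parity_bytes, data_disks + 1)


def is_full_stripe(request_bytes: int, offset_bytes: int,
                   chunk_bytes: int, data_disks: int) -> bool:
    # True when the request starts on a stripe boundary and covers
    # only whole stripes (stripe = chunk size * number of data disks).
    stripe_bytes = chunk_bytes * data_disks
    return (request_bytes > 0
            and offset_bytes % stripe_bytes == 0
            and request_bytes % stripe_bytes == 0)


if __name__ == "__main__":
    KIB = 1024
    request = 1024 * KIB   # hypothetical 1 MiB application request
    chunk = 256 * KIB      # hypothetical per-disk chunk size
    data_disks = 4         # hypothetical 4+1 RAID-5 group

    print("full stripe:", is_full_stripe(request, 0, chunk, data_disks))
    print("RAID-1:", raid1_cost(request, data_disks))
    print("RAID-5:", raid5_full_stripe_cost(request, data_disks))
```

Under these assumptions, a 1 MiB aligned write sends 2 MiB to disk on RAID-1 but 1.25 MiB on RAID-5, and the same data space takes eight disks instead of five. The alignment check is where the file system allocation size and volume manager stripe settings from step 1 come in: if they do not produce requests that line up with the RAID stripe, the full-stripe assumption no longer holds.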

This process can be used for any other application type and is not restricted to databases.


When I started this column late last year, I stated that the key to I/O performance is understanding the data path from end to end. With the series of articles culminating in this column, I believe we have now covered that path completely. RAID controllers have no knowledge of how the data will be accessed or how the files are mapped to the physical devices, yet they have built-in algorithms to cache the data, improve the I/O latency, and reduce the amount of I/O from disk to cache.

It is up to the architects, storage team, and/or administrators with knowledge of the volume manager and file system to assist the RAID controller with its caching algorithms. The RAID controller, the volume manager, and the file system do not play well together because they have no real way to communicate. Maybe that will change in the future, but not for some time.

Now that we have completed the data path, next month we will start reviewing the "hows and whys" of benchmarking. If anyone has any suggestions, please let me know.

» See All Articles by Columnist Henry Newman
