Why You Need to Understand Your Application IO Patterns - Page 3
Metadata—It's Important Too
One other aspect of IO patterns that people almost always forget is the metadata function rate. Recall that metadata is data about the data and is a very important aspect of storage solution performance. In many ways it is like IOPS, but for metadata and not data (the number of metadata operations per second). Metadata rates are important because some applications can do a great deal of metadata operations during execution.
However, one question that gets debated is what is a metadata operation? Personally, I view a metadata operation as any IO operation that affects the metadata of a file in the file system. This includes any stat() operations, reading or writing data, file creations or removes, changing the dates or access permissions on the file, any readir() information, and so on. But the classic metadata rate metrics are: (1) file creates per second, (2) file removes per second, and (3) file stat() per second.
Again, strace can be your friend in this case. To get an idea of metadata function rates, you simply scan the strace file and count the number of specific metadata functions in a unit of time. These become your metadata rate metrics from the perspective of the application, adding to your IO pattern information.
Impact on Storage Design
If you can estimate all of the metrics I have mentioned so far, then you have the following list:
- Amount of time spent doing IO
- read/write function sizes
- read/write throughput
- Sequential or random file access
- IOPS (Read, Write, Total)
- Metadata rates (file creates, file removes, file stat() at a minimum)
This is quite a bit of information about the IO pattern of your application. Moreover, using all of the information in the strace of the application, you can also compute a very large amount of statistical measures of each of the IO pattern elements.
For example, you could plot a histogram of the read/write function sizes to understand the distribution of these functions. You can also compute the average, mean and mode of them to give you an understanding of the "typical" read/write size. Extending this a little further you can compute the standard deviation around the average, mean, and mode to understand how widely, or not, the function sizes are distributed.
There are other statistical measures that can give you an idea of the distribution of different metrics. Basically, you are trying to describe the IO pattern using the above metrics coupled with statistics. However, the real key is being able to understand and apply these metrics in designing or specifying a storage solution.
There is no science to being able to design a storage solution, even with a large number of statistics about the IO pattern of the application. But there are some general rules of thumb you can use.
A simple one is to check the peak IOPS for the application (read, write, and total). If the application seems to spend a fair amount of time doing IO and the IOPS are fairly large, then you will likely need a large number of hard drives to meet the IOPS requirements of the application. A really simple way to think of it is that a 7.2K SAS drive can roughly do 100 IOPS (it can actually do more but 100 is nice easy number to work with as a starting point). Therefore, divide your peak IOPS by 100 to find the number of drives you will need for the application. Given an individual drive capacity you can compute the overall capacity of the storage solution (don't forget the RAID requirements which increases the number of drives).
If the overall solution capacity is too large, you can also choose to switch to SSD drives in place of the hard drives. But be sure that your application does quite a bit of IO and that it is IOPS driven before you start looking at SSDs as your storage media. Also, be sure to check that you can get enough capacity from the drives to satisfy your overall capacity requirements. I know of one application that had a very high peak IOPS (well over 100,000), which caused the user to start considering SSDs as the storage media. However, the application spent less than 3 percent of the total run time doing IO. Even if the SSDs had infinite performance, the application performance would have only increased by 3 percent. This is probably not a good application of SSDs, at least not a cost-effective one.
Overall the IO pattern information should be able to help the vendors understand how your application(s) do IO and what requirements they need to have to run effectively.
Vendors are on the receiving end of a great number of applications that supposedly do a great deal of IO. Based on these, they need to determine which applications should be tested or benchmarked on a given solution and, perhaps most importantly, what that solution configuration should look like (they can't test every configuration).
Unfortunately, what many vendors have done is to create silly configurations to get the best TPC or SpecSFS scores. The configurations are ones that customers will never buy but the marketing siren call of having the best TPC or SpecSFS score is just too hard to resist. Therefore the vendors run what we call a hero benchmark (the best possible score regardless of the configuration). They run a hero benchmark, post the results, put out press releases, pat themselves on the back, high fives all around, then no one ever buys the configuration they tested.
I don't really blame them because of the large number of possible IO patterns, but they have gone 180 degrees away from focusing on user applications to focusing on benchmarks that probably bear little resemblance to user applications. Moreover, the configurations they have tested will never be purchased by anyone because they bear little to no resemblance to what the applications need.
It is far better for you, the user, to take your IO pattern knowledge to the vendors and explain what you want tested. There may be options for using micro benchmarks to simulate some aspects of your IO patterns on standard test systems the vendors may have configured (don't forget that it costs money to build and operate the test systems). Be prepared to work with the vendors to get the best tests possible on their solutions rather than throw a bunch of benchmarks at them see which vendor can complete the most tests (there are customers who routinely do this).
Furthermore, be sure to give the vendors enough time to run the tests and also try tuning the solution for the given tests. Not only does this help the vendors, it really helps you, the user. You get to see what options the vendors can provide which may or may not help the application better performance or the a lower solution cost (in other words—give them time and keep an open mind).
As you can tell, I usually like to finish my articles with a summary. For this article, the main point can be easily summarized: IO patterns help everyone—the users and the vendors. IO patterns help users understand how their application "do" IO allowing them to understand where there are potential bottlenecks and where there are potential opportunities for improving performance by changing applications. IO patterns help vendors because they have a much clearer view of what users want from their solution so they don't have to run stupid, meaningless marketing benchmarks that have no meaning on configurations that no one will ever purchase.
Using simple tools, such as strace, allows you to get an idea of the IO patterns from the point of view of the application. It's not too difficult to extract loads of useful information including the amount of time spent doing IO, read/write data sizes, sequential or random file access, throughput, IOPS, and metadata rates. All of this information helps you, the user, look for storage solutions that can possibly meet your requirements and it helps vendors hone in on solutions and configurations that can be proposed.
One last point I want to make is that I think that many people overlook the fact that the IO pattern information can also help developers perhaps redesign the IO portion of their application to improve performance. If the file access pattern is too random, perhaps there are some ways to change that to more a sequential access pattern so that they can take advantage of the throughput of the storage media. Or perhaps the read/write data sizes are too small, again penalizing the streaming performance of typical hard drives. This can be changed in the application improving overall performance.
Take at look at the IO patterns of your applications—you won't be sorry.