The Importance of Understanding Application IO - Page 2
Right away, using the strace output, you can see the amount of data the application is sending to the operating system per write function. If you like, you can think of this as the "data size" or "write size" from the application (the same applies to reads as well as writes). You can scan a complete strace output file and then get a range of write sizes for the application. In fact, you can plot a histogram of these as shown in the example below which plots a histogram of the write sizes in the byte range (0-1KB).
Figure 1: write size histogram example
This histogram can be very useful when discussing a storage solution with vendors. You can show them the histogram and explain that the information comes from an strace of the application. Then you can ask them to do the same thing or at least provide the same information for the tests/benchmarks they ran. If the patterns are similar, then their tests/benchmarks are perhaps useful relative to your application. You don't have to match the entire histogram, either; you can just match snippets or sections of it.
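As a rough illustration of how such a histogram could be built, here is a minimal Python sketch. It assumes plain strace output lines of the form `write(3, "..."..., 512) = 512` and uses the return value as the write size. The function name `write_size_histogram` and the 128-byte bin width are my own choices for the example, not anything strace provides.

```python
import re
from collections import Counter

# Matches a completed write() call and captures its return value
# (the number of bytes actually written). Failed calls (= -1) do not match.
WRITE_RE = re.compile(r'\bwrite\((\d+),.*\)\s+=\s+(\d+)')

def write_size_histogram(lines, bin_width=128, max_size=1024):
    """Count write() sizes below max_size bytes, grouped into bins.

    Returns a dict mapping the lower edge of each bin (in bytes)
    to the number of write() calls that fell into that bin.
    """
    counts = Counter()
    for line in lines:
        m = WRITE_RE.search(line)
        if not m:
            continue
        size = int(m.group(2))
        if size < max_size:
            counts[(size // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample lines, just to show the input format assumed here.
sample = [
    'write(3, "abc"..., 100) = 100',
    'write(3, "abc"..., 130) = 130',
    'write(3, "abc"..., 4096) = 4096',
    'read(4, "x"..., 512) = 512',
    'write(3, "abc"..., 120) = 120',
]
print(write_size_histogram(sample))
```

The resulting dictionary can be handed to whatever plotting tool you prefer. Changing `max_size` lets you look at other ranges, which is handy if you only want to compare a section of the histogram with a vendor's benchmark.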
You can also take the amount of data in each write function call and divide by the amount of time it took to complete. This gives you the throughput of that function call. This can also be useful information, although the merging of write or read calls as well as caching can affect the time. But at least it gives you some sort of throughput information (there are other sources of approximate throughput information for an application but that's another long article).
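Here is a small sketch of that per-call calculation. It assumes the trace was collected with `strace -T`, which appends the time spent in each system call in angle brackets, for example `<0.000500>`. The function name `per_call_throughput` is made up for this example.

```python
import re

# Matches read()/write() lines that carry a -T style elapsed time, e.g.:
#   write(3, "abc"..., 4096) = 4096 <0.000500>
CALL_RE = re.compile(
    r'\b(read|write)\(\d+,.*\)\s+=\s+(\d+)\s+<(\d+\.\d+)>'
)

def per_call_throughput(lines):
    """Return a list of (call_name, bytes, bytes_per_second) tuples."""
    results = []
    for line in lines:
        m = CALL_RE.search(line)
        if not m:
            continue
        name = m.group(1)
        nbytes = int(m.group(2))
        elapsed = float(m.group(3))
        if elapsed > 0:  # skip calls too fast for the timer to resolve
            results.append((name, nbytes, nbytes / elapsed))
    return results

# Hypothetical sample lines in the assumed format.
sample = [
    'write(3, "abc"..., 4096) = 4096 <0.000500>',
    'read(4, "abc"..., 1024) = 1024 <0.000100>',
    'close(3) = 0 <0.000010>',
]
for name, nbytes, rate in per_call_throughput(sample):
    print(f"{name}: {nbytes} bytes at {rate / 1e6:.2f} MB/s")
```

Keep in mind the caveat above: the elapsed time is what the application saw for that call, so a write that only landed in the page cache will look much faster than the storage underneath really is.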
Would You Like Your Data Sequential or Random?
In addition to the size of the IO system calls, another very important aspect of an IO pattern is understanding if it is a sequential or random data access pattern (or a combination). Sequential access means that your application accesses a certain piece of data in a file (read or write), and the next access proceeds from where the prior access ended. There is no movement of the file pointer in between data access via an lseek() function or a rewind() function or something similar. A simple example of this is to read 4 KiB from a file and then read the next 4 KiB in the file and so on.
A random access IO pattern accesses some data in a file and the next data access is not the next chunk of data but some other data chunk within the file (ideally, a random location). A simple example is to read 4 KiB from a file, then move the file pointer to a point of 18 KiB within the file and read another 4 KiB. This example skips the data between the end of the first 4 KiB and the 18 KiB mark before doing another IO operation, and because 18 KiB is not a multiple of 4 KiB, the file pointer location is a little more random. If the accesses keep jumping around the file like this with no discernible order, it is considered a random file access IO pattern.
You can use strace information to examine the sequential or random nature of data access. You just scan through the strace file and keep track of the file pointer for each open file, which starts at zero when the file is opened. After a 4,096 byte read of a file the file pointer is at 4,096 (the strace output for a read() function tells you how much data was read in bytes). An lseek operation tells you the final file pointer location in bytes. The same is true for other functions that modify the current file pointer. Then you just plot the file pointer location as a function of time.
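The bookkeeping described above can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: the trace is for a single process, only `open()`/`openat()`, `read()`, `write()`, and `lseek()` are handled, and files opened with O_APPEND or accessed with calls such as `pread64()` are not modeled. The function name `track_file_pointer` is hypothetical.

```python
import re

OPEN_RE = re.compile(r'\bopen(?:at)?\(.*\)\s+=\s+(\d+)')
RW_RE = re.compile(r'\b(?:read|write)\((\d+),.*\)\s+=\s+(\d+)')
LSEEK_RE = re.compile(r'\blseek\((\d+),.*\)\s+=\s+(\d+)')

def track_file_pointer(lines, fd):
    """Return the file pointer value after each call on descriptor fd.

    The list is in call order, which serves as a stand-in for time.
    """
    offsets = []
    pos = 0
    for line in lines:
        m = OPEN_RE.search(line)
        if m and int(m.group(1)) == fd:
            pos = 0                      # a fresh open starts at zero
            offsets.append(pos)
            continue
        m = LSEEK_RE.search(line)
        if m and int(m.group(1)) == fd:
            pos = int(m.group(2))        # lseek returns the new offset
            offsets.append(pos)
            continue
        m = RW_RE.search(line)
        if m and int(m.group(1)) == fd:
            pos += int(m.group(2))       # advance by bytes transferred
            offsets.append(pos)
    return offsets

# Hypothetical sample mirroring the 4 KiB / 18 KiB example in the text.
sample = [
    'openat(AT_FDCWD, "data.bin", O_RDONLY) = 3',
    'read(3, "..."..., 4096) = 4096',
    'lseek(3, 18432, SEEK_SET) = 18432',
    'read(3, "..."..., 4096) = 4096',
]
print(track_file_pointer(sample, fd=3))
```

If you collect the trace with timestamps, you could pair each offset with its timestamp instead of relying on call order, which gives you a true file pointer versus time plot.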
If the file is accessed purely sequentially, the plot of a file pointer over time should have the file pointer value increasing in a monotonic fashion for longer stretches of time. But there may be times when the file is closed and reopened which will set the file pointer back to the beginning (a value of zero). A common example is reading an input file where you read it from beginning to end so the file pointer plot should just be an increasing value.
On the other hand, if the file pointer value jumps around as a function of time then the file access IO pattern is probably random. However, look closely at the file pointer plots, since what appears random sometimes has an underlying pattern that makes it at least partly sequential.
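If you want a number to go with the plot, one simple option (my own suggestion, not a standard metric) is the fraction of accesses that begin exactly where the previous access ended. The sketch below assumes you have already extracted a list of `(start_offset, nbytes)` pairs for one file, in call order; the function name `sequential_fraction` is hypothetical.

```python
def sequential_fraction(accesses):
    """Fraction of accesses that start where the previous one ended.

    accesses: list of (start_offset, nbytes) tuples in call order.
    Returns 1.0 for purely sequential access and 0.0 when no access
    follows directly on from the one before it.
    """
    if len(accesses) < 2:
        return 1.0  # nothing to compare against
    sequential = 0
    prev_end = accesses[0][0] + accesses[0][1]
    for start, nbytes in accesses[1:]:
        if start == prev_end:
            sequential += 1
        prev_end = start + nbytes
    return sequential / (len(accesses) - 1)

# Purely sequential 4 KiB reads.
print(sequential_fraction([(0, 4096), (4096, 4096), (8192, 4096)]))
# One sequential step followed by a jump to 18 KiB.
print(sequential_fraction([(0, 4096), (4096, 4096), (18432, 4096)]))
```

A value near 1.0 suggests mostly sequential access and a value near 0.0 suggests mostly random access, but a single number can hide an underlying pattern, so it is best used alongside the plot rather than instead of it.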
I think you might be surprised by your application profiling. You might find that applications do a fair amount of sequential IO. But just remember that production systems are running lots of applications that may be doing IO at the same time. As a consequence, the storage system may think it's being hit with a bunch of random IO.
A third aspect of IO patterns that people discuss is IOPS (I/O Operations per Second). People like to debate what an IO operation is, but I personally think of three IOPS metrics: (1) Read IOPS (read() operations per second), (2) Write IOPS (write() operations per second), and (3) Total IOPS (all IO operations per second). This last measure, total IOPS, takes into account reads, writes, and, in my opinion, any IO operation.
When testing storage hardware for IOPS, people often define a read or write IO operation as a 4KB transfer because this is typically the smallest write or read function size that an OS produces. It is also the default buffer size for Linux glibc (look at stdio.h). But the 4KB size is not a standard by any stretch.
You will find vendors who run IOPS tests using a 0KB data size; that is, no data is actually written to or read from the disks. I've also seen vendors use 1 byte or 1 KB for data sizes.
Another frequent benchmark faux pas is to run one test that combines reads and writes in some ratio. A common one is something like 25 percent reads and 75 percent writes. The goal is to provide a single IOPS number for applications that do both reading and writing. It is a laudable goal, but it makes life very difficult when you try to apply that performance measure to applications that don't do reads and writes in that ratio. This limits the usefulness of these measures.
Strace can provide IOPS pattern information. You simply count the number of specific IO operations in each one-second interval to get that information. This allows you to compute the Read IOPS, the Write IOPS, and the Total IOPS very easily.
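Here is a minimal sketch of that counting. It assumes the trace was collected with `strace -ttt`, so every line starts with a timestamp in seconds since the epoch (for example `1700000000.100000`). The set of calls counted toward Total IOPS (`IO_CALLS`) is my own assumption for the example, as is the function name `iops_per_second`; adjust the set to whatever you consider an IO operation.

```python
import re
from collections import defaultdict

# Timestamp (whole seconds captured) followed by the system call name.
LINE_RE = re.compile(r'^(\d+)\.\d+\s+(\w+)\(')

# Assumed definition of "any IO operation" for the Total IOPS count.
IO_CALLS = {"read", "write", "open", "openat", "close", "lseek", "fsync"}

def iops_per_second(lines):
    """Return {second: {"read": n, "write": n, "total": n}}."""
    buckets = defaultdict(lambda: {"read": 0, "write": 0, "total": 0})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        second, call = int(m.group(1)), m.group(2)
        if call not in IO_CALLS:
            continue
        bucket = buckets[second]
        bucket["total"] += 1
        if call == "read":
            bucket["read"] += 1
        elif call == "write":
            bucket["write"] += 1
    return dict(buckets)

# Hypothetical sample lines in the assumed -ttt format.
sample = [
    '1700000000.100000 read(3, "x"..., 4096) = 4096',
    '1700000000.200000 write(4, "x"..., 4096) = 4096',
    '1700000000.900000 lseek(3, 0, SEEK_SET) = 0',
    '1700000001.000100 read(3, "x"..., 4096) = 4096',
    '1700000001.500000 mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0000000000',
]
for second, counts in sorted(iops_per_second(sample).items()):
    print(second, counts)
```

Seconds in which the application did no IO simply do not appear in the result, so fill those in with zeros if you plan to plot IOPS over time.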