Using Your Workload
If you are going to use your actual workload in a benchmark, the first step is end-to-end hardware and software characterization. You need to document and understand:
- What applications are being run
- The number, location, and sizes of the data sets being used
- The server(s) hardware configuration, including CPUs, memory, NICs, and HBAs
- The server(s) software configuration
- Application requirements, such as redo logs for databases
- File system and volume manager settings
- HBA tunables
- Storage configuration, including LUN sizes, RAID type, and RAID cache sizes and settings
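As a starting point, much of the server-side checklist above can be captured with a short script. This is a minimal sketch assuming a Linux server; the output file name `server_config.txt` is arbitrary, and you would extend it with `lscpu`, `lspci`, and your HBA and RAID vendors' utilities where available.

```shell
# config_snapshot.sh -- hedged sketch (Linux assumed) that collects basic
# server configuration into one file you can hand to each vendor.
{
  echo "== OS and kernel =="
  uname -a
  echo "== CPUs =="
  nproc
  echo "== Memory =="
  grep MemTotal /proc/meminfo
  echo "== File systems =="
  df -h
} > server_config.txt
cat server_config.txt
```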
All of this may seem obvious, but if you’re going to give a storage vendor your benchmark, the more documentation you provide, the more likely the results will meet your requirements and the fewer questions you will have to answer. And if you’ve gone so far as to document the above, creating the operational procedures for things such as remote mirroring, tape backup, and other operational requirements will not be difficult.
When using your own workload in a benchmark, there are several additional areas that need to be clearly understood and documented for the storage vendors, including:
- Server memory size and tunable settings – Many file systems size the file system cache, or the cache for the database, based on system tunables or auto-configuration. If a vendor does not have the same amount of memory, or does not use exactly the settings that you and the other vendors are using, that vendor's results could be skewed
- File system and volume manager settings – These settings can have a significant effect, positive or negative, on your system's performance, so they should be set the same for all the vendors
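On Linux, the cache-related kernel tunables can be recorded with a few commands so that every vendor tests with identical settings. This is a minimal sketch; which tunables matter depends on your OS and file system, and the three listed here are illustrative examples.

```shell
# Record cache-related kernel tunables (Linux assumed) to tunables.txt.
# The sysctls chosen are examples; substitute the ones relevant to your
# file system and database configuration.
for t in vm/dirty_ratio vm/dirty_background_ratio vm/swappiness; do
  printf '%s = %s\n' "$t" "$(cat /proc/sys/$t)"
done > tunables.txt
cat tunables.txt
```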
Emulating Your Workload
If you have a staff that can program in C, then writing the code to emulate your workload will not be that difficult. I believe that if you have done a good job with the emulation, then you’ll have a great deal more control of the benchmark in terms of scaling, and you’ll have a far better understanding of what your workload does to the actual hardware and software.
It also allows you to test the storage vendors’ hardware without the file system, as you can read and write directly to the raw devices. This gives a clearer view of hardware performance, which might otherwise be masked by the file system's effect on I/O.
The steps for developing an emulation are relatively simple:
- Use the system tools to get a system call listing of the application(s) doing the I/O. These tools are available from most OS vendors. For example, on Solaris it’s called truss, and on Linux it’s strace
- After collecting this data, you’ll need to perform a statistical analysis of:
- Read and write ratio
- Read and write sizes
- File sizes
- Seeks and seek distance
- The amount of concurrent I/O
- Number of open files
- System call type (asynchronous or synchronous I/O)
- Develop a program that reads and writes according to the statistics developed in the previous step, issuing the I/O directly to/from the raw devices
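The analysis step can be sketched with a short awk script that summarizes an strace log. The sample lines written below stand in for a real trace, which you would capture with something like `strace -f -e trace=read,write -o app_trace.txt -p <pid>`, so the sketch is self-contained; the file names are placeholders.

```shell
# Minimal sketch of the statistical analysis: read/write counts and average
# I/O sizes from an strace-style log.  The heredoc is illustrative sample
# data standing in for a real trace file.
cat > app_trace.txt <<'EOF'
read(3, "data"..., 8192) = 8192
write(4, "data"..., 4096) = 4096
read(3, "data"..., 8192) = 8192
EOF
# Field separator splits on (, ), commas, = and spaces; the last field of
# each line is the syscall's return value (bytes transferred).
awk -F'[(,)= ]+' '
  /^read/  { r++; rb += $NF }
  /^write/ { w++; wb += $NF }
  END { printf "reads=%d avg_read=%d writes=%d avg_write=%d\n", r, rb/r, w, wb/w }
' app_trace.txt | tee io_summary.txt
```

A real trace would also feed the seek-distance, concurrency, and open-file statistics listed above; this shows only the ratio and size portion.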
This seems fairly difficult, and it can be, but once you have completed the process you’ll be able to easily scale your workload up and down. Another advantage is that the results you receive from the vendor are a true benchmark of the actual storage hardware, not of the file system and volume manager tunables.
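For the program in the final step, a minimal C sketch might look like the following. The function name and parameters are illustrative assumptions, and it operates on a plain file so the sketch runs anywhere; in a real benchmark you would point `path` at a raw device and drive the ratio, sizes, and extent from the statistics you gathered.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch of a workload emulator: perform `ops` random-offset I/Os against
 * `path`, issuing reads with probability `read_ratio` and writes otherwise,
 * each `io_size` bytes within an extent of `extent` bytes.  Returns the
 * number of operations completed, or -1 on error.  In a real benchmark
 * `path` would be a raw device and this loop would be parameterized by the
 * statistics derived from the trace analysis. */
long run_emulation(const char *path, long ops, double read_ratio,
                   size_t io_size, off_t extent)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, extent) != 0) { close(fd); return -1; }

    char *buf = malloc(io_size);
    if (buf == NULL) { close(fd); return -1; }
    memset(buf, 0xAB, io_size);

    long blocks = (long)(extent / (off_t)io_size);
    long done = 0;
    for (long i = 0; i < ops; i++) {
        /* Random seek within the extent, aligned to the I/O size. */
        off_t off = (off_t)(rand() % blocks) * (off_t)io_size;
        ssize_t n;
        if ((double)rand() / RAND_MAX < read_ratio)
            n = pread(fd, buf, io_size, off);
        else
            n = pwrite(fd, buf, io_size, off);
        if (n != (ssize_t)io_size) break;
        done++;
    }
    free(buf);
    close(fd);
    return done;
}
```

Scaling the workload up or down is then a matter of changing `ops`, the I/O size, the read/write ratio, or the number of concurrent copies of the loop.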