Benchmarking Storage Systems, Part 2
What About Software?
Even if you are benchmarking a file system or shared file system, much of what was recommended for analyzing the hardware should also be done for the software. One obvious difference is that you cannot write to the raw device when testing a file system.
Benchmarking a file system is likely the most difficult benchmarking task because there are so many variables, and doing it correctly is time-consuming, both for you and for the vendors.
Here are some items that must be characterized as part of the process for benchmarking file systems, building upon the characterizations already done for the hardware:
- File system size – current and future
- Total number of directories – current and future
- Total number of files – current and future
- Largest number of files per directory – current and future
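Most of the "current" values in the list above can be gathered directly from the existing file system. The sketch below walks a tree and collects total size, directory count, file count, and the largest number of files in any one directory; the function name and returned dictionary keys are my own, not from any standard tool.

```python
import os

def characterize_filesystem(root):
    """Walk a file system tree and collect the current characteristics
    listed above: total size, number of directories (including the
    root), number of files, and the largest file count in any single
    directory. A sketch for characterization, not a benchmark itself."""
    total_bytes = 0
    dir_count = 0
    file_count = 0
    max_files_per_dir = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += 1
        file_count += len(filenames)
        max_files_per_dir = max(max_files_per_dir, len(filenames))
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file removed or unreadable while we walked
    return {
        "total_bytes": total_bytes,
        "directories": dir_count,
        "files": file_count,
        "max_files_per_dir": max_files_per_dir,
    }
```

The "future" values cannot be measured, of course; they come from growth estimates, which is why the list asks for both.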
For shared file systems, add:
- Amount of I/O from each client and the master machine
- Amount of metadata I/O from each client and the master machine
- Number of clients
- Types of clients
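However you capture per-client counters (client-side accounting, server logs, or vendor tools), the result should reduce to per-client totals of data I/O and metadata I/O. A minimal sketch, assuming a hypothetical record format of (client, data bytes, metadata operations):

```python
from collections import defaultdict

# Hypothetical samples: (client name, data bytes moved, metadata operations).
# In practice these would come from client-side accounting or server logs.
samples = [
    ("client01", 512 * 2**20, 1200),
    ("client02", 128 * 2**20, 45000),
    ("master",   64 * 2**20,  90000),
]

def summarize(samples):
    """Aggregate data and metadata I/O per client, so the workload on
    each client and on the master machine can be characterized."""
    data = defaultdict(int)
    meta = defaultdict(int)
    for client, nbytes, mops in samples:
        data[client] += nbytes
        meta[client] += mops
    return data, meta
```

Note how the master machine here does little data I/O but dominates metadata operations, a pattern worth looking for because metadata traffic often travels a separate TCP/IP network.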
Along with this you have the hardware topology, including HBAs, switches, the TCP/IP network for metadata, and possibly tapes, as most shared file systems have an HSM (Hierarchical Storage Management) system built in.
Developing the scripts, codes, and methodology for this type of benchmarking is hard work on your end, but for the storage vendor it is often virtually impossible: most vendors have limited relationships with shared file system vendors, limited server resources, and few staff who know shared file systems.
This type of benchmark often ends up measuring the benchmarker rather than the software and hardware. The vendor who wins is frequently not the one with the best hardware and software, but the one that runs the benchmark best. It is therefore important to give the storage vendors as much information and guidance as possible.
The most important, and most often forgotten, part of a file system benchmark is creating fragmentation as part of the benchmark. Most file system benchmarks create a fresh file system and run tests with tools such as iozone and bonnie. This is usually not representative, because on a real file system users' files are created and removed many times, and multiple files are often written at the same time.
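One simple way to introduce fragmentation before running tools such as iozone is an "aging" pass: repeatedly create files of varying sizes and delete a random subset, so the free-space map is scattered rather than pristine. The sketch below is a crude illustration of the idea, not a substitute for replaying a recorded production workload; the function name and parameters are my own.

```python
import os, random

def age_filesystem(root, rounds=1000, max_kib=256, seed=0):
    """Crude file system aging pass: create files of random sizes and
    delete roughly half of them at random, leaving holes in the
    free-space map so later benchmark runs see a fragmented file
    system rather than a freshly made one. A sketch only; real aging
    tools replay recorded create/write/delete workloads."""
    rng = random.Random(seed)  # seeded so a run is repeatable
    live = []
    for i in range(rounds):
        path = os.path.join(root, "age_%06d" % i)
        with open(path, "wb") as f:
            f.write(b"\0" * (rng.randint(1, max_kib) * 1024))
        live.append(path)
        # About half the time, delete a randomly chosen earlier file.
        if rng.random() < 0.5:
            victim = live.pop(rng.randrange(len(live)))
            os.unlink(victim)
    return live  # surviving files, left behind as persistent clutter
```

Run the aging pass, leave the surviving files in place, and then run the benchmark tool against the same file system.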
Some of the areas to look at are:
- How many applications are doing reads and writes at the same time?
- How many files will be created and deleted within the file system?
- How full will the file system be over time?
Each of these issues will have an impact on the benchmark that you create, regardless of which tool you use.
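The first question above, how many applications read and write at the same time, can be mimicked directly by running writer and reader threads concurrently. A minimal sketch with assumed names and parameters (tools such as fio express the same idea through job files):

```python
import os, threading

def concurrent_io(root, writers=4, readers=2, file_kib=64, loops=10):
    """Run several writer and reader threads at once to mimic multiple
    applications hitting the file system simultaneously. A sketch:
    a real benchmark would also time each thread and vary I/O sizes."""
    def write_job(n):
        path = os.path.join(root, "w%d.dat" % n)
        for _ in range(loops):
            with open(path, "wb") as f:
                f.write(os.urandom(file_kib * 1024))

    def read_job(n):
        # Each reader re-reads one of the writers' files.
        path = os.path.join(root, "w%d.dat" % (n % writers))
        for _ in range(loops):
            if os.path.exists(path):
                with open(path, "rb") as f:
                    f.read()

    threads = [threading.Thread(target=write_job, args=(i,))
               for i in range(writers)]
    threads += [threading.Thread(target=read_job, args=(i,))
                for i in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The other two questions, file churn and how full the file system runs, are addressed by pre-filling the file system to a realistic level before starting such a run.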
Developing a benchmark that mimics your operational environment: those are the key words. The first step is determining the characteristics of that environment. Workload characterization, though it may seem difficult, is not really that hard once you separate it into its parts: application I/O, file system configuration, system and file system tunables, and hardware configuration requirements. It is also important to keep in mind how the new system will be used compared with how the old system was used.
Next time we will review the process of packaging, rules, analysis, and scoring.