The recent EMC-NetApp benchmarking controversy got me thinking: Hardware isn’t the only benchmark category that could use an overhaul. File system benchmarks can be just as misleading (see Benchmarking Storage Systems).
Hardware can be manipulated by clock speed, memory, bus, BIOS, compilers, and so on. But with file systems, you have even more variables and thus more options for manipulation.
I have seen a plethora of benchmark results comparing file system A to file system B. I won’t mention vendor names, but I have seen plenty of claims lately from vendors that their file system is faster than other file systems. I have read some of the benchmark documentation, and it usually has little to do with real-world usage patterns. As someone who once ran benchmarks for a hardware vendor, I understand that benchmarks are about marketing. If it is an official procurement, you read the rules for the benchmark and take every advantage of mistakes and loopholes to shine the best possible light on your product.
So let’s take a critical look at some of the file system benchmarks, and hopefully you will use this information to let the companies doing the benchmarks know that we want benchmarks that reflect real-world scenarios.
Benchmark Tricks and Treats
We’ll start by dividing benchmark types into “easy” and “hard.” By easy, I mean that the setup is easy and the testing is easy and the results are not very useful. Hard is the opposite of easy — and the only way to produce meaningful benchmarks.
Easy file system benchmarks include:
- Write/read performance testing just after file system creation: This type of testing is often done with a few optimal block sizes that match the underlying storage configuration design (a minimal sketch of such a test follows this list).
- File creation testing just after file system creation: For some of these tests, the file system metadata is pre-created, and files are created in an optimal way: sequentially within a directory, with an optimal number of files per directory.
- File removal testing just after the file creation test: To remove files quickly, the files can be created in sequential order.
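To make the first item concrete, here is roughly what an easy write test looks like in practice. This is a minimal sketch under my own assumptions: a hypothetical mount point, a 1 MiB block size presumed to match the storage layout, and a single sequential write on a brand-new, empty file system. It is an illustration, not any vendor’s published procedure.

```python
#!/usr/bin/env python3
"""Minimal sketch of an "easy" benchmark: one sequential write at one
block size on a freshly created, empty file system. Mount point, block
size, and total size are illustrative assumptions."""

import os
import time

MOUNT = "/mnt/newfs"        # hypothetical mount point of a brand-new file system
BLOCK = 1024 * 1024         # 1 MiB writes, assumed to match the storage layout
TOTAL = 4 * 1024**3         # 4 GiB written in total
path = os.path.join(MOUNT, "easy_test.dat")

buf = os.urandom(BLOCK)
start = time.monotonic()
with open(path, "wb") as f:
    for _ in range(TOTAL // BLOCK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())    # make sure the data actually reaches the device
elapsed = time.monotonic() - start
print(f"sequential write: {TOTAL / elapsed / 1e6:.1f} MB/s")
```

Note how little this exercises: no fragmentation, no metadata pressure, no competing I/O. That is exactly why it is easy, and exactly why the number it produces tells you so little.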
Easy file system benchmarks are almost always done for two reasons. Vendors can implement them without huge expense and time, and they are easy to explain to the public and to marketing and sales staff.
Since hard is the opposite of easy, everything takes longer, costs more and is harder to explain. So using the same three examples, here is what they would look like if done the hard way:
- Write/read performance testing the file system at a steady-state level of fragmentation: This takes a long time. The file system is created, files are added and removed, performance tests are run, more files are added and removed, and the tests are run again. The process continues until the tests reach a steady-state rate and additional fragmentation no longer affects the results (a rough sketch of this aging loop follows this list). From a few tests I have run, this can reduce performance by as much as 99 percent compared with the easy test method.
- File creation testing using random creation techniques and a fragmented metadata area for creation: Creating files in a non-optimal way with a file system that already has a fragmented metadata area is almost never done, but has a significant effect on performance.
- File removal testing based on random creation of files and directories: Removing files that were created randomly often takes far longer than sequential creation.
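Here is a rough sketch of the aging loop from the first item. Since, as I note below, there is no agreed-upon way to fragment a file system, everything here is my own assumption for illustration: the mount point, the directory count, the file size distribution, the churn per pass, and the crude steady-state cutoff.

```python
#!/usr/bin/env python3
"""Rough sketch of "hard" testing: age the file system by repeatedly
creating and deleting random-sized files, re-run the write test after
each pass, and stop once throughput stops changing. All names, sizes,
and thresholds are illustrative assumptions, not a standard method."""

import os
import random
import time

MOUNT = "/mnt/agedfs"                 # hypothetical mount point
BLOCK = 1024 * 1024

def write_test(size=1 * 1024**3):
    """Sequential write of `size` bytes on the aged file system; returns MB/s."""
    path = os.path.join(MOUNT, "probe.dat")
    buf = os.urandom(BLOCK)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(size // BLOCK):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    rate = size / (time.monotonic() - start) / 1e6
    os.unlink(path)
    return rate

def churn(pass_no, nfiles=2000):
    """Create random-sized files in random directories, then delete half of them."""
    paths = []
    for i in range(nfiles):
        d = os.path.join(MOUNT, f"dir{random.randrange(64)}")
        os.makedirs(d, exist_ok=True)
        p = os.path.join(d, f"pass{pass_no}_{i}")
        with open(p, "wb") as f:
            f.write(os.urandom(random.randrange(4096, 4 * 1024 * 1024)))
        paths.append(p)
    for p in random.sample(paths, nfiles // 2):
        os.unlink(p)

rates = [write_test()]                # baseline on the fresh file system
for n in range(1, 50):
    churn(n)
    rates.append(write_test())
    # crude steady-state check: last three passes within 2 percent of each other
    if len(rates) >= 3 and max(rates[-3:]) - min(rates[-3:]) < 0.02 * rates[-1]:
        break
print("per-pass write rates (MB/s):", [round(r, 1) for r in rates])
```

Even this toy version makes the point: the interesting number is not the first entry in that list, it is the one the loop converges to.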
I believe another reason that vendors almost never do the hard testing is that there is no standardized process anyone has agreed upon to fragment file system data or metadata. Any vendor attempting this might have the results called into question, since there are no standard benchmarks in this area, much less an agreed-upon way to do this type of testing. But testing this way is critically important, and far more important than the easy testing: the degradation in file system performance I have seen on operational systems can result in a two, five, or even 500 times difference in observed performance.
One of the most meaningless file system tricks used today is to compare two file systems that have different operational goals. One example is a COW (copy-on-write) file system run on a test that allows the whole file or files to fit in memory. The vendor then compares it with a file system that does not use all of memory to combine writes, but instead constantly streams data to disk. Not surprisingly, the COW file system is faster as long as the file size is less than memory. Now what if the file size were, say, 10 times the size of memory, and 10 times the size of the RAID cache, if that cache is large? Guess which file system would likely win hands down. It would not be the COW file system, since it must now spend a great deal of CPU time finding the oldest blocks and writing them out while memory is constantly being filled with more data. The streaming file system will work far better in that case.
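You can see the caching effect behind this trick with a simple experiment that has nothing to do with any particular file system: time a write that is absorbed by the page cache, then time the same write forced out to the device. The path and sizes below are my assumptions for illustration.

```python
#!/usr/bin/env python3
"""Illustration of why "fits in memory" results mislead: the same
sequential write timed once while it is absorbed by the page cache and
once forced to the device. Path and sizes are assumptions."""

import os
import time

PATH = "/mnt/test/cache_demo.dat"     # hypothetical path
SIZE = 512 * 1024 * 1024              # 512 MiB, assumed small enough to fit in RAM
BLOCK = 1024 * 1024
buf = os.urandom(BLOCK)

def timed_write(force_to_disk):
    start = time.monotonic()
    with open(PATH, "wb") as f:
        for _ in range(SIZE // BLOCK):
            f.write(buf)
        if force_to_disk:
            f.flush()
            os.fsync(f.fileno())      # include the cost of actually reaching the device
    return SIZE / (time.monotonic() - start) / 1e6

print(f"cached write:  {timed_write(False):.0f} MB/s")   # mostly measures memory copy speed
print(f"durable write: {timed_write(True):.0f} MB/s")    # measures the real storage path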
Choose the Right File System for the Job
The original paper on log structured file systems was “The LFS Storage Manager” by Mendel Rosenblum and John K. Ousterhout, presented at the Summer ’90 USENIX Technical Conference, Anaheim, California, in June 1990.
One of the main reasons for creating this file system was the problem of diskless Sun-3 workstations that crashed often and then took a long time to fsck. The way I see it, they were trying to fix a hardware problem (the systems crashed often) with a software fix: not fsck-ing the whole file system. This was likely a good idea. What has happened since is that everyone has moved to log structured file systems, since hardware does crash and getting back up and running fast is important. Another file system, Reiserfs, was designed to address the need for high-speed file creates and removes for Usenet news servers, and many file systems are designed to address one problem or another, since with block-based storage devices it is impossible to pass file topology down to the device and have it be smart enough to address the inherent latency in the data path.
Whenever I look at a file system, I try to understand the design goals and what problem it is trying to solve. I look at the hardware technology the designers suggest and see whether it all fits together. Today there are many I/O problems for file systems: what I am doing on my laptop, which can be heavily cached; my desktop, which needs good I/O performance for editing video; low- and high-speed databases that might or might not be cacheable; streaming I/O for capturing things like high-speed film scanners; and preservation archives. No matter what the vendor says, one file system cannot solve all of these problems with high performance. You might be able to solve a number of them with a highly tunable file system and an expert on performance and configuration, but surely not all of them.
There is a bunch of hype out there today about file system benchmarks, and from what I have seen, none of the benchmarks match real-world problems. I view these benchmarks just like CPU core counts and GHz ratings. Many other factors affect real performance that are equally as important as cores and GHz; the problem is that no one is doing the tests, and I do not see this changing anytime soon. Before you decide on a file system, understand your requirements, and at least look at the benchmark environment and what the file system was doing before you make a decision. A COW file system that benchmarks well in memory is likely a great choice for a laptop, but that result is no guide for a 100TB OLTP database.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 27 years experience in high-performance computing and storage.