Download the authoritative guide: Enterprise Data Storage 2018: Optimizing Your Storage Infrastructure
This is the second in a three-part series on benchmarking. Part 1 examined each of the components that might be included in a typical benchmark. Today we’ll look at developing representations of your workload as well as the pros and cons of using your applications and real data in the benchmark as opposed to developing emulations of both.
The most important part in the development of a set of storage benchmarks is ensuring that the benchmarks represent your current workload and how you run that workload on the system(s). With an understanding of your current workload, you can begin to predict how the current workload relates to the future workload.
There are many ways to create workloads that represent your real work. Some are more difficult for you to create, while some are much harder for the storage vendors to run, and of course some are halfway in between (I know, sounds a bit like Goldilocks and the Three Bears). Whatever you chose, you need to completely understand the tradeoffs and ensure that your organization understands the advantages and disadvantages of the decisions to be made.
Going into the development of a benchmark, you have some hard choices to make. The first is whether the benchmark will utilize your real applications and real data, or will an emulation of the workload(s) need to be developed.
Each of these two paths has pros and cons for both you and the vendor. You might wonder why I seem to be so concerned for the vendors. Besides once being a vendor and a benchmarker, I have come to realize that there is no free lunch when doing benchmarks. In other words, if you develop an expensive benchmark, quite often the cost of the benchmark will more than likely be rolled into the bid price of the hardware depending on the size of your procurement and how much the vendor(s) want your business.
Here’s how I see the tradeoffs:
|Use actual applications code||This is the best measure of your real work if you structure the benchmark correctly. For database benchmarks, using the actual data may have security issues for your company, but using actual data will result in the most realistic benchmark||These types of benchmark generally have more setup time and are more difficult to run for the vendors, as a great deal of application tuning, file system tuning, system tuning, and RAID tuning may be required, all of which can affect final pricing|
|Develop a set of representative storage benchmarks||Far easier to run and to scale to larger workloads. Running this type of workload is also far easier for the storage vendors||Sometimes difficult to develop characterization of the workload given system tunables, file systems, volume managers, and the actual storage hardware|