The Enigma of Benchmarking Clouds
Like it or not, cloud storage and computation are a consideration for many applications. Whether the cloud is appropriate for a given application is a question for another time and place. The question addressed here is how to compare cloud providers. You could, and indeed should, research the available offerings online, talk to the sales and sales support teams, and narrow the field to a few providers that meet your operational needs, business requirements for availability, and budget.
That leaves one thing to differentiate the selected vendors: performance. We all know benchmarks are difficult and can be gamed, but benchmarking is how you compare vendors fairly, and that fairness matters both to your users and to the vendors under consideration. Beyond comparing vendors, the benchmark process might reveal that you do not have enough network bandwidth to support cloud applications. For example, if all the vendors perform about the same, the limitation may be the network between you and them. I am not saying that is always the problem, but it must be considered.
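The network check described above can be written as a simple heuristic. The sketch below is a hypothetical helper, not part of YCSB: it flags your own link as a suspect when every vendor's measured throughput falls within a tolerance of the average and that average approaches the link's nominal capacity. The function name and both thresholds (10% spread, 85% of link capacity) are illustrative assumptions to tune for your environment.

```python
from statistics import mean


def network_may_be_bottleneck(vendor_mbps, link_mbps,
                              spread_tol=0.10, link_tol=0.85):
    """Heuristic check of whether your own link, not the vendors, limits results.

    vendor_mbps: dict mapping vendor name -> measured throughput in Mbit/s
    link_mbps:   nominal capacity of your link to the cloud, in Mbit/s
    spread_tol:  vendors are "about the same" if each is within this
                 fraction of the average (assumed threshold)
    link_tol:    the average is "near the link" if it reaches this
                 fraction of link capacity (assumed threshold)
    """
    if not vendor_mbps:
        raise ValueError("need at least one vendor measurement")
    avg = mean(vendor_mbps.values())
    # All vendors cluster around the same number...
    similar = all(abs(v - avg) <= spread_tol * avg
                  for v in vendor_mbps.values())
    # ...and that number is close to what your own link can carry.
    near_link = avg >= link_tol * link_mbps
    return similar and near_link
```

A True result does not prove the network is the problem; it only says the vendor numbers should not be trusted until the link has been ruled out.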
The best benchmark tool I have found is Yahoo Cloud Serving Benchmark (YCSB). Before we cover the technicalities of the tool, there are some preliminary considerations to bear in mind.
What to Consider
Before embarking on a benchmark effort, you must define the scope of what you are benchmarking. Which applications will you run? How often are they used? Do you need to consider time of day (some activities have load peaks at specific times, such as email at the beginning and end of the work day)? And specifically, how much of the service offering are you going to use? Benchmarking will not address security and other issues, only performance. I also believe it is important to ask the vendors whether they are running the benchmarks on the same hardware and software stack you will be using. Do not let them benchmark on hardware and software that differ from what you end up using.
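Time-of-day effects are easiest to see if you bucket existing activity by hour. The following sketch assumes, hypothetically, that you can export (application, timestamp) records from your own logs with ISO-8601 timestamps. The function names are illustrative, and real logs will need parsing and time-zone handling that this omits.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profile(records):
    """Count requests per hour of day for each application.

    records: iterable of (application, ISO-8601 timestamp string) pairs,
             a hypothetical export from your own logs.
    Returns a dict mapping application -> list of 24 hourly counts.
    """
    profiles = defaultdict(lambda: [0] * 24)
    for app, ts in records:
        profiles[app][datetime.fromisoformat(ts).hour] += 1
    return dict(profiles)


def peak_hours(profile, top=3):
    """Return the `top` busiest hours of day, busiest first (ties: earlier hour)."""
    return sorted(range(24), key=lambda h: profile[h], reverse=True)[:top]
```

The peak hours for each application indicate which load levels a benchmark must reproduce and whether the peaks of different applications coincide.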
The next step is to determine which applications are critical to your operations. Is it email, a database, or a search engine that uses Hadoop? Also consider that the cloud provider might port your current application to one it supports. This complicates the situation, as the vendor might not want to do the port for a benchmarking effort without charging you. Another area that must be considered is growth and peak demand. One of the major selling points of outsourcing to a cloud is handling peak demand and increased workloads with limited impact on performance. In my opinion, the load should be varied to verify that response time stays nearly the same as the load increases. You might be paying for the increased load, but it is good to know that the cloud vendor can handle it, and it is also good to know that your network will not be the bottleneck. This is a critical consideration, given that with clouds you are trading off local network performance and latency for cloud network performance and latency.
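A load sweep of this kind might be scripted as follows. The YCSB command-line client accepts `-P` for a workload file, `-threads` for client threads, and `-target` to throttle operations per second; the command strings below follow that form but should be checked against the YCSB version you install. The `basic` binding is YCSB's dummy database layer, so substitute the binding for the store under test. The flatness check and its 20% tolerance are assumptions of this sketch, not something YCSB reports for you.

```python
def sweep_commands(binding, workload_file, targets, threads=16):
    """Build one YCSB 'run' command per target throughput (ops/sec)."""
    return [
        f"bin/ycsb run {binding} -P {workload_file} "
        f"-threads {threads} -target {t}"
        for t in targets
    ]


def latency_stays_flat(results, tolerance=0.20):
    """Check that mean latency stays near the lowest-load baseline.

    results:   (target_ops_per_sec, mean_latency_ms) pairs from the sweep
    tolerance: allowed fractional rise over the baseline (assumed value)
    Returns (True, None) if flat, else (False, first load that exceeded it).
    """
    if not results:
        raise ValueError("no sweep results")
    ordered = sorted(results)  # lowest offered load first
    baseline = ordered[0][1]
    for load, latency in ordered:
        if latency > baseline * (1 + tolerance):
            return False, load
    return True, None
```

Run each command, record the mean latency YCSB reports for it, and pass the (load, latency) pairs to `latency_stays_flat`; the first load that fails marks where the vendor, or your network, stops keeping up.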
The Benchmark Tool
It is important to benchmark any system you are considering, and a deep knowledge of the benchmarking tool is critical to understanding what the results say about a cloud service. A detailed presentation about YCSB is available, along with the code. It is a good starting point for understanding the goals and objectives of the tool as well as what the tool can and cannot do. Additionally, a group in the Parallel Data Lab at Carnegie Mellon University has extended the benchmark with additional features; the extended version is called YCSB++.
As the presentation shows, YCSB is a complex application that generates and executes workloads based on the required applications and outputs performance data for analysis and evaluation. After the benchmarks are run, the critical issue is understanding and comparing the performance metrics across cloud vendors, but, as stated earlier, ensuring you have enough network bandwidth must be the first priority; otherwise you cannot tell whether the numbers reflect the vendor or your own network. Here are some of the additional features found in YCSB++:
- Parallel testing using multiple YCSB client nodes
- Weak consistency testing
- Bulk-load testing
- Table pre-splitting for fast ingest
- Server-side filtering
- Access control
- Integration with monitoring tools
Given these additional features, I believe YCSB++ has some significant advantages over YCSB, especially in the area of reporting. The key issue is how to develop a workload similar to your local workload and then translate it into the benchmark tool. You might ask how you figure out what the workload requirements are. That is the $64,000 question. I am not aware of any tools that let you evaluate your workload and then generate a benchmark from it. That means you have a lot of hard work in front of you to determine what your workload is and how it would translate into a workload on a cloud.
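Once you do have numbers, translating them into YCSB is the comparatively easy part. The sketch below turns hypothetical per-operation counts from your own monitoring into a YCSB CoreWorkload parameter file, using the core workload's property names (`recordcount`, `operationcount`, `readproportion`, `requestdistribution`, and so on). The class path `site.ycsb.workloads.CoreWorkload` is used by recent YCSB releases, while older releases used `com.yahoo.ycsb.workloads.CoreWorkload`. The function name and the default `zipfian` request distribution are assumptions.

```python
YCSB_OPS = ("read", "update", "insert", "scan", "readmodifywrite")


def ycsb_workload(op_counts, record_count, operation_count,
                  distribution="zipfian"):
    """Render a YCSB CoreWorkload parameter file from measured op counts.

    op_counts: dict like {"read": 900, "update": 100}, a hypothetical
               summary of operations observed in your environment.
    """
    unknown = set(op_counts) - set(YCSB_OPS)
    if unknown:
        raise ValueError(f"no CoreWorkload equivalent for: {sorted(unknown)}")
    total = sum(op_counts.values())
    if total <= 0:
        raise ValueError("no operations counted")
    props = {
        "workload": "site.ycsb.workloads.CoreWorkload",
        "recordcount": record_count,
        "operationcount": operation_count,
        "requestdistribution": distribution,
    }
    # Each proportion is that operation's share of all observed operations.
    # Rounding to 4 places means the shares may not sum exactly to 1.
    for op in YCSB_OPS:
        props[f"{op}proportion"] = round(op_counts.get(op, 0) / total, 4)
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

The resulting file would be passed to the YCSB client with `-P`. What the code cannot do is tell you whether the counts, the record count, or the distribution actually reflect how your users behave; determining those inputs is the hard part.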
Characterizing your workload is a near-impossible task. Unless you have a sizable group of experienced performance analysts, you are not going to get an accurate picture of it. Analyzing workload has always been a problem, but it is now far more complex, because if you are going to outsource to a cloud provider, you must look not only at your servers but at all of the systems in your environment.
Figuring out who is running which applications and what their impact is on storage and networks is no easy task. You could put monitoring software on each system, but then you must combine the information across your whole environment and figure out who is running what, when, and for how long. Who has the time and the expert staff to do this? Not many organizations, I am sure. If you do have the staff and you are planning to outsource to the cloud, this will likely be their last task. Who is going to stick around to figure out the workload so you can create a benchmark to run on cloud providers that will enable you to outsource your IT department?
We all know the answer to that: no one, or someone who cannot find another job quickly. So we have a great tool for developing cloud benchmarks but no really good way to know what to benchmark based on your needs. All too often, what remains is guesswork, or what one of my mentors used to call EJ (engineering judgment). When you outsource most, if not all, of your engineering, the staff hit the road, forcing you either to outsource to the cloud without doing any benchmarking or to run a benchmarking effort that likely does not match your real workload, because you have no information about that workload.
This, of course, is an IT version of Catch-22. As usual, there are no easy answers and no simple solutions, and what I generally see happen is that everyone over-provisions to make sure users do not complain. That costs money and defeats the purpose of outsourcing in the first place.
Henry Newman is CEO and CTO of Instrumental Inc. and has worked in HPC and large storage environments for 29 years. The outspoken Mr. Newman initially went to school to become a diplomat, but was firmly told during his first year that he might be better suited for a career that didn't require diplomatic skills. Diplomacy's loss was HPC's gain.