I am not the first nor will I be the last to write about picking the right cloud storage vendor for your company. Clearly, it’s a popular topic.
There are likely a few reasons that everyone wants to address how you pick a cloud provider. The obvious one is that vendors market their products by paying writers to produce articles about cloud technology. Needless to say, there are lots of articles – some good, some bad – about choosing the correct cloud storage and applications environment.
My approach, as it usually is, is not to discuss products but to discuss the requirements. That way, readers can pick and choose what is important to their operation. No vendor is paying me to write about their cloud.
In the 1990s there was a saying about RAID. You can have it cheap, you can have it reliable, you can have it fast. Pick any two.
I think that the same could be said at this point for the cloud market, except that the list is different, longer and way more complex, but so is the technology. This is the list of issues that must be addressed:
(They are in no specific order; their relative importance will depend on your requirements.)
• Speed of data access and QoS (quality of service)
• Amount of data to be stored
• Cost, including cost to move to another vendor
• Data integrity and provenance
• Availability for your applications and data
I think that each of these areas is important in helping you better understand whether you are going to make the move to the cloud.
One thing that I cannot emphasize enough is to get what the vendor tells you (or says in marketing literature) in writing as part of the contract.
I have firsthand experience with cloud vendors putting one thing in their marketing literature about reliability and integrity and refusing to put the marketing features into a contract. If it’s not in writing in the contract, it doesn’t matter what the marketing claims are, unless you want to spend a protracted time in court. Not worth the cost of lawyers.
Speed of Data Access and QoS
Depending on your business, the speed of access is likely going to be an issue. But equally important is the quality of service.
Let’s say you are in a retail business, you depend on Black Friday for your profitability, and that day is your peak usage. Getting the required performance 99.99% of the time might sound like a good deal, but that leaves about 52 minutes per year in which you will not get the speed you wanted or the QoS you paid for and need.
I suspect that many businesses will need their peak performance on that day, and from what I understand large retailers have agreements for credit card approval codes that address exactly this type of problem.
If you have crunch times, you need to make sure that you specify what your needs are during those times and get in writing something you will be able to live with. If they will not put it in writing, walk away.
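The arithmetic behind that figure is worth running for whatever service level a vendor quotes you. Here is a minimal Python sketch; the function name is hypothetical, and it assumes a 365-day year and that the shortfall can land at any time, including your busiest day.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def shortfall_minutes_per_year(service_level_pct: float) -> float:
    """Minutes per year during which the promised service level is NOT met."""
    if not 0.0 <= service_level_pct <= 100.0:
        raise ValueError("service_level_pct must be between 0 and 100")
    return (100.0 - service_level_pct) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare a few commonly quoted service levels side by side.
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = shortfall_minutes_per_year(pct)
        print(f"{pct}% leaves {minutes:.1f} minutes per year uncovered")
```

Note that the number says nothing about when those minutes fall, which is exactly why peak periods need their own language in the contract.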
You of course need to be willing to pay the vendor to meet this requirement. Nothing is free and if they do not meet the requirement and your business is impacted, what are the penalties? Something to think about.
Amount of Data That Needs to Be Stored
The amount of data you need to store will likely be the largest cost item.
You need to run the numbers and make sure that the cost is in line with your expectations. Does the cost change significantly if you need more storage for a short period of time? Does the cost change if you increase your access rate significantly?
All of these are reasonable questions. Make sure you not only get the pricing on these items but also understand the limits. What if you need 2x the access rate? Can the vendor support that whenever you want? Do you have to pay for the potential access rate or size change, even when you are not using it?
Some of these areas are well documented by some vendors and some are not. Get the costs and the cost ranges before you sign up, or you might get some unwanted surprises.
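One way to run the numbers is to put the vendor's price sheet into a small model and compare a normal month with a peak month. The sketch below is only an illustration: the `PricePlan` fields, the idea of an included access allowance, and every price in it are hypothetical placeholders, not any vendor's actual terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricePlan:
    """Hypothetical price sheet; every figure passed in is a placeholder."""
    storage_per_tb_month: float  # cost to keep 1 TB stored for one month
    access_per_tb: float         # cost per TB read beyond the allowance
    included_access_tb: float    # TB of reads per month covered by the base fee


def monthly_cost(plan: PricePlan, stored_tb: float, accessed_tb: float) -> float:
    """Storage charge plus a charge for reads beyond the included allowance."""
    billable_access = max(0.0, accessed_tb - plan.included_access_tb)
    return stored_tb * plan.storage_per_tb_month + billable_access * plan.access_per_tb


if __name__ == "__main__":
    plan = PricePlan(storage_per_tb_month=20.0, access_per_tb=50.0,
                     included_access_tb=5.0)
    normal = monthly_cost(plan, stored_tb=100.0, accessed_tb=5.0)
    # Peak month: 20% more data held for a short period and 2x the access.
    peak = monthly_cost(plan, stored_tb=120.0, accessed_tb=10.0)
    print(f"normal month: {normal:.2f}")
    print(f"peak month:   {peak:.2f}")
```

Real price sheets tend to have more tiers and more fine print than this, so treat the model as a checklist of questions to ask rather than a quote.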
The Cost to Move to Another Vendor
Cloud is an emerging technology, so migration from one vendor to another is more of an unknown. The cost and time to move from, say, RAID vendor X to RAID vendor Y – and of course the potential methods – are pretty well known. But it took many years for the technology to mature to where it is today.
How do you move from cloud vendor A to B? Cloud vendor B could say “Okay we can do this,” but at what cost?
Cloud vendor A might have some reservations about Cloud vendor B accessing the data in their cloud. Additionally, what is the fine print in the contract? Some vendors have some significant costs for you to move data out of their cloud if you want to do it in a reasonable amount of time.
Some of the issues might be out of your staff’s control and the vendor’s control. For example, what about the time required to move the data? Moving 100 TB of data between vendors might take a very long time if you have an agreement that says you are only going to access say 20 GiB of data per day.
So you get a TB roughly every 50 days and get to move your data in close to 5,000 days, which is well over a decade. Not very practical, is it?
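The arithmetic behind a quota-bound migration is easy to check for your own contract terms. The following is a minimal Python sketch, assuming decimal terabytes for the data set and binary gibibytes for the daily limit, as in the example; the function name `days_to_move` is hypothetical.

```python
import math

BYTES_PER_TB = 10**12   # decimal terabyte
BYTES_PER_GIB = 2**30   # binary gibibyte


def days_to_move(total_tb: float, quota_gib_per_day: float) -> int:
    """Whole days needed to move total_tb under a daily transfer quota."""
    if total_tb < 0 or quota_gib_per_day <= 0:
        raise ValueError("total_tb must be >= 0 and quota_gib_per_day > 0")
    total_bytes = total_tb * BYTES_PER_TB
    bytes_per_day = quota_gib_per_day * BYTES_PER_GIB
    # Round up: a partially used final day still counts as a day.
    return math.ceil(total_bytes / bytes_per_day)


if __name__ == "__main__":
    days = days_to_move(total_tb=100, quota_gib_per_day=20)
    print(f"{days} days, or about {days / 365:.1f} years")
```

Calling `days_to_move(100, 20)` reproduces the 100 TB at 20 GiB per day scenario; swap in the limits from your own agreement to see what an exit would really take.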
When your data store gets large, moving your data around can get very time consuming. Remember an OC-192 channel – which is not cheap – is only about 800 MiB/sec of bandwidth. About the same speed as a single FC-8 or SAS 6 Gbit port.
The performance of your local network and your local RAID controllers is going to be far faster than moving data around clouds between vendors. This is because WAN network performance has not kept up with the performance of local storage – and this won’t change anytime soon.
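To put link speed in perspective, here is a rough sketch of how long a bulk move takes at a given sustained rate. The 800 MiB/sec figure comes from the text above; the function name and the `efficiency` derating factor are assumptions added for illustration, since real transfers often fall short of the nominal rate.

```python
BYTES_PER_TB = 10**12   # decimal terabyte
BYTES_PER_MIB = 2**20   # binary mebibyte


def transfer_hours(total_tb: float, rate_mib_per_sec: float,
                   efficiency: float = 1.0) -> float:
    """Hours to move total_tb at a sustained rate, derated by an efficiency factor."""
    if rate_mib_per_sec <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("rate must be > 0 and efficiency must be in (0, 1]")
    effective_bytes_per_sec = rate_mib_per_sec * BYTES_PER_MIB * efficiency
    seconds = total_tb * BYTES_PER_TB / effective_bytes_per_sec
    return seconds / 3600.0


if __name__ == "__main__":
    # Best case, then a purely illustrative 50% efficiency.
    for eff in (1.0, 0.5):
        hours = transfer_hours(total_tb=100, rate_mib_per_sec=800, efficiency=eff)
        print(f"efficiency {eff:.0%}: {hours:.1f} hours")
```

Even the best case assumes you have the whole link to yourself for the duration, which is rarely how a shared WAN connection behaves.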
Data Integrity and Provenance
If you have read my column over the last 10+ years, you know that I have consistently talked about the need for end-to-end data integrity.
Data integrity is a much larger problem when data is stored and accessed in a cloud than when it is stored and accessed locally.
The potential for silent corruption grows with the amount of hardware, software and distance between you and your data. The 16-bit checksums used by TCP/IP, the protocols the Internet is built on, are not getting more robust as networking performance increases.
It is important therefore to ask about the end-to-end data integrity when accessing your data in the cloud. I highly recommend reading about LOCKSS (Lots Of Copies Keep Stuff Safe).
I am not suggesting you need to use LOCKSS, but understanding why LOCKSS was created and the problems it is trying to solve will give you a good understanding of the risks involved in keeping data a long way from your applications.
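One simple form of end-to-end checking is to record a cryptographic digest before data leaves your site and compare it against every copy you read back. The sketch below uses SHA-256 from Python's standard library; the function names are hypothetical, the local file copy is only a stand-in for an upload and download, and this is one possible approach rather than what any particular vendor or LOCKSS itself does.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_round_trip(recorded_digest: str, retrieved: Path) -> bool:
    """True when the retrieved copy matches the digest recorded before upload."""
    return sha256_of_file(retrieved) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "local.bin"
        returned = Path(tmp) / "returned.bin"
        local.write_bytes(b"payload" * 1000)
        recorded = sha256_of_file(local)          # keep this digest on your side
        returned.write_bytes(local.read_bytes())  # stand-in for upload then download
        print("intact copy verifies:", verify_round_trip(recorded, returned))
        # Simulate silent corruption by changing the first byte.
        returned.write_bytes(b"X" + local.read_bytes()[1:])
        print("corrupted copy verifies:", verify_round_trip(recorded, returned))
```

The important design point is that the digest is computed and kept by you, not by the vendor, so the check covers every hop between your application and the stored bytes.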
Yes, of course you could move your applications to the data and solve some of the issues, but that brings up the issue of provenance. Look at provenance as a chain of custody for your data. Is this the data I put into the system, who has looked at it, and what are the changes and who made those changes and when? These are just some of the examples.
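To make the chain-of-custody idea concrete, here is a small sketch of a tamper-evident event log in which each entry carries a hash of the entry before it. Everything in it is hypothetical: the field names, the helper functions and the sample events are illustrations, and it can only record the events that are actually reported to it.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body: Dict[str, str]) -> str:
    """Hash an entry's fields using a stable (sorted-key) JSON encoding."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(log: List[Dict[str, str]], who: str, action: str,
                 when: str, data_digest: str) -> None:
    """Append a custody event that is linked to the previous entry's hash."""
    body = {
        "who": who,
        "action": action,
        "when": when,
        "data_digest": data_digest,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": _entry_hash(body)})


def chain_is_intact(log: List[Dict[str, str]]) -> bool:
    """Recompute every hash and link.

    Editing any entry, or dropping one that has a successor, is detected.
    """
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log: List[Dict[str, str]] = []
    append_event(log, "alice", "write", "2024-01-01T09:00:00Z", "digest-v1")
    append_event(log, "bob", "read", "2024-01-02T10:30:00Z", "digest-v1")
    print("intact:", chain_is_intact(log))
    log[0]["who"] = "mallory"  # simulate someone rewriting history
    print("after tampering:", chain_is_intact(log))
```

A log like this does not detect removal of the most recent entries unless you keep the latest hash somewhere the vendor cannot change it, which is one more thing to ask about.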
Within your own network, dealing with these types of issues is somewhat difficult but likely tractable. When you hand your data, and potentially your applications, to someone else to run, this becomes a huge problem for some.
Healthcare records and SEC compliance data are just a few examples of data that need at least some level of provenance. If you are a doctor’s office and move your data and applications to a cloud and you have some celebrity patients, how do you ensure that the cloud vendor’s people are not looking at your data and selling stuff to the National Enquirer?
Data encryption, key management and the like are just some of the areas you need to be concerned with.
Availability for Your Applications and Data
You need to clearly define your availability needs for any vendor. Having your data available but not the applications – and vice versa – is not good for business.
Availability needs to be clearly defined as not just data and not just applications but whatever your workflow is. Clearly specify a number of 9s and downtimes that you can live with.
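A quick calculation shows why the workflow, not the individual pieces, is what matters. The sketch below multiplies component availabilities together, which assumes that every component must be up at once and that failures are independent; both are simplifying assumptions, and the function names are hypothetical.

```python
from math import prod
from typing import Iterable

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def workflow_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a workflow that needs ALL of its components up at once.

    Assumes component failures are independent, which may not hold in practice.
    """
    values = list(component_availabilities)
    if not values or any(not 0.0 <= a <= 1.0 for a in values):
        raise ValueError("provide at least one availability between 0 and 1")
    return prod(values)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year during which the workflow is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Data and applications each promised at three 9s.
    combined = workflow_availability([0.999, 0.999])
    print(f"combined availability: {combined:.6f}")
    print(f"expected downtime: {downtime_hours_per_year(combined):.2f} hours per year")
```

Two services that each promise three 9s do not give you three 9s for the workflow that depends on both, so the number in the contract should describe the workflow.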
This goes back to my above comment on peak usage. If your business is moving to a cloud and the vendor has 1,000 other businesses of the same type with the same peak periods moving, do they have the infrastructure to deal with all of you at your peak?
These are clearly very complex issues, and from a business perspective you need to protect yourself from a cloud vendor’s lack of engineering rigor.
Choosing a Cloud Storage Vendor: Final Thoughts
I am sure that there are many other areas to think of that are specific to a business area or a cloud vendor’s current offerings. Moving from local storage and applications to a cloud needs to be clearly thought through. Again, specific language must be put in your contract to hold the vendor responsible to meet your needs and requirements.
In some ways, thinking about things locally was significantly easier, but we all learned the issues over many years. It is going to take years for people to get a good understanding of how to do the same thing for the cloud, both on the cloud vendor side and the user side.
In the meantime: caveat emptor.