Data integrity is a large problem when data is stored and accessed in a cloud rather than locally.
The potential for silent corruption grows with the amount of hardware, software and distance between you and your data. The TCP/IP checksums that underpin the Internet are not getting more robust as networking performance increases.
It is important therefore to ask about the end-to-end data integrity when accessing your data in the cloud. I highly recommend reading about LOCKSS (Lots Of Copies Keep Stuff Safe).
I am not suggesting you need to use LOCKSS, but understanding why LOCKSS was created and the problems it is trying to solve will give you a good understanding of the risks involved in keeping data a long way from your applications.
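One way to make the end-to-end question concrete is to compute a strong digest before the data leaves your site and compare it with a digest of whatever comes back. The following is a minimal sketch under stated assumptions, not a description of any vendor's mechanism: the function names `digest_chunks`, `digest_file` and `round_trip_ok` are hypothetical, and SHA-256 is simply one reasonable choice of digest.

```python
import hashlib


def digest_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def digest_file(path, chunk_size=1024 * 1024):
    """Digest a file without loading it into memory all at once."""
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        return digest_chunks(iter(lambda: f.read(chunk_size), b""))


def round_trip_ok(digest_before_upload, retrieved_chunks):
    """True if the retrieved bytes match the digest recorded before upload."""
    return digest_before_upload == digest_chunks(retrieved_chunks)
```

The digest recorded before upload has to be kept somewhere the cloud vendor cannot alter, otherwise the comparison proves little.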
Yes, of course you could move your applications to the data and solve some of the issues, but that brings up the issue of provenance. Look at provenance as a chain of custody for your data. Is this the data I put into the system, who has looked at it, and what are the changes and who made those changes and when? These are just some of the examples.
Within your own network, dealing with these types of issues is somewhat difficult but likely tractable. When you hand your data, and potentially your applications, to someone else to run, this becomes a huge problem for some.
Healthcare records and SEC compliance data are just a few examples of data that need at least some level of provenance. If you are a doctor’s office and move your data and applications to a cloud and you have some celebrity patients, how do you ensure that the cloud vendor’s people are not looking at your data and selling stuff to the National Enquirer?
Data encryption, key management and the like are just some of the areas you need to be concerned with.
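The chain-of-custody idea discussed above can be illustrated with a hash-chained log, in which each entry records who did what to which data and when, and also commits to the entry before it. This is only a sketch of one possible technique: the field names, `append_event` and `verify_log` are assumptions made for illustration, and a real system would also need signatures and storage outside the vendor's control.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _hash_body(body):
    """Hash an entry body using a stable JSON encoding."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(log, actor, action, data_digest, timestamp):
    """Append a custody event that is linked to the previous entry."""
    body = {
        "actor": actor,
        "action": action,            # e.g. "read" or "write"
        "data_digest": data_digest,  # digest of the data after the event
        "timestamp": timestamp,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    log.append({**body, "entry_hash": _hash_body(body)})
    return log


def verify_log(log):
    """Return True if no entry was altered, removed or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash:
            return False
        if _hash_body(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Because every entry commits to its predecessor, editing or deleting an earlier record changes every later hash, which makes quiet tampering detectable as long as the latest hash is kept somewhere trusted.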
Availability for your applications and data
You need to clearly define your availability needs to any vendor. Having your data available but not the applications, or vice versa, is not good for business.
Availability needs to be clearly defined as not just data and not just applications but whatever your workflow is. Clearly specify a number of 9s and downtimes that you can live with.
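To see what a given number of 9s means in practice, it helps to turn it into allowed downtime, and to combine the figures for data and applications into one figure for the whole workflow. The sketch below is plain arithmetic under assumptions that should be checked against a real contract: the function names are hypothetical, a year is taken as 365 days, and the workflow figure assumes every component must be up at once and that failures are independent.

```python
def nines_to_percent(nines):
    """Convert a count of 9s to a percentage, e.g. 3 -> about 99.9."""
    return 100 * (1 - 10 ** -nines)


def allowed_downtime_minutes(availability_percent, period_hours=24 * 365):
    """Minutes of downtime permitted in the period at this availability."""
    return period_hours * 60 * (1 - availability_percent / 100)


def workflow_availability_percent(component_percents):
    """Availability of a workflow that needs every component up at once.

    Assumes independent failures, so the component availabilities multiply.
    """
    combined = 1.0
    for percent in component_percents:
        combined *= percent / 100
    return combined * 100
```

For example, three 9s allows roughly 525.6 minutes (about 8.76 hours) of downtime a year, and a workflow that needs both storage and applications at 99.9% each comes out slightly below 99.9% overall.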
Availability also goes back to my earlier comment on peak usage. If your business is moving to a cloud and the vendor has 1,000 other businesses of the same type also moving, all with the same peak periods, do they have the infrastructure to deal with all of you at your peak?
These are clearly very complex issues, and from a business perspective you need to protect yourself against a cloud vendor’s lack of engineering rigor.
Choosing a Cloud Storage Vendor: Final Thoughts
I am sure that there are many other areas to think of that are specific to a business area or a cloud vendor’s current offerings. Moving from local storage and applications to a cloud needs to be clearly thought through. Again, specific language must be put in your contract to hold the vendor responsible for meeting your needs and requirements.
In some ways, thinking about things locally was significantly easier, but we all learned the issues over many years. It is going to take years for people to get a good understanding of how to do the same thing for the cloud, both on the cloud vendor side and the user side.
In the meantime: caveat emptor.