File sharing company Box put the spotlight on a massive problem for global enterprises that want to use cloud storage providers when it recently introduced geographic storage zones for its service.
The problem, put simply, is that data protection laws and regulatory requirements in many industries such as insurance and healthcare - particularly in the EU and Asia - mean that certain types of data must be stored within specific regions or national boundaries.
While that may not be a problem for US-based companies with US-based customers and suppliers, it is a major problem for global enterprises that are considering cloud storage and that have customers and suppliers all around the world.
"Financial services firms in Germany face strict regulations around where their files are stored, leaving them with a limited set of options, and keeping many of these enterprises stuck on legacy infrastructure," explains Aaron Levie, Box's CEO, as an example.
It's also a problem for cloud storage companies like Box that may wish to expand into European and Asian markets.
Cloud storage zones
Box's solution was to introduce what it calls Box Zones, a technology that provides "in-region" cloud data storage for its services. Instead of building its own cloud storage facilities around the world, Box has made deals with cloud storage giants Amazon and IBM to use their data centers. That lets Box offer a service in which customers can specify that their data is stored in Germany or Ireland in the EU, or in Singapore or Japan in Asia.
That Box does not operate its own data centers in Europe or Asia is significant: it highlights that only the very largest companies can afford to operate multiple data center facilities.
And that's got important implications if you are planning on using cloud storage services and your operations are international: it means that one of your selection criteria for a cloud storage provider should be the quantity and geographic spread of its data centers.
Latency and redundancy affect cloud storage
Let's take a closer look at why quantity and spread are important.
A good geographic spread is important to accommodate your data that needs to reside in specific regions (like the EU) or specific countries (like Germany), as we've already discussed.
But there's also the issue of keeping data close to your end users in order to reduce latency. For some applications this might not be an issue, or it can be mitigated by using a content delivery network like Akamai, CacheFly or CloudFlare. But for other applications you may only be able to achieve a low-enough latency connection by using cloud storage hosted in a data center close by.
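The trade-off between residency and latency can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: the function name `pick_region`, the latency threshold and the latency figures are all assumptions made up for the example, not real measurements.

```python
def pick_region(latency_ms, allowed_regions, max_latency_ms):
    """Return the allowed region with the lowest latency.

    latency_ms: dict mapping region name -> measured round-trip time in ms.
    allowed_regions: set of regions where the data is permitted to reside.
    Returns None if no allowed region meets the latency threshold.
    """
    candidates = {
        region: ms
        for region, ms in latency_ms.items()
        if region in allowed_regions and ms <= max_latency_ms
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Placeholder figures for illustration only (hypothetical, not benchmarks).
measured = {"Frankfurt": 18.0, "Ireland": 35.0, "N. Virginia": 95.0}

print(pick_region(measured, {"Frankfurt", "Ireland"}, max_latency_ms=50))
# prints Frankfurt
```

If no permitted region is fast enough, the function returns None, which is the case where a content delivery network or a different provider would have to be considered.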
The number of data centers in a region is important for reasons of redundancy. That's because it's not enough to have your data replicated onto multiple servers within the same data center if some natural disaster strikes and knocks out an entire data center - as was the case in 2012 in New York after Hurricane Sandy.
To protect against that type of eventuality your cloud storage provider needs to have an alternative data center that's far enough away to be unaffected by the same disaster, yet it needs to be within the same region (or even country) to ensure that your data isn't moved out of the area in which it is required to be stored for compliance reasons, and that latency remains within acceptable bounds.
And that means that cloud storage providers need plenty of data centers - ideally at least two in each country or region that you want your data to reside in.
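The "at least two per country or region" rule lends itself to a simple check. The sketch below is one way to express it in Python, under stated assumptions: the function name `regions_lacking_redundancy` and the input layout are hypothetical, and the sample data is the European SoftLayer list given later in this article, grouped by country (the "1" and "2" suffixes on the Amsterdam sites are labels invented for the example).

```python
from collections import Counter


def regions_lacking_redundancy(data_centers, required_regions, minimum=2):
    """Return the required regions that have fewer than `minimum` data centers.

    data_centers: iterable of (site_name, region) pairs.
    required_regions: regions or countries where your data must reside.
    """
    counts = Counter(region for _site, region in data_centers)
    # Counter returns 0 for regions the provider does not cover at all.
    return sorted(r for r in required_regions if counts[r] < minimum)


# European SoftLayer sites as listed in this article, grouped by country.
softlayer_europe = [
    ("Amsterdam 1", "Netherlands"),
    ("Amsterdam 2", "Netherlands"),
    ("Frankfurt", "Germany"),
    ("London", "UK"),
    ("Milan", "Italy"),
    ("Paris", "France"),
]

print(regions_lacking_redundancy(softlayer_europe, ["Netherlands", "Germany"]))
# prints ['Germany']
```

An empty result means every required region has a second site to fail over to. Whether "region" should mean a country or a wider area such as the EU depends on the regulation that applies to your data.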
Cloud storage providers are not all equal
So how do the major cloud storage providers stack up when it comes to quantity and spread?
Box appears to have chosen to offer its new service from data centers operated by Amazon and IBM because they are the two biggest when it comes to the quantity and spread of data centers. And of these two, Amazon is the leader.
Amazon's AWS cloud infrastructure is built around Regions and Availability Zones. (A Region is a physical location in the world where there are multiple Availability Zones, and Availability Zones consist of one or more discrete data centers, each with redundant power, networking and connectivity, housed in separate facilities.)
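The Region and Availability Zone hierarchy described above can be modeled as a small data structure. This is an illustrative sketch only, not AWS's own API or naming: the class names, the method names and the sample region, zone and data center identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AvailabilityZone:
    name: str
    # One or more discrete data centers make up a zone.
    data_centers: List[str] = field(default_factory=list)


@dataclass
class Region:
    name: str
    zones: List[AvailabilityZone] = field(default_factory=list)

    def data_center_count(self) -> int:
        """Total number of data centers across all zones in the region."""
        return sum(len(zone.data_centers) for zone in self.zones)

    def is_multi_zone(self) -> bool:
        """True if the region has more than one Availability Zone."""
        return len(self.zones) > 1


# Hypothetical region with two zones and three data centers.
example = Region(
    name="example-region",
    zones=[
        AvailabilityZone("zone-a", ["dc-1", "dc-2"]),
        AvailabilityZone("zone-b", ["dc-3"]),
    ],
)

print(example.data_center_count(), example.is_multi_zone())
# prints 3 True
```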
In total Amazon has 33 Availability Zones (with one or more data centers) in 12 geographic regions: N. California, Oregon, North Virginia, GovCloud (US), Sao Paulo, Ireland, Frankfurt, Singapore, Beijing, Seoul, Tokyo and Australia. New ones are slated to open soon in Ohio, Montreal, UK, India and Ningxia (China).
IBM's SoftLayer data centers are almost as numerous as Amazon's. In total it offers cloud storage services from 28 data centers, and they are well dispersed geographically, with fewer than half situated within the United States.
The 14 North America-based SoftLayer data centers are located in Dallas (6), Houston, San Jose (2), Washington, D.C. (2), Toronto, Seattle and Queretaro, Mexico.
Six others are situated in Europe (Amsterdam (2), Frankfurt, London, Milan and Paris) and five in Asia-Pacific (Chennai, India; Hong Kong, China; Melbourne and Sydney, Australia; and Tokyo, Japan). SoftLayer also has a single data center in Sao Paulo, Brazil.
But Amazon and IBM are not the only cloud storage service providers in town. Other credible alternatives are offered by the likes of Microsoft (through its Azure cloud) and Google.
Microsoft's Azure Storage cloud infrastructure is only slightly smaller than Amazon's and IBM's and it is growing rapidly: Microsoft Azure cloud storage services are offered from 20 regional data centers (excluding US Government data centers), and a further 6 (excluding DoD data centers) have been announced.
These are located in Central US (Iowa), East US (Virginia), East US 2 (Virginia), North Central US (Illinois), South Central US (Texas), West US (California), North Europe (Ireland), West Europe (Netherlands), East Asia (Hong Kong), Southeast Asia (Singapore), Japan East (Tokyo, Saitama), Japan West (Osaka), Brazil South (Sao Paulo State), Australia East (New South Wales), Australia Southeast (Victoria), Central India (Pune), South India (Chennai), West India (Mumbai), China East (Shanghai) and China North (Beijing).
Newly announced Azure regions that will come online soon include Canada Central (Toronto), Canada East (Quebec City), Germany Central (Frankfurt), Germany Northeast (Magdeburg), UK South (TBA) and UK West (TBA).
Although Google operates many data centers around the world, its Google Cloud is much smaller than the clouds described above, and Google Cloud Storage services are offered from just 3 (large) regions: North America, Europe and Asia Pacific. In total it has 6 data centers in North America (Council Bluffs, IA; Berkeley County, SC; Atlanta, GA; Lenoir, NC; Mayes County, OK; and The Dalles, OR), and just 1 in Europe (St Ghislain, Belgium) and 1 in Asia Pacific (Changhua County, Taiwan).
One more cloud storage service provider that's worth mentioning is Rackspace. That's because the company is marginally ahead of Google in terms of data center numbers - it operates 10 data centers in six regions: Dallas, Chicago, North Virginia, London, Hong Kong and Sydney.
Source: corporate websites. Note: these numbers are a snapshot in time; all cloud storage providers are likely to increase the number of data centers they operate over the coming months.
Cloud storage provider summary
What all this shows is that while cloud storage providers may offer storage capacity at very similar prices, cloud storage shouldn't be treated as a commodity. Depending on the location of your customers, the regulatory frameworks you need to adhere to and the nature of the data that you want to store, cloud storage providers with data centers in the right areas may be far more attractive propositions than others.