Public cloud vendors offer several low-cost options for storing data that is accessed very infrequently.
Efficient storage management includes migrating aging data through progressively less-expensive storage tiers. When data ends its migration at the cold storage stage, you can keep it for long periods of time at very low cost.
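The migration described above can be sketched as a simple age-based rule, using the hot, warm, cool, and cold tier names this article uses. This is a minimal illustration, not any vendor's lifecycle API: the function name `choose_tier` and the idle-day thresholds are hypothetical and would need tuning to real access patterns.

```python
from datetime import date

# Hypothetical thresholds in days since last access; not taken from any vendor.
TIER_THRESHOLDS = [(30, "hot"), (90, "warm"), (365, "cool")]


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier based on how long the data has been idle."""
    idle_days = (today - last_accessed).days
    for max_idle_days, tier in TIER_THRESHOLDS:
        if idle_days < max_idle_days:
            return tier
    # Anything idle longer than the last threshold ends its migration here.
    return "cold"
```

Under these assumed thresholds, data untouched for 60 days lands in the warm tier, and data untouched for more than a year lands in cold storage.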
Cloud-based data storage generally falls into four storage classes or tiers: hot, warm, cool, and cold.
The biggest single reason for using cold storage is saving money by reducing use of the hot, warm, and cool storage tiers. Cold storage provides efficient, effectively unlimited capacity at a lower cost than any other storage tier.
For example, the healthcare industry produces massive amounts of medical images with retention requirements measured in decades. The financial industry also has strict retention requirements, in some cases up to 30 years. Many financial institutions have stored this data in tape vaults for many years, but restoring massive data sets from tape is expensive. Cold storage in the cloud retains data for long periods, and restoring the data does not require the original tape drives.
Litigation and regulatory investigations are also cold storage use cases. For example, a retail chain might store massive amounts of backup data in the cloud. One day the company is sued by a customer who slipped and fell in a store seven months ago. The business will need to search through its backups for relevant data, collect it, analyze it, and provide it to the reviewers within a few weeks. This is far simpler to do on cold storage in the cloud than from massive tape collections.
A third scenario is preserving raw data for analytics and secondary applications. Massive data sets are very expensive to keep on hot or warm storage systems. Cold storage tiers keep the raw data available for occasional access at a very low cost.
For many companies, cold storage in the cloud offers distinct advantages over on-premises nearline storage or tape vaulting. The public clouds are ramping up their cold storage in response. Amazon Glacier and the new Google Cloud Storage Coldline are dedicated to long-term cold storage. Azure uses its Cool Blob Storage to serve both cool and cold tiers.
The three services have a lot in common. Storage pricing is very similar. Amazon and Google both charge $0.007 (0.7 cents) per gigabyte stored per month. Azure pricing varies by geographic region, with price points ranging from $0.01 to $0.024 per gigabyte for cool and hot blobs. (Cool blobs are priced at the lower end of the scale.) Data access and recovery are more expensive than simple storage, which protects the public clouds against customers using cold storage as a cheap active data tier.
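To put the storage price in perspective, here is a rough calculation. It assumes the Amazon and Google rate is $0.007 per gigabyte per month, uses decimal units (1 TB = 1,000 GB), and covers storage charges only. Retrieval, request, and transfer fees are left out, and the function names are illustrative.

```python
# Assumed rate: USD per gigabyte per month for Glacier and Coldline.
PRICE_PER_GB_MONTH = 0.007


def monthly_storage_cost(terabytes: float,
                         price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Monthly storage charge in USD, excluding retrieval and transfer fees."""
    gigabytes = terabytes * 1000  # decimal units assumed: 1 TB = 1,000 GB
    return gigabytes * price_per_gb


def retention_cost(terabytes: float, years: int,
                   price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Total storage charge in USD over a retention period at a flat price."""
    return monthly_storage_cost(terabytes, price_per_gb) * 12 * years
```

At that rate, 100 TB costs about $700 per month to store, or roughly $252,000 over a 30-year retention period if the price never changed, which is itself an unrealistic simplification.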
Durability is critical for all three services. Both Glacier and Coldline rate their durability at 11 nines (99.999999999 percent). Both services achieve this durability level by redundantly storing data across multiple domains, storage systems, and disks. Azure goes beyond 11 nines by guaranteeing 0 percent data loss for both hot and cool storage blobs.
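What 11 nines means in practice is easier to see with a quick calculation. The sketch below assumes the durability figure is an annual, per-object probability and that object losses are independent. Both are simplifications, and the function names are illustrative.

```python
def expected_annual_losses(object_count: int, nines: int = 11) -> float:
    """Expected number of objects lost per year at the given durability."""
    annual_loss_probability = 10.0 ** -nines  # 11 nines -> 1e-11 per object
    return object_count * annual_loss_probability


def years_per_lost_object(object_count: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(object_count, nines)
```

Under those assumptions, a customer storing 10 million objects would expect to lose a single object about once every 10,000 years.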
Recovery service levels differ somewhat among the three. For example, Amazon Glacier offers different service levels with restore times that range from minutes to hours, while Google Coldline and Azure Cool Blob Storage offer fast recovery in milliseconds. Not everyone needs to recover cold data in such a short amount of time, but if you do, for example to quickly access a backup data set, then the much shorter access time could prove very handy.
Data transfer times matter for uploading data as well as retrieving it. Whether you back up directly to the cloud or keep backup copies on-site and then back them up, you need cloud transfers to stay within backup windows. The most efficient way to do this is to choose a backup product that backs up incremental changes and rehydrates them into a full restore. Also, look for backup providers who can accelerate cloud transfers between the on-premises data center and cold storage tiers.
All backup products back up to the cloud as a target, but not all of them optimize backup to cold storage tiers. Typical features for this level of integration include policy-based backup and archiving to the cold storage tier, indexing the tier for faster search and recovery, and offering flexible site choices when recovering data.
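The idea behind incremental backup can be illustrated with a small sketch that compares file hashes against a manifest saved from the previous run and returns only the files that are new or changed. This is a simplified file-level illustration, not how any particular backup product works. Commercial products typically track changes at the block level and also handle deletions. The names `file_digest` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(root: Path, previous_manifest: dict) -> tuple:
    """Return (relative paths needing upload, new manifest) for files under root."""
    new_manifest = {}
    to_upload = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        new_manifest[relative] = file_digest(path)
        # Upload only files that are new or whose content hash has changed.
        if previous_manifest.get(relative) != new_manifest[relative]:
            to_upload.append(relative)
    return to_upload, new_manifest
```

The first run, with an empty manifest, selects every file. Later runs select only what changed, which is what keeps uploads inside the backup window.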
Cloud backup vendor CloudBerry Lab supports multiple clouds, including Amazon, Google, and Azure, for cross-platform backup. CloudBerry backs up to cold and cool storage classes as well as hot and warm tiers, and it was one of the earliest backup vendors to support Google Coldline. Image-based backup and strong encryption round out the portfolio.
Cohesity hyperconverges massive secondary storage on-premises, remotely, and in the cloud. Cohesity also supports all storage classes and backs up directly to Amazon Glacier, Google Coldline, and Azure Cool Blobs.
Commvault backs up to multiple clouds including all three hyperscale public clouds. Its Simpana software can back up directly to Azure Hot and Cool Blobs, and integrates with Amazon S3 and Glacier. Users can adjust some cloud settings using the CommCell Console.
Stored data is growing at a terrific pace, and businesses need to retain much of it for compliance, analytics, and research purposes. Keeping all this data on costly storage tiers is extremely expensive, both in capital and operating costs.
Until now, tape has been the solution to cold storage requirements. But massive data volumes and the need to quickly access data for recovery or analytics have outstripped tape's effectiveness.
This is why the cloud is deservedly popular for storing cold data, and why public cloud vendors have stepped up with cold storage services. IT must still perform due diligence to determine which cold storage tiers best fit its needs and which backup vendors optimize for cloud-based cold storage tiers. Although this research will take some time and energy, the cost and durability benefits of cloud-based cold storage are more than worth it.