Cloud-based data storage generally falls into four storage classes or tiers:
- Hot storage is primary storage for frequently accessed production data.
- Warm storage stores slightly aging but still active data. It costs less than hot storage because the underlying systems don’t have the same high performance and availability requirements, but it keeps data quickly accessible.
- Cool storage houses nearline data, which is less frequently accessed data that needs to stay accessible without a restore process.
- Cold storage is a backup and archival tier that stores data very cheaply for long periods of time. Restores are expected to be rare. Security, durability, and low cost characterize this tier.
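As a rough illustration of how the four tiers relate, the sketch below maps a data set to a tier based on how often it is read and whether a restore delay is acceptable. The cut-off values and the function name are hypothetical assumptions chosen for the example; no provider defines the tiers this way.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    COOL = "cool"
    COLD = "cold"


# Hypothetical cut-offs in reads per month; tune them to your own workload.
HOT_MIN_READS = 30
WARM_MIN_READS = 4
COOL_MIN_READS = 1


def suggest_tier(reads_per_month: float, restore_delay_ok: bool) -> Tier:
    """Suggest a storage tier from access frequency and restore tolerance."""
    if reads_per_month >= HOT_MIN_READS:
        return Tier.HOT
    if reads_per_month >= WARM_MIN_READS:
        return Tier.WARM
    # Cool data must stay accessible without a restore process, so rarely
    # read data that cannot tolerate a restore delay still lands in cool.
    if reads_per_month >= COOL_MIN_READS or not restore_delay_ok:
        return Tier.COOL
    return Tier.COLD


print(suggest_tier(100, restore_delay_ok=False))
print(suggest_tier(0.1, restore_delay_ok=True))
```

In practice the decision also depends on retrieval fees and minimum storage durations, which this sketch ignores.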
Cold Storage Use Cases
The biggest reason for using cold storage is saving money by reducing use of hot, warm, and cool storage tiers. Cold storage provides efficient, effectively unlimited capacity at a lower cost than any other storage tier.
For example, the healthcare industry produces massive amounts of medical images with retention requirements measured in decades. The financial industry also has lengthy retention requirements, in some cases up to 30 years. Many financial institutions have stored this data in tape vaults for many years, but restoring massive data sets from tape is expensive. Cold storage in the cloud retains data for long periods, and restoring the data does not depend on having the original tape drives.
Litigation and regulatory investigations are also cold storage use cases. For example, a retail chain might store massive amounts of backup data in the cloud. One day the company is sued by a customer who slipped and fell in a store seven months ago. The business will need to search through its backups for relevant data, collect it, analyze it, and provide it to the reviewers within a few weeks. This is far simpler to do on cold storage in the cloud than from massive tape collections.
A third scenario is preserving raw data for analytics and secondary applications. Massive data sets are very expensive to keep on hot or warm storage systems. Cold storage tiers keep the raw data available for occasional access at a very low cost.
In healthcare, finance, and law, cold data storage can help companies comply with data retention regulations. Health record retention varies among states, but typically the law requires records to be kept for at least a few years after a patient’s discharge or death. If a patient has been released permanently from the hospital but their healthcare records must still be retained for years, cold data storage is the least expensive option. It’s designed for infrequently accessed data.
For healthcare (or financial or legal) data that probably won’t be accessed more than once or twice a year, deep archive storage (such as Amazon Glacier Deep Archive) is the least expensive option. Glacier Deep Archive costs $0.00099 per GB per month.
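To put that rate in perspective, here is a minimal sketch of the storage-only cost of a long retention period. It assumes the quoted rate means $0.00099 per GB per month, uses a hypothetical 50 TB archive kept for 10 years, and leaves out retrieval, request, and early-deletion fees. Prices change, so treat the constant as a placeholder.

```python
# Rate quoted in the text for Glacier Deep Archive (USD per GB per month).
# Assumption: verify against current provider pricing before relying on it.
DEEP_ARCHIVE_USD_PER_GB_MONTH = 0.00099


def retention_cost_usd(size_gb: float, years: float,
                       usd_per_gb_month: float = DEEP_ARCHIVE_USD_PER_GB_MONTH) -> float:
    """Storage-only cost of keeping size_gb for the given number of years."""
    months = years * 12
    return size_gb * usd_per_gb_month * months


# Hypothetical example: 50 TB (50,000 GB) of medical images kept for 10 years.
cost = retention_cost_usd(size_gb=50_000, years=10)
print(f"${cost:,.2f}")  # $5,940.00
```

At this rate the archive costs about $49.50 a month, which is why deep archive tiers suit multi-decade retention.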
Cold Storage and the Public Cloud
For many companies, cold storage in the cloud offers distinct advantages over on-premises nearline storage or tape vaulting. The public clouds are ramping up their cold storage in response. Amazon Glacier and the new Google Cloud Storage Coldline are dedicated to long-term cold storage. Azure uses its Cool Blob Storage to serve both cool and cold tiers.
The three services have a lot in common. Storage pricing is very similar. Amazon and Google both charge $0.004 per gigabyte stored per month. Azure charges $0.01 per gigabyte per month for objects in its cool blob storage. Data access and recovery are more expensive than simple storage, which discourages customers from using cold storage as a cheap active data tier.
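The comparison is easy to run for a specific data set. The sketch below applies the monthly rates quoted above to a hypothetical 100 TB archive. The service labels and the function name are illustrative assumptions, and access and recovery charges are deliberately left out because the text gives no figures for them.

```python
# Monthly storage rates as quoted in the text (USD per GB per month).
# Illustrative only: providers revise prices, and retrieval fees are excluded.
RATES_USD_PER_GB_MONTH = {
    "Amazon Glacier": 0.004,
    "Google Coldline": 0.004,
    "Azure Cool Blob": 0.01,
}


def monthly_storage_costs(size_gb: float) -> dict[str, float]:
    """Return the storage-only monthly cost per service, rounded to cents."""
    return {
        name: round(size_gb * rate, 2)
        for name, rate in RATES_USD_PER_GB_MONTH.items()
    }


# Hypothetical archive of 100 TB (100,000 GB).
for service, cost in monthly_storage_costs(100_000).items():
    print(f"{service:<16} ${cost:>10,.2f}")
```

For 100,000 GB this works out to $400 a month at the Amazon and Google rate and $1,000 a month at the Azure rate.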
Durability is critical for all three services. Both Glacier and Coldline rate their durability at 11 nines (99.999999999 percent). Both services achieve this durability level by redundantly storing data across multiple domains, storage systems, and disks. Azure logs 11 nines for locally redundant storage and 12 nines for zone-redundant storage.
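It can help to translate "nines" into expected losses. The following sketch assumes the durability figure is an annual, per-object probability and that losses are independent, which is a simplification. The object count is a hypothetical example.

```python
def expected_annual_losses(object_count: int, nines: int) -> float:
    """Expected number of objects lost per year at the given durability.

    Assumes durability is an annual per-object figure and that object
    losses are independent of each other.
    """
    annual_loss_probability = 10.0 ** -nines  # 11 nines -> 1e-11
    return object_count * annual_loss_probability


objects = 10_000_000  # hypothetical archive size
losses = expected_annual_losses(objects, nines=11)
print(f"Expected losses per year: {losses:.6f}")
print(f"Roughly one object every {1 / losses:,.0f} years")
```

Under these assumptions, an archive of 10 million objects at 11 nines would expect to lose about one object every 10,000 years.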
Recovery service levels differ somewhat between the three. For example, Amazon Glacier offers different service levels for restore times that range from minutes to hours, while Google Coldline and Azure Cool Blob Storage offer fast recovery in milliseconds. Not everyone needs to recover cold data in such a short amount of time, but if you do, the much shorter access time could prove very handy.
Data transfer times matter for uploading data as well as retrieving it. Whether you back up directly to the cloud or keep backup copies on-site and then copy them to the cloud, you need cloud transfers to stay within backup windows. The most efficient way to do this is to choose a backup product that backs up incremental changes and rehydrates them into a full restore. Also, look for backup providers who can accelerate cloud transfers between the on-premises data center and cold storage tiers.
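The incremental approach can be illustrated with a toy model. The sketch below treats a backup as a mapping of file paths to contents, records only the changes between two snapshots, and replays those changes on top of a full backup to rebuild the latest state. The type names and functions are hypothetical; real backup products work at the block or chunk level and add deduplication, compression, and encryption.

```python
from typing import Optional

Snapshot = dict[str, bytes]             # path -> file content
Increment = dict[str, Optional[bytes]]  # path -> new content, None if deleted


def make_increment(previous: Snapshot, current: Snapshot) -> Increment:
    """Record only what changed between two snapshots."""
    changes: Increment = {
        path: data for path, data in current.items()
        if previous.get(path) != data   # new or modified files
    }
    for path in previous.keys() - current.keys():
        changes[path] = None            # tombstone for a deleted file
    return changes


def rehydrate(full: Snapshot, increments: list[Increment]) -> Snapshot:
    """Replay increments, oldest first, on top of the full backup."""
    restored = dict(full)
    for increment in increments:
        for path, data in increment.items():
            if data is None:
                restored.pop(path, None)
            else:
                restored[path] = data
    return restored


monday = {"a.txt": b"v1", "b.txt": b"v1"}
tuesday = {"a.txt": b"v2", "c.txt": b"v1"}
increment = make_increment(monday, tuesday)
print(rehydrate(monday, [increment]) == tuesday)  # True
```

Only the increment has to cross the network on Tuesday, which is what keeps transfers inside the backup window.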
- Amazon Glacier is offered by Amazon Web Services (AWS) and offers unlimited cold data storage. Amazon isn’t kidding about Glacier being a cold storage tier. Retrieval costs more than storage, and data access and recovery can take five hours or more. The two-step retrieval process first retrieves data into a staging area, then offers a 24-hour window to either download it or access it via Amazon EC2. Glacier offers a bulk retrieval option that enables businesses to retrieve terabytes or petabytes of data, typically within 5 to 12 hours. This option enables customers to cost-effectively use Glacier to store big data for occasional analysis.
- Google designed Google Cloud Storage Coldline as a direct competitor to Glacier and charges the same amount for monthly storage. Both Amazon and Google discourage frequent data retrieval from their cold storage services, but when customers need to retrieve data fast, Coldline is there with retrieval times in milliseconds. Google markets these fast recovery times to the disaster recovery market, where customers may need to download high volumes of data very quickly.
- Microsoft Azure Cool Blob Storage is more like Google Nearline and Amazon S3 Standard-IA than Coldline and Glacier, but it can serve as cold object-based storage. Both hot and cool Azure blobs store unstructured data as objects. As with the other two cold storage services, cool data is much less expensive to store than hot data. Data retrieval takes milliseconds, and access costs are higher than storage costs.
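One way to use these differences is to filter services by recovery time objective (RTO). The sketch below is an assumption-laden illustration: the 12-hour figure is the upper end of the bulk retrieval range mentioned above, and the 100-millisecond value is only a placeholder for "milliseconds". Neither is a service-level guarantee.

```python
from datetime import timedelta

# Rough retrieval times based on the descriptions above.
# Assumptions: 12 hours is the upper end of Glacier bulk retrieval, and
# 100 ms is a placeholder standing in for "milliseconds".
TYPICAL_RETRIEVAL = {
    "Amazon Glacier (bulk)": timedelta(hours=12),
    "Google Coldline": timedelta(milliseconds=100),
    "Azure Cool Blob": timedelta(milliseconds=100),
}


def services_meeting_rto(rto: timedelta) -> list[str]:
    """Return the services whose typical retrieval time fits within the RTO."""
    return sorted(
        name for name, retrieval in TYPICAL_RETRIEVAL.items()
        if retrieval <= rto
    )


print(services_meeting_rto(timedelta(minutes=5)))
print(services_meeting_rto(timedelta(days=1)))
```

With a five-minute RTO only the millisecond-class services qualify, while a one-day RTO admits all three.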
Cold Cloud Backup Vendors
All backup products back data up to the cloud as a target, but not all of them optimize backup to cold storage tiers. Typical features for this level of integration include policy-based backup and archiving to the cold storage tier, indexing the tier for faster search and recovery, and offering flexible site choices when recovering data.
- Cloud backup vendor MSP360 offers CloudBerry Backup, a service that backs up data to cold and cool storage classes as well as hot and warm tiers. CloudBerry Backup is available for AWS, Microsoft Azure, and Google Cloud; it was one of the earliest backup products to support Google Coldline. CloudBerry Backup is best suited for small businesses.
- Cohesity consolidates massive secondary storage on-premises, at remote sites, and in the cloud. Cohesity also supports all storage classes and backs up directly to Amazon Glacier, Google Coldline, and Azure Cool Blobs.
- Commvault backs up to multiple clouds, including Amazon, Google, and Azure. It also supports Nutanix, IBM, VMware, and Alibaba. Users can adjust some backup settings using the CommCell Console.
- Veritas NetBackup offers flexibility to choose from a variety of clouds and workloads. It also provides data migration, disaster recovery, and virtual machine management.
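Policy-based archiving, one of the integration features described in this section, usually comes down to a rule that decides which objects move to the cold tier. The sketch below shows one such rule based on idle time. The class names, the 90-day default, and the function are hypothetical and do not reflect any vendor's actual policy engine.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupObject:
    key: str
    last_accessed: date


@dataclass(frozen=True)
class ArchivePolicy:
    # Hypothetical rule: objects idle for at least this many days go cold.
    min_idle_days: int = 90


def select_for_cold_tier(objects: list[BackupObject],
                         policy: ArchivePolicy,
                         today: date) -> list[str]:
    """Return the keys of objects that the policy would move to cold storage."""
    cutoff = today - timedelta(days=policy.min_idle_days)
    return [obj.key for obj in objects if obj.last_accessed <= cutoff]


inventory = [
    BackupObject("scans/2019-archive.tar", date(2024, 1, 1)),
    BackupObject("db/nightly.bak", date(2024, 6, 1)),
]
print(select_for_cold_tier(inventory, ArchivePolicy(), today=date(2024, 6, 30)))
```

Here only the archive last touched in January is selected, because the nightly database backup was accessed within the 90-day window.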
The Benefits of Cold Cloud Storage
The volume of stored data continues to grow, and businesses must retain much of it for compliance, analytics, and research purposes. Keeping all this data on high-performance storage tiers is extremely expensive, both in capital and operating costs.
In the past, tape was the solution to cold storage requirements. But massive data volumes and the need to quickly access data for recovery or analytics have outstripped tape’s effectiveness.
This is why the cloud is deservedly popular for storing cold data, and why public cloud vendors have stepped up with cold storage services. IT must still perform due diligence to investigate which cold storage tiers are optimal for its needs and which backup vendors optimize cloud-based cold storage tiers. Although this research will take some time and energy, the cost and durability benefits of cloud-based cold storage are more than worth it.