Storage tiering, the practice of storing different types of data on media with different characteristics, such as speed, capacity and cost, is a popular technique in today’s data centers.
DeepStore offers an unusual form of data storage, and it is a perfect illustration of innovative tiered storage at work.
If you’ve never heard of DeepStore, that may be because it operates a vast file archive at the bottom of a salt mine. It has almost limitless capacity, and it provides a cool, dry, stable environment where paper documents can be stored for long periods at low cost.
When it comes to digital data, there’s also plenty of innovation in tiered storage going on right now, because the number of different ways to store data is exploding.
In the past, tiered storage was all about moving data between hard disk drives (HDDs) and tape. Different types of HDDs (15k RPM and 7.2k RPM disks, SAS drives, SATA drives), different HDD configurations (short stroking, data striping and so on) and different implementations of tape (tape libraries, offline archives) provided tiered storage layers with different cost, performance and capacity characteristics.
New Tiered Storage Layers
But that model based on HDDs and tape has been blown out of the water by the emergence of a whole new top layer of tiered storage based on solid state drives (SSDs). (Flash can also be used in server and storage system caches.)
And further down the pile, the cloud has emerged as home to one or more new tiered storage layers, thanks to low-cost cloud storage resources accessed through cloud gateways and to ultra-low-cost cloud archiving services such as Amazon Glacier or Google Cloud Storage Nearline.
So rather than the traditional four or five layers of tiered storage (Tier 0 or Tier 1 through Tier 4), what’s now emerging is a tiered storage model with far more tiers, each subtly tuned to offer a different combination of three key storage attributes: cost, performance and capacity.
Tiered Storage for Bottom Line Benefits
That’s significant for businesses because ultimately tiered storage is not driven by the needs of the IT department. It owes its existence to accountants, and it arises from the conflicting needs of data users and of the business as a whole.
For IT departments and the consumers of IT services, the ideal situation would be to store all data at Tier 1 (or Tier 0, if there is one) for the highest possible performance, regardless of cost.
But for a business that aims to maximize profits, or an IT department with a finite storage budget, cost-efficient deployment and use of storage is important.
Given that constraint, the optimum solution is to store data in the lowest-cost layer of tiered storage that provides at least the minimum storage performance (and capacity) required.
In theory, then, it follows that the more storage tiers are available, the more precisely minimum performance (and capacity) requirements can be met. Less data will end up in an expensive higher tier simply because the cheaper tier below it doesn’t offer sufficient performance or capacity.
That’s the theory, anyway, but Tony Lock, an analyst at Freeform Dynamics, points out that in practice there’s a limit to how many tiers it’s useful to have. “Having more tiering options is great as long as there are enough different workloads to make use of those options. Could there be too many tiers? Maybe.”
That’s because deciding which data to put into which of a wide choice of tiers may ultimately be the sticking point. “If data classification becomes too complex then you probably will end up not doing it at all,” he says.
Software-Defined and Tiered Storage
The concept of tiered storage is not new: storage administrators have used manual approaches to tiered storage for years. Automated systems that move data between different storage systems, between different drive types or RAID groups within a single system, or between different storage media within a hybrid storage system came later, although they too date back to the 1980s or even earlier. Many storage vendors (and even operating system makers) now offer automated tiering software.
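As a rough illustration of what automated tiering software does, the sketch below plans moves based on how long each object has been idle. The thresholds, tier names and catalog format are assumptions made for this example; real products apply their own policies, which may weigh other factors such as access frequency.

```python
from datetime import datetime, timedelta

# Hypothetical policy: (maximum idle time, tier for data within that bound).
# Checked in order, so the first matching bound wins.
POLICY = [
    (timedelta(days=1), "tier1"),
    (timedelta(days=30), "tier2"),
    (timedelta(days=365), "tier3"),
]
LOWEST_TIER = "tier4"  # anything idle longer than every bound ends up here


def target_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier an object should occupy given how long it has been idle."""
    idle = now - last_access
    for max_idle, tier in POLICY:
        if idle <= max_idle:
            return tier
    return LOWEST_TIER


def plan_moves(catalog: dict, now: datetime) -> list:
    """Compare each object's current tier with its target tier.

    `catalog` maps object name -> (current tier, last access time).
    Returns (name, from_tier, to_tier) tuples; moves can be demotions
    (data cooling off) or promotions (data that became active again).
    """
    moves = []
    for name, (current, last_access) in sorted(catalog.items()):
        target = target_tier(last_access, now)
        if target != current:
            moves.append((name, current, target))
    return moves


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    catalog = {
        "orders.db": ("tier1", now - timedelta(hours=2)),
        "q3-report.pdf": ("tier1", now - timedelta(days=90)),
        "2015-audit.tar": ("tier3", now - timedelta(days=3000)),
    }
    for move in plan_moves(catalog, now):
        print(move)
```

Here the recently used database stays put, the quarter-old report is demoted to Tier 3 and the years-old audit archive is demoted to Tier 4.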
With so many new storage tiers emerging, and with the price, performance and capacity attributes of SSDs and HDDs changing rapidly as new technologies such as shingled magnetic recording (SMR), 3D NAND and 3D XPoint are developed, software-defined storage is emerging as a key enabling technology for multi-layered tiered storage.
That’s the view of Mark Lewis, a former CTO of EMC and former GM of HP Storage, now CEO of storage software startup Formation Data Systems. His reasoning is that software-defined storage (SDS) allows any type of storage, including emerging media types like 3D XPoint, to be attached to commodity hardware, with tiered storage functionality provided by the SDS software itself.
“If you take the example of 3D XPoint, software-defined storage changes the game,” Lewis says. “Up to now, every time a new tier of storage opened up, you had to deal with a whole new range of vendors selling incompatible hardware. But software-defined storage insulates you from new architectures and products, so there’s no need to change arrays and learn how to use new storage software.”
That makes it much easier to create additional storage tiers to match your data’s requirements, without the costs associated with purchasing and managing new storage systems cancelling out any benefits.
But Freeform Dynamics’ Tony Lock warns that although software-defined storage is a good idea, it may not be suitable for small and medium-sized organizations. “For many organizations, anything that requires implementation is tricky – they want to buy storage that is supported by a vendor.”
Tiered Storage with Data Virtualization
Mark Lewis points out that many software-defined storage implementations in the past have tended to focus on the lower layers, Tiers 3 and 4, where low-use data and archive data is stored. Here a key enabling technology is data virtualization, Lewis believes. “With data virtualization, an application doesn’t know what policies you set about storage tiering. If you need data, you just get it,” he says.
“With virtualized data – block, file or object – we create an index for it so we have a data object model. All these objects are structured, and the index is structured. So then we can move data from flash to disk and then on to the cloud without changing the index needed to access the data. So as far as the application is concerned, it thinks the data is in the same place.”
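Lewis’s description suggests an index that maps a stable object identifier to wherever the data currently lives. The toy Python class below is a minimal sketch of that idea, assuming in-memory dicts stand in for flash, disk and cloud backends; it is not Formation Data Systems’ implementation. The application always calls `get()` with the same ID, and only the internal location entry changes when data moves.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualStore:
    """Toy data-virtualization layer: applications address data by object ID only."""

    # Hypothetical backends; plain dicts stand in for flash, disk and cloud media.
    backends: dict = field(default_factory=lambda: {"flash": {}, "disk": {}, "cloud": {}})
    # Index: object ID -> name of the backend currently holding the object.
    index: dict = field(default_factory=dict)

    def put(self, obj_id: str, data: bytes, backend: str = "flash") -> None:
        self.backends[backend][obj_id] = data
        self.index[obj_id] = backend

    def get(self, obj_id: str) -> bytes:
        # Callers never specify a backend; the index resolves the location.
        return self.backends[self.index[obj_id]][obj_id]

    def move(self, obj_id: str, to_backend: str) -> None:
        # Relocate the data and update only the location entry for this ID.
        src = self.index[obj_id]
        if src == to_backend:
            return
        self.backends[to_backend][obj_id] = self.backends[src].pop(obj_id)
        self.index[obj_id] = to_backend


if __name__ == "__main__":
    store = VirtualStore()
    store.put("invoice-42", b"order data")
    store.move("invoice-42", "disk")
    store.move("invoice-42", "cloud")
    print(store.get("invoice-42"), store.index["invoice-42"])
```

The application’s view never changes: it asks for `"invoice-42"` before and after the data travels from flash to disk to cloud.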
Data Classes for Tiered Storage Layers
In order to benefit from multi-layered tiered storage, it is necessary to classify your data into multiple classes. Today most organizations use just four classes for tiered storage:
- Mission critical. This is data that always needs to be stored in the highest level of tiered storage (Tier 1) because it is needed to support high-speed applications, perhaps those handling customer transactions. Delays in accessing it will cause the organization to lose business or otherwise hurt profitability. Performance is all-important.
- Hot data. This class of data needs a relatively high level of tiered storage (Tier 2) because it is in constant use in applications such as CRM, ERP or even email, and needed for the day-to-day running of a business. Performance is important at this layer of tiered storage, but cost is also a consideration.
- Warm data. This class includes older data such as emails that are more than a few days old or data on completed transactions. This type of data will be accessed relatively infrequently but still needs to be readily accessible when required. The most important consideration at this layer of tiered storage, Tier 3, is cost, but subject to a minimum performance threshold.
- Cold data. This class of data may never be accessed again, but it needs to be archived and retained to comply with regulatory or other legal requirements, or simply because it may have some value at some unspecified time in the future. Cold data is ideally suited to the lowest layer of tiered storage, Tier 4, where access times of minutes or hours are acceptable and low cost is the overriding consideration.
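The four classes above could be encoded as a simple rule-based classifier. The idle-time cut-offs (3 days for hot data, 365 days for warm data) and the `mission_critical` flag are hypothetical; in practice the rules would come from each organization’s own business and regulatory requirements.

```python
from enum import Enum


class StorageClass(Enum):
    """The four data classes described above; each value is the tier number."""
    MISSION_CRITICAL = 1
    HOT = 2
    WARM = 3
    COLD = 4


# Hypothetical cut-offs; each organization would set its own.
HOT_MAX_IDLE_DAYS = 3
WARM_MAX_IDLE_DAYS = 365


def classify(days_since_access: float, mission_critical: bool = False) -> StorageClass:
    """Assign data to a class using a business flag plus time since last access."""
    if mission_critical:
        # Business-critical data stays on Tier 1 regardless of access pattern.
        return StorageClass.MISSION_CRITICAL
    if days_since_access <= HOT_MAX_IDLE_DAYS:
        return StorageClass.HOT
    if days_since_access <= WARM_MAX_IDLE_DAYS:
        return StorageClass.WARM
    # Rarely or never accessed, but retained (e.g. for compliance).
    return StorageClass.COLD


if __name__ == "__main__":
    for days, critical in [(0, True), (1, False), (10, False), (2000, False)]:
        cls = classify(days, critical)
        print(days, critical, cls.name, "-> Tier", cls.value)
```

Keeping the rule set this small reflects Tony Lock’s warning: the more elaborate the classification scheme, the less likely it is to be applied at all.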