Hyperconverged systems fully integrate compute, networking, server virtualization, storage, data deduplication/compression and storage virtualization across entire distributed systems. Some hyperconverged products also converge data protection, yet hyperconvergence remains primarily a data center and production environment play. It’s an increasingly popular one. In a recent Taneja Group survey, more than 25 percent of enterprise IT directors reported that they are adopting hyperconvergence as a primary architecture in the data center.
Hyperconverged platforms have led to a tremendous change in the way that the data center operates. Instead of traditional SAN storage and separate networking and servers, storage is virtualized and clustered with the compute stack. The hyperconverged platform not only abstracts applications and data from physical components but also adds virtualized and pooled storage into one heady mix.
The benefits are big. IT can reduce both Opex and Capex, spend less time managing storage and free up staff time to optimize virtual applications.
However, although hyperconvergence has a big and growing effect on the production environment, it has less of an effect on secondary storage. This is not a criticism. Primary hyperconvergence vendors like Nutanix and SimpliVity do offer native data protection, which may include native backup or backup integration, replication, VM cloning and failover. Still, primary hyperconvergence exists to serve the production environment, not to converge the 80-85 percent of enterprise data that resides on secondary storage.
Therefore, it is time to bring the concept of hyperconvergence to secondary storage.
Hyperconvergence Characteristics
First, let’s look at the common factors that allow us to call a platform hyperconverged. They are the same for primary and secondary storage: massive scale-out and a web-scale distributed file system.
- Massively scale-out. Scale-out is a vital part of primary hyperconvergence, and is arguably even more critical to hyperconverging secondary storage given large-scale data growth. Hyperconverged secondary storage is built on clustered storage nodes that IT can grow at will for additional capacity and performance.
- Web-scale distributed file system. A distributed file system is a big step beyond a common management interface for multiple storage systems. Hyperconverged systems need the capability to index and present all stored data at all times. This requires a file system that is automatically distributed across all nodes. And since these scale-out systems can grow to thousands of nodes, the file system needs to be capable of web-scale performance and high reliability.
With the above architecture in place, hyperconverged secondary storage creates a cost-effective secondary storage tier that presents data to test/dev, data protection, analytics, compliance and eDiscovery, and so on – all on a single system. Benefits include a single system to manage, an end to secondary storage silos, fewer copies and system-wide improvements to capacity management and performance. Hyperconverged secondary storage also renders previously opaque data transparent to IT and processes.
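To make the scale-out idea concrete, here is a minimal sketch of one common way to spread data across clustered nodes: a consistent-hash ring. This is an assumption chosen for illustration. The article does not say how any vendor places data, and the node names, chunk names, `HashRing` class and `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring (illustrative only, not a vendor design)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual points per node, smooths the balance
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node's virtual points on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def locate(self, key: str) -> str:
        """Return the node owning the first ring point clockwise of the key."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_fraction(keys, before: HashRing, after: HashRing) -> float:
    """Share of keys whose owning node differs between two rings."""
    moved = sum(1 for k in keys if before.locate(k) != after.locate(k))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"chunk-{i}" for i in range(2000)]
    small = HashRing(["node-a", "node-b", "node-c"])
    grown = HashRing(["node-a", "node-b", "node-c", "node-d"])
    # Growing the cluster relocates only a fraction of the chunks.
    print(f"moved: {moved_fraction(keys, small, grown):.2%}")
```

In this sketch, adding a fourth node relocates only the chunks that the new node takes over, which is the property that lets a cluster grow without reshuffling all stored data.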
Use Cases
- More effective business continuity. Realistically you will want to replicate some data from your data center to a remote or cloud site, such as Tier 1 data or heavily regulated data with off-site copy requirements. However, you do not need a remote (and expensive) replica of the production stack if you can easily recover using hyperconverged secondary storage. Note that most hyperconvergence vendors in both the primary and secondary storage spaces offer native replication.
- Better test/dev. Software developers work with production replicas to test new releases and build sandbox environments. Traditionally this requires dynamic storage for migrating copies of the production environment. But with hyperconverged secondary storage, the data is virtualized, so a test copy takes up little or no additional space. Developers can spin up and retire VMs at will, with no need to manage the test/dev environment for dynamic capacity.
- Insightful analytics. It’s challenging enough to run analytics on primary data, let alone on secondary data located in multiple storage silos. With hyperconverged secondary storage, IT can run analytics across more data and do so at lower cost. This enables IT to understand usage trends, log operations by VM and by user, locate date ranges and custodians for eDiscovery, and render data across the system more transparent.
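The space savings claimed for test/dev copies in the list above are usually achieved with some form of copy-on-write cloning. That is an assumption here, since the article does not describe the mechanism. The sketch below uses hypothetical `Volume` and `Clone` classes to show why a fresh clone consumes no extra capacity until a developer changes something.

```python
class Volume:
    """Toy block volume: maps block numbers to bytes (illustrative only)."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def read(self, n: int) -> bytes:
        return self.blocks.get(n, b"")


class Clone:
    """Copy-on-write view over a base volume; stores only changed blocks."""

    def __init__(self, base: Volume):
        self.base = base
        self.delta = {}  # block number -> bytes written through this clone

    def read(self, n: int) -> bytes:
        # Serve changed blocks from the clone, everything else from the base.
        return self.delta[n] if n in self.delta else self.base.read(n)

    def write(self, n: int, data: bytes) -> None:
        # Writes land in the clone's delta; the base is never modified.
        self.delta[n] = data

    def extra_bytes(self) -> int:
        """Capacity consumed by this clone beyond the shared base."""
        return sum(len(d) for d in self.delta.values())


if __name__ == "__main__":
    production = Volume({0: b"aaaa", 1: b"bbbb"})
    sandbox = Clone(production)
    print(sandbox.extra_bytes())   # nothing written yet
    sandbox.write(1, b"zz")
    print(sandbox.read(1), production.read(1), sandbox.extra_bytes())
```

Retiring a sandbox in this model simply discards its delta, which matches the idea of developers spinning copies up and down without provisioning dedicated capacity.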
Cohesity and Rubrik stand out in the new hyperconverged secondary storage space. Both of them have cloud integration, which is vital given the unending growth of secondary data.
Rubrik’s Converged Data Management Platform includes backup software for efficient ingestion, and it deduplicates and indexes storage across the entire massively distributed system. Rubrik also supports test/dev operations.
Cohesity also deduplicates and indexes all data within its scale-out system, and it provides native support for data protection, test/dev, and analytics including utilization trending and user/VM reporting. In addition, Cohesity supports user and third-party created custom analytics.
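As a rough illustration of what deduplicating and indexing data across a system involves, the following sketch stores fixed-size chunks once, keyed by their SHA-256 digest, and keeps an index from object names to chunk digests. It is not a description of Rubrik's or Cohesity's implementation. The `DedupStore` class, the fixed chunk size and the object names are assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store with fixed-size chunks (illustrative only)."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, each unique chunk stored once
        self.index = {}   # object name -> ordered list of chunk digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # skip chunks already held
            digests.append(digest)
        self.index[name] = digests

    def get(self, name: str) -> bytes:
        """Rebuild an object from its indexed chunks."""
        return b"".join(self.chunks[d] for d in self.index[name])

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("backup-monday", b"AAAABBBBAAAA")
    store.put("backup-tuesday", b"BBBBCCCC")
    logical = len(b"AAAABBBBAAAA") + len(b"BBBBCCCC")
    print(logical, store.stored_bytes())
```

The gap between logical and stored bytes in this toy model is the saving that deduplication aims for; production systems typically use more elaborate chunking and indexing than this.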
Outlook
Primary hyperconvergence is transforming the data center by enabling IT to manage production data holistically for efficiency and low cost. These same benefits are translating to secondary storage. By building unified and massively distributed secondary storage, IT can better identify and use this data for storage management and business processes.
Although Cohesity and Rubrik have both hit the market with a bang, new technology adoption is always slower than most vendors would like it to be. In this case, hyperconverged secondary storage greatly benefits from growing familiarity with primary hyperconvergence. With the latter, more and more data center admins will adopt hyperconverged systems as their traditional infrastructure components reach end-of-life. Hyperconverged secondary storage will see a similar process over time. Initial adoption will come from companies that spend significant money and time managing secondary storage silos and trying to make sense of a piecemeal storage landscape. They will look to hyperconverged secondary storage with relief. They will adopt solutions as a pilot project or limited release, and as they see success, they can easily grow their highly scalable system. Increasingly larger adoption waves will follow.
At the same time, existing hyperconverged vendors will continue to fine-tune their offerings with features such as better integration with the public cloud. New hyperconverged secondary storage products will muddy the waters somewhat, as will more sophisticated data protection offerings from the primary hyperconvergence vendors. The market will also hear more about edge convergence, as vendors seek to extend hyperconvergence to remote offices and branch offices (ROBO). But on the whole, both the primary and secondary hyperconvergence markets will grow for a long time to come.