For some years now, data storage vendors have been incorporating big data and analytics tools such as Hadoop into their products. However, the performance requirements of these applications have sometimes proven hard to align with a prodigious need for raw capacity. The typical approach has been an architecture that spans several storage products in order to deliver both the performance and the capacity these workloads require at a reasonable cost.
“This usually results in a lot of complexity — multiple consoles and systems, for example — as well as high operating costs,” said Laz Vekiarides, chief technology officer of ClearSky Data.
But that is changing. A new generation of storage tools seeks to add more analytics and deal better with big data. Here are some of the top trends.
1. Beyond Hadoop
The initial wave of big data storage tools focused on Hadoop and MapReduce. They then began to add analytics via the likes of SAS, Splunk and SAP HANA. But now they are heading into new pastures.
“Storage solutions are expanding their big data and analytics feature sets into fields such as video, image, clickstream analytics, telemetry, transcoding and other forms of data transition,” said Greg Schulz, an analyst with StorageIO Group.
2. Edge Computing
Vekiarides said that the amount of big data that needs to be retained for analytics is growing fast. Therefore, elastic tools that allow users to scale up or down are essential. Edge computing is required to support analytics when billions of tiny computers are generating data everywhere.
“The analytics data sets and their processing can’t reside solely in the cloud because the latency is too great,” said Vekiarides. “There is a need to store and process in nearby locales.”
3. Storage Remains Essential
There has been some talk in the media about data storage becoming irrelevant, nothing more than a commodity or a utility. But no matter how fancy the analytics or how huge the big data application, they must be underpinned by high-performance, high-capacity storage.
“Today’s analytics-based initiatives can’t exist without robust storage to support them,” said Vekiarides. “As enterprises prioritize machine data analytics and management, they need to ensure IT infrastructure can support such initiatives and produce return on investment.”
4. Multi-Cloud Storage
Paul Speciale, vice president of product management at Scality, said multi-cloud storage is emerging as a key area for storage and big data. The main public cloud storage providers (AWS, Microsoft and Google) continue to enhance their services with integrated big data capabilities, including search (AWS EMR and Athena, for example). In addition, their compute bursting capabilities offer advantages in many forms of compute-intensive, “on-demand” analytics workloads.
“A class of multi-cloud data management solutions is emerging that provides the ability to store and manage data across these key cloud providers, as well as in on-premises private clouds,” said Speciale.
5. Embedded Intelligence
No longer is it enough to offer an add-on for intelligence, analytics or big data. Everything is being integrated.
“Look for storage solutions that embed intelligence for metadata, search and policy-based data workflows,” said Speciale. “This can help support the analytics applications used by data scientists when storing, archiving and most importantly, mining big data for its true value.”
6. IoT and Machine Learning
Internet of Things (IoT) devices are generating millions of small files at an accelerated rate. Machine learning is adding to the problem with its ability to chew through vast amounts of data in short order. Storage platforms now must up their game to store more and process it far faster than ever to be able to cope with the latest IoT and machine data applications.
“IoT data must be stored, protected and analyzed using cloud-scale technology, because users cannot rack and stack on-prem NAS devices and backup software fast enough to keep up,” said John Capello, vice president of product strategy, Nasuni.
7. Zero Tolerance
In an always-on, always-connected digital economy, where instant gratification is an expectation, there is intense pressure to move toward zero recovery point objectives (RPOs) and zero recovery time objectives (RTOs) for data recovery, from today’s commonplace 24-hour or longer RPOs and RTOs.
“This includes recovery from the thousands of malware and ransomware attacks that penetrate organizations every day, such as recent attacks on the National Institutes of Health in the UK, Maersk in Denmark, Toyota in Japan and FedEx in the U.S.,” said Mike Grandinetti, chief marketing and corporate strategy officer, Reduxio.
8. Hybrid Storage
Mike McNamara, senior manager of product and solution marketing at NetApp Storage, noted that storage infrastructure is constantly evolving to meet scalability demands brought on by big data. As a result, many enterprises, he said, are converting to a hybrid storage model, scaling out to the cloud entirely to utilize capacity or weighing their options to keep data on premises or in a private cloud.
“Modern enterprises have been wary to move to the public cloud for loss of control of their data or concerns over security,” he said. “As data management tools evolve and support cloud migration and integrate strong security measures, enterprises have become more comfortable moving to a public cloud while keeping control of their data.”
9. Vertical Focus
Point tools that sat on top of storage and cloud platforms characterized the early big data and analytics tools. But the latest crop includes several that seek to integrate all these functions into one stack. With that accomplished, the next step is to tailor these tools to certain industries.
“We are beginning to see vertical-focused big data solutions, targeting specific industries like healthcare and manufacturing,” said Capello.
10. Storage Intelligence
There is a lot going on in analytics, big data and IoT that harnesses the underlying storage. But it is easy to forget to add intelligence about the storage itself. McNamara said enterprises are looking for smart management tools that provide integrated analytics within storage. This enables them to monitor resources and flag when an aspect of the infrastructure is being underutilized, for example.
“From there, enterprises can re-architect their environment to invest in areas where they need the highest performance,” said McNamara.