Recently, we’ve covered the facts that storage volumes are set to reach almost obscene levels over the next few years; how disk, flash and tape are evolving to cope; and that you don’t necessarily have to store everything when the Internet of Things (IoT) revolution takes hold.
Regardless of all that, the industry needs many other technologies to deal with the data onslaught. These include deduplication, compression, object storage and ways to boost I/O performance.
Deduplication and Compression
Deduplication and compression have been with us for some time. Just about any backup tool now comes with them. These technologies will remain highly important in the years ahead.
“To the extent that IoT produces repeated patterns of data across a mass number of devices, the elimination of redundant data will make it possible to store data on a reasonable footprint,” said Avinash Lakshman, CEO of Hedvig. “Techniques for data efficiency such as deduplication and compression will be critical in effectively storing massive amounts of data.”
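To make the idea concrete, here is a minimal Python sketch of how deduplication and compression can work together. It is an illustration only, not how Hedvig or any particular backup product implements them: the `DedupStore` class, its fixed 4 KB chunk size and the sample sensor reading are all assumptions made for this example. Each chunk is identified by its SHA-256 hash, stored once and compressed with zlib.

```python
import hashlib
import zlib


class DedupStore:
    """Toy content-addressed store (hypothetical): fixed-size chunks,
    SHA-256 keys, zlib-compressed values."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> compressed chunk (each unique chunk kept once)
        self.files = {}   # name -> ordered list of digests needed to rebuild it

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First time this content is seen: compress and keep it.
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        # Rebuild the original bytes from the stored chunks.
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# 100 hypothetical devices all reporting the same repeated pattern.
store = DedupStore()
reading = b'{"sensor": "temp", "value": 21.5}\n' * 1000
for device in range(100):
    store.put(f"device-{device}", reading)

print("raw bytes:   ", 100 * len(reading))
print("stored bytes:", store.stored_bytes())
```

Because every device sends identical data, only the first copy adds chunks to the store; the other 99 add nothing but a list of hashes. Production systems typically use more sophisticated chunking, but the principle is the same.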
Object storage is another area that opens the door to dealing better with the scale-out architectures necessary for our storage future. It is particularly good for storing unstructured data as it separates the metadata from the data. As a result, the storage system isn’t dependent upon the underlying file system or block storage structure. That means administrators don’t have to deal with setting RAID levels or building and managing logical volumes. Object storage, then, is looking more and more like a necessary element of any storage infrastructure that hopes to achieve massive scale. Once you get beyond a particular point, its cost effectiveness becomes apparent, which is why Facebook uses it to store the hundreds of billions of photos it receives at a rate of half a billion per day.
“Object storage designed to deal with large volumes of data in distributed environments will be key to any system intended to store massive quantities of IoT data,” said Lakshman.
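The separation of metadata from data that defines object storage can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the API of any real object store: the `ObjectStore` class and its `put`, `get`, `head` and `find` methods are hypothetical names chosen for illustration.

```python
import hashlib
import uuid


class ObjectStore:
    """Toy flat-namespace object store (hypothetical): payloads and
    metadata live in separate maps, linked only by an object ID."""

    def __init__(self):
        self._data = {}      # object ID -> raw bytes
        self._metadata = {}  # object ID -> dict of descriptive attributes

    def put(self, payload, **metadata):
        object_id = uuid.uuid4().hex
        self._data[object_id] = payload
        self._metadata[object_id] = {
            "size": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
            **metadata,
        }
        return object_id

    def get(self, object_id):
        return self._data[object_id]

    def head(self, object_id):
        """Return a copy of the metadata without touching the payload."""
        return dict(self._metadata[object_id])

    def find(self, **criteria):
        """Search by metadata alone; no directory tree or volume layout is involved."""
        return [oid for oid, meta in self._metadata.items()
                if all(meta.get(k) == v for k, v in criteria.items())]


store = ObjectStore()
photo_id = store.put(b"\xff\xd8...jpeg bytes...", content_type="image/jpeg", owner="alice")
store.put(b"plain text note", content_type="text/plain", owner="alice")

print(store.head(photo_id)["size"])
print(store.find(content_type="image/jpeg") == [photo_id])
```

Callers address objects by ID and describe them with metadata; nothing in the interface exposes a file system path, a RAID level or a logical volume.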
Barbara Murphy, vice president of marketing, cloud infrastructure business unit at HGST, a Western Digital Company, agrees. She stated that the sea change brought by cloud compute and storage has been maturing for a number of years and a scalable data management platform is now taking root.
“The traditional paradigm of block and file, SAN and NAS storage systems will be complemented — and in many cases replaced — with robust scale-out infrastructure led today by object storage solutions with erasure coding techniques,” said Murphy.
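Erasure coding, which Murphy mentions, splits data into shards and adds redundant shards so that lost pieces can be rebuilt. The Python sketch below shows only the simplest possible form, a single XOR parity shard that can repair one lost shard. It is an assumption-laden illustration: real object stores generally use stronger codes such as Reed-Solomon that tolerate multiple failures, and the function names here are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal-size shards (zero-padded) and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards):
    """Rebuild a single missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    shards = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards


def decode(shards, original_len):
    """Join the data shards (everything except the parity shard) and strip the padding."""
    return b"".join(shards[:-1])[:original_len]


data = b"object storage at scale"
shards = encode(data, k=4)   # 4 data shards + 1 parity shard
shards[2] = None             # simulate losing one drive or node
print(decode(recover(shards), len(data)) == data)
```

Each shard would live on a different drive or node, so the loss of any one of the five leaves the data recoverable at a storage overhead of 25 percent, compared with 100 percent for a full mirror.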
HGST’s Active Archive Platform, for instance, is an object storage system with cloud-based archiving. Densities can exceed 10 PB.
But the bane of many modern high-performance storage systems is that, while multi-core processors can process data extremely quickly, the data transport mechanisms lag behind. This results in bottlenecks. It’s similar to buying a fleet of Corvettes to get your employees around the city faster and then finding that the freeways are always gridlocked.
“Aside from the sheer capacity challenges and management issues that storage is faced with due to data creation, one of the most critical issues is how to handle increasing I/O demands,” said George Teixeira, president and CEO, DataCore Software. “As processors and memory have dramatically increased in capability, the I/O continues to be the straw that bottlenecks overall performance — especially when it comes to the core business applications driving databases and on-line transaction workloads.”
He thinks that better software is the key to solving this dilemma in order to utilize multicore/parallel processing infrastructures fully. Accordingly, the company has been working hard on the software-defined storage and parallel I/O software technology within SANsymphony to improve overall I/O and therefore application and storage performance.
Gridlock has been a part of the urban landscape since the early days of the automobile. Big cities struggle to cope with it and have initiated a variety of schemes. One popular approach is to improve public transport and advocate that people leave their cars at home and take the bus or train. London went as far as to impose a congestion charge for the city and to block off half the traffic lanes as bus-only routes. This makes it a nightmare for anyone bold enough to take a car into the center of town. But it certainly went a long way to speeding up the time it took to get from A to B by bus.
Similarly, in storage, faster buses and data transport lanes are part of the solution. A new technology known as Non-Volatile Memory Express (NVMe) speeds up data transport between (and within) systems, removing lengthy queues and eliminating bottlenecks.
“NVMe has applicability to trends like mobility, IoT and streaming video in order to open the door for cheaper, faster, bigger storage in the future,” said J Metz, R&D engineer for storage at Cisco Systems. “One of the greatest advantages of the work being done with NVMe is the improvement in efficiency for addressing flash and other non-volatile memory. Combine that with dramatic power reduction and you have a recipe for faster, less power-hungry mobile devices.”
He said that for such use cases, there will be a natural performance boost from using NVMe instead of traditional SAS or SATA connections. But even that, he added, is still only one small piece of the overall puzzle. How applications are designed to take advantage of NVMe will play a large role. That is, will applications be written to take advantage of NVMe's multi-queued capabilities? If so, they will perform better. Network protocols will also influence overall performance, which in turn affects IoT and other areas that require large-scale data storage.
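What "written to take advantage of multi-queued capabilities" might look like at the application level can be sketched as follows. This Python example is a rough illustration only: it contrasts issuing reads one at a time with keeping several requests in flight at once. It does not talk to NVMe queues directly, and whether and how the requests reach separate hardware queues is decided by the operating system and driver, not by this code. The function names, block size and worker count are assumptions made for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096


def read_block(path, index):
    """Read one block through its own file handle, so requests do not share a file offset."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


def read_serial(path, indices):
    """One request at a time: the device never sees more than one outstanding read."""
    return [read_block(path, i) for i in indices]


def read_parallel(path, indices, workers=8):
    """Several requests in flight at once, which a multi-queue device can service concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: read_block(path, i), indices))


def make_test_file(blocks=64):
    """Create a temporary file in which every block is filled with a byte derived from its index."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(blocks):
            f.write(bytes([i % 256]) * BLOCK_SIZE)
        return f.name


path = make_test_file()
indices = list(range(64))
print(read_parallel(path, indices) == read_serial(path, indices))
os.remove(path)
```

Both functions return the same data; the difference is how many requests are outstanding at any moment. On a small cached file like this one the timing difference is negligible, so the sketch demonstrates the access pattern rather than a benchmark.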
“NVMe is a faster, more power-efficient way of addressing non-volatile memory, but that doesn't mean that it will be a panacea for all other aspects of data distribution,” said Metz.