With data collection and usage on the rise, the storage market is looking to offer solutions that involve data compression, that is, storing data in a format that requires less space than usual.
According to Barracuda, “Data compression is the process of encoding, restructuring, or otherwise modifying data in order to reduce its size. Fundamentally, it involves re-encoding information using fewer bits than the original representation.”
Compression algorithms shrink files in various ways, such as encoding the long runs of spaces that appear in documents more compactly, replacing repeated patterns with shorter references, and removing unnecessary characters. For text files, this can lower the size by more than 50%.
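As a rough illustration of the text-file figure, the sketch below uses Python's standard zlib module (DEFLATE) to measure how much a block of text shrinks. The sample text and the helper name compression_saving are made up for this example, and the sample is deliberately repetitive, so it will compress far better than typical prose; real savings depend on the content.

```python
import zlib


def compression_saving(data: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by zlib (DEFLATE) compression."""
    if not data:
        return 0.0
    compressed = zlib.compress(data, level)
    return 1 - len(compressed) / len(data)


# Hypothetical sample: English prose with runs of spaces, repeated 50 times.
sample = (
    "Compression algorithms look for redundancy in data.      "
    "Repeated words, repeated phrases, and runs of spaces      "
    "can all be stored more compactly than they were written.\n"
).encode("utf-8") * 50

saving = compression_saving(sample)
print(f"original: {len(sample)} bytes, saved: {saving:.0%}")
```

Running the same function over your own documents gives a more realistic picture than any single sample.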
ZIP files are a common form of compression in everyday use. Because a zipped file is smaller than the original, it takes less time to send.
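For readers who want to see this in practice, here is a minimal sketch that builds a ZIP archive in memory with Python's standard zipfile module and reads the file back out. The file name report.txt, its contents, and the helper names zip_bytes and unzip_bytes are placeholders chosen for the example.

```python
import io
import zipfile


def zip_bytes(name: str, payload: bytes) -> bytes:
    """Pack one file into an in-memory ZIP archive using DEFLATE."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


def unzip_bytes(archive_bytes: bytes, name: str) -> bytes:
    """Read one file back out of an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


# Hypothetical file contents: a very repetitive text report.
report = b"quarterly sales report line\n" * 2000
archive_bytes = zip_bytes("report.txt", report)

print(f"original: {len(report)} bytes, zipped: {len(archive_bytes)} bytes")
```

The smaller archive is what actually travels over the network; the receiver unpacks it to recover the original bytes.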
The advantages of compression include the need to buy less storage hardware, lower data transmission times, and lower consumption of bandwidth.
Here are five of the top trends in data compression:
1. Lossless vs Lossy
Lossless compression reduces the number of bits without eliminating any information. It does this by finding and removing statistical redundancies, so the original data can be reconstructed exactly. For databases and other applications where every bit must be preserved, this is a good approach. The PNG image format is one example of lossless compression. The downside is that compression ratios are lower than with lossy methods.
Lossy, on the other hand, compresses by deleting unnecessary information and reducing complexity as a way to greatly increase the compression ratio. The downside is the possibility of file quality degradation. JPEG is a good example of a file format that utilizes lossy compression.
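The trade-off between the two approaches can be sketched in a few lines of Python. This is a toy model, not how PNG or JPEG work internally: the lossless path is a plain zlib round trip, and the lossy path simply rounds each sample down to a multiple of 16 before compressing. The random "sensor readings", the step size, and the function names are all assumptions made for the example.

```python
import random
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress; the output is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: list[int], step: int = 16) -> list[int]:
    """Toy lossy step: round each 0-255 sample down to a multiple of `step`."""
    return [(s // step) * step for s in samples]


# Hypothetical noisy readings, one byte each (seeded for repeatability).
rng = random.Random(42)
samples = [rng.randrange(256) for _ in range(4096)]
raw = bytes(samples)

exact = lossless_roundtrip(raw)          # lossless: every byte comes back
approx = bytes(lossy_quantize(samples))  # lossy: fine detail is discarded

lossless_size = len(zlib.compress(raw))
lossy_size = len(zlib.compress(approx))
max_error = max(abs(a - b) for a, b in zip(raw, approx))

print(f"lossless: {lossless_size} bytes, exact copy: {exact == raw}")
print(f"lossy: {lossy_size} bytes, largest per-sample error: {max_error}")
```

Discarding the low-order detail leaves fewer distinct values to encode, which is why the lossy version compresses further, at the cost of a bounded error in every sample.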
2. Tape Compression
Tape vendors have become particularly astute at compression. With the resurgence of tape as the go-to destination for large archives, tape shipments have risen sharply, as has the need to pack more data into much smaller spaces.
Accordingly, the popular LTO format has achieved compression ratios of 2.5:1. The FUJIFILM LTO Ultrium 9 Data Cartridge offers up to 45 TB of storage capacity with compression (18 TB for non-compressed data).
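The two capacity figures follow directly from the ratio: 18 TB x 2.5 = 45 TB. The short sketch below works through that arithmetic and applies it to a 1,000 TB archive, which is a hypothetical size chosen only for illustration. The 2.5:1 ratio is taken from the figures above; data that is already compressed may not reach it.

```python
import math


def compressed_capacity_tb(native_tb: float, ratio: float) -> float:
    """Capacity a cartridge can hold if data compresses at `ratio`:1."""
    return native_tb * ratio


def cartridges_needed(archive_tb: float, capacity_tb: float) -> int:
    """Whole cartridges required to hold an archive of the given size."""
    return math.ceil(archive_tb / capacity_tb)


NATIVE_TB = 18     # native LTO-9 capacity quoted in the article
RATIO = 2.5        # compression ratio quoted in the article
ARCHIVE_TB = 1000  # hypothetical archive size, for illustration only

rated_tb = compressed_capacity_tb(NATIVE_TB, RATIO)
uncompressed_count = cartridges_needed(ARCHIVE_TB, NATIVE_TB)
compressed_count = cartridges_needed(ARCHIVE_TB, rated_tb)

print(f"rated capacity: {rated_tb} TB")
print(f"cartridges without compression: {uncompressed_count}")
print(f"cartridges at {RATIO}:1: {compressed_count}")
```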
“This next generation of higher capacity and faster tape storage media represents a significant step towards reducing costs, lowering energy consumption and CO2 emissions, and leveraging tape’s inherent security benefits,” said Hironobu Taketomi, president at FUJIFILM Recording Media U.S.A.
3. Hardware Table Stakes
Compression used to be a value-added feature, or something to differentiate one storage hardware vendor from another. These days, good compression has become table stakes, used by just about all storage hardware vendors.
Dell EMC PowerScale, for example, is a scale-out network attached storage (NAS) system that includes the OneFS operating system to deliver a multi-protocol namespace for running any file, object, or analytics-based application. Its automation features mean a single admin can manage petabytes of storage, conduct parallel upgrades, and take care of functions such as in-line compression and data deduplication.
“Another trend is how deduplication and compression have settled into their role as a feature function as opposed to being a separate product,” said Greg Schulz, an analyst with StorageIO Group.
“They co-exist along with other technologies.”
4. Data Footprint Reduction
The storage market is fond of inventing new terms. Sometimes these represent bold new features. At other times, they are new dressing on a well-known feature. Data footprint reduction (DFR) probably falls somewhere in the middle, as it encompasses both compression and deduplication.
“Dedupe and compression are data footprint reduction technologies that reduce the size and impact of your data as well as associated storage costs along with management,” said Schulz with StorageIO Group.
“Since not all data and applications are the same, there are different DFR techniques and technologies to reduce data footprint impact, some data may dedupe, some may compress. One of the newer trends is that different DFR techniques, such as compression and dedupe, are used in combination as well as in different locations with different granularity.”
Depending on the application, it may be better to only compress data and not deduplicate, or vice versa, or to do one first and then the other. DFR attempts to capture this entire subject under one label by finding the best place for deduplication and compression to be done (at the source, on a server, in a database, file system, or operating system, on a storage device, or elsewhere) and in what sequence.
“You may use all the different DFR techniques available, from server side to file systems to archive, dedupe, and compression at various locations, including to reduce data impact over networks,” Schulz said.
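To make the idea of combining the two techniques concrete, here is a minimal sketch that deduplicates first and compresses second. It assumes fixed-size 4 KiB chunks identified by their SHA-256 hash, which is only one of several possible designs; the chunk size, function names, and sample data are all hypothetical, and production systems often use variable-size chunking and different ordering.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch


def dedupe_then_compress(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (store, recipe): unique compressed chunks and the rebuild order."""
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # Only chunks not seen before are compressed and stored.
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


# Hypothetical data: the same 4 KiB block stored ten times, plus one other block.
block_a = (b"customer record template\n" * 200)[:CHUNK_SIZE]
block_b = (b"audit log entry\n" * 300)[:CHUNK_SIZE]
data = block_a * 10 + block_b

store, recipe = dedupe_then_compress(data)
stored_bytes = sum(len(chunk) for chunk in store.values())

print(f"chunks seen: {len(recipe)}, unique chunks stored: {len(store)}")
print(f"original: {len(data)} bytes, stored: {stored_bytes} bytes")
```

Deduplication removes the repeated blocks, and compression then shrinks each block that remains, which is why the two are often used together.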
5. Use in Audio and Video
With streaming networks, ubiquitous YouTube videos, and the rise of podcasts, there is an increasing need for compression of audio and video files. These technologies have gotten more sophisticated in recent years.
Audio compression is implemented in audio codecs, which encode and decode audio so that files fit within bandwidth and storage constraints.
Video files, of course, typically combine image and audio compression, with a different codec for each. They tend to use lossy compression, as in the MPEG family of formats.