In general, people tend to keep all data. If you have been a system administrator you know the constant stream of user requests for more space that results from this behavior. To help the situation, we need to revisit a time when we didn't have much storage, and start compressing data the way we did back then.
At the expense of revealing my age, when I first started working on computer systems, storage space was at a premium. This was the very beginning of the post-punch card era (I actually learned programming on punch cards), and magnetic storage (hard drives) was rare and had a very limited capacity.
To save space we did everything possible in our code to save memory, and we kept what we stored to a bare minimum. For example, we would link our object files immediately and then erase the object files before running the executable. Then we would erase the executable after our run. We would look at the output, and if the application output file(s) were useful we made a printout and kept the file; otherwise we erased them. We would compress every file that wasn't being used, such as input files, source code, output files, makefiles, etc. It was the storage capacity jihad but it was very effective.
These habits die hard and they came in handy when I moved from minicomputers to microcomputers (PCs) where we had very small hard drives and floppy drives. Initially during the PC era, I compressed all files where I could. Storage was not cheap but I won't tell stories about my first giant 20MB hard drive or the boxes of cheap, mail order floppies I used instead of hard drives. Even today with very large 4TB drives readily available, I still feel the urge to save as much storage space as possible by erasing files and compressing the remaining ones if I'm not using them (hint, 7-zip is your friend when you want to save as much space as possible).
I think a fundamental axiom of IT is that users never have enough space. If you've been a system administrator you remember the constant stream of requests for more storage space. Even if you start with 1 PB of storage it will fill up faster than you anticipated (nature abhors unused storage capacity?). At the same time if you ask the users to maybe delete their older data, they tell you that the data is needed and they can't possibly erase it.
What do you do? One answer, and the one that is typically taken, is to just add more storage. But perhaps there are different methods to help the situation.
Compressed File Systems
One way that we can save space is to use a compressed file system. The idea is that the users don't have to consciously compress the files but the file system automatically compresses the data for them. There are several file system options, including btrfs, ZFS, and Reiser4, that can compress and decompress data as it is written to them or read from them. There are also FUSE (Filesystem in Userspace) options such as fusecompress, avfs, and to some degree archivemount.
A compressed file system compresses the data as it writes to the data blocks or extents and uncompresses the data blocks or extents when read. There are many methods for doing this including holding a large number of data blocks or extents before writing them so that you can get much larger compression ratios. But you have to pay careful attention to the file system's logs and you need larger buffers. This approach uses more computing resources than a non-compressed file system but given the large number of cores on systems today – coupled with larger amounts of memory – this is probably not an issue (except maybe if you use an ARM processor). But the details really depend upon the file system and the implementation.
An example of a file system that offers compression is btrfs. To use compression you use a mount option. For example, you can use "compress=zlib", "compress=lzo", or "compress=no". The last option disables compression. There is also a mount flag "compress-force=
Btrfs does compression on a file basis. That is, the entire file is either compressed or not compressed, although in actuality the compression is applied to extents rather than whole files. Btrfs can handle file systems in which some files are compressed and some are not. It can also handle files that are compressed with either of the two methods (zlib or lzo). You can read about the file compression options in the btrfs FAQ.
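As a rough illustration of the mount option, here is a minimal sketch of a helper that only prints the mount command so it can be reviewed before being run as root. The function name btrfs_compress_mount_cmd, the device /dev/sdb1, and the mount point /mnt/data are hypothetical placeholders, not taken from my test system. It uses the "compress=" spelling of the btrfs mount option and only the methods discussed above.

```shell
#!/bin/sh
# Sketch only: print (do not run) a btrfs mount command with compression.
# Function name, device, and mount point are hypothetical placeholders.
btrfs_compress_mount_cmd() {
    method=$1
    device=$2
    mount_point=$3

    # Accept only the methods discussed in the text.
    case $method in
        zlib|lzo|no) ;;
        *) echo "unknown compression method: $method" >&2
           return 1 ;;
    esac

    printf 'mount -t btrfs -o compress=%s %s %s\n' \
        "$method" "$device" "$mount_point"
}

# Example: show the command for lzo compression.
btrfs_compress_mount_cmd lzo /dev/sdb1 /mnt/data
```

Printing the command rather than running it keeps the sketch safe to try as a normal user; an administrator would run the printed line as root against a real btrfs device.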
To illustrate how a compressed file system might be used, I want to discuss archivemount. It is a FUSE-based file system that allows you to mount a compressed archive file such as a .tar.gz file and read and write data to it as though it were a directory tree. You can also create sub-directories, delete them, even compress files within the compressed archive. While not strictly a compressed file system it gives you a flavor of how one works.
Even better, since archivemount is a FUSE-based file system, users can mount and umount it when they want or when they need. This means less intervention by the system administrator. The example I will use is creating an archive, mounting it, adding files and directories, and then unmounting it. I'll do all of this as a user.
The first step is to create a compressed tar file. I'll create a tar file using a simple text file and then use bzip2 to compress it (I like the higher compression that bzip2 offers over gzip).
[laytonjb@test8 ~]$ tar cf NOTES.tar jeffy.out.txt
[laytonjb@test8 ~]$ bzip2 NOTES.tar
[laytonjb@test8 ~]$ ls -sh NOTES.tar.bz2
12K NOTES.tar.bz2
I created a simple archive using tar with a small text file I had ("jeffy.out.txt"). Then I compressed it using bzip2. The resulting file is 12KB in size.
I'm using a fresh CentOS 6.4 system where I installed fuse, fuse-devel, libarchive, and libarchive-devel. I then downloaded the latest version of archivemount which is 0.8.2, coincidentally uploaded the day I wrote this article – it’s so fresh I can smell the electrons.
I built and installed archivemount which is a pretty easy sequence of "./configure", "make", and "make install". Then I created a mount point in my account and mounted the .tar.bz2 archive file.
[laytonjb@test8 ~]$ mkdir ARCHIVE
[laytonjb@test8 ~]$ archivemount NOTES.tar.bz2 ARCHIVE
fuse: missing mountpoint parameter
[laytonjb@test8 ~]$ mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
archivemount on /home/laytonjb/ARCHIVE type fuse.archivemount (rw,nosuid,nodev,user=laytonjb)
I'm not sure about the error notification when the archive is mounted but it didn't stop the archive from being mounted as you can see in the last command output.
The mounted archive can now be treated as any sub-directory so you can "cd" into it, create sub-directories using "mkdir", delete directories, etc.
[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 32
32 jeffy.out.txt
[laytonjb@test8 ARCHIVE]$ mkdir NEW
[laytonjb@test8 ARCHIVE]$ cd NEW
[laytonjb@test8 NEW]$ echo "this is a new file" >> file.txt
[laytonjb@test8 NEW]$ ls -s
total 1
1 file.txt
[laytonjb@test8 NEW]$ cd ..
[laytonjb@test8 ARCHIVE]$ ls -s
total 36
32 jeffy.out.txt  4 NEW
As a further test, I copied another subdirectory into the archive. The subdirectory contains the images I use for backgrounds (about 279MB worth of images).
[laytonjb@test8 ~]$ cd BG.original/
[laytonjb@test8 BG.original]$ du -sh
279M .
Copying the directory into the archive:
[laytonjb@test8 ~]$ cp -r BG.original ARCHIVE/
[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 40
4 BG.original  32 jeffy.out.txt  4 NEW
At this point the archive has not been updated so if, for some reason, the mounted archive crashed, you would lose all of the changes you have made to the archive. This is one of the limitations of archivemount: the actual archive is not updated until it is unmounted.
[laytonjb@test8 ~]$ fusermount -u ARCHIVE
[laytonjb@test8 ~]$ mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[laytonjb@test8 ~]$ ls -sh NOTES.tar.bz2
143M NOTES.tar.bz2
Note that the user can unmount the archive using the fusermount command.
To see if it actually worked I examined the size of the compressed archive, which is now 143MB (remember it started out as 12KB). Also notice that the archive is much smaller than the directory BG.original that was copied to the mounted archive. The directory is 279MB in size. Therefore the compression was almost 2:1 (279MB / 143MB is about 1.95), which is not bad considering that the directory mostly had images.
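The ratio is simply the original size divided by the compressed size. A minimal sketch of that arithmetic as a shell function follows; the name compression_ratio is hypothetical, and it assumes awk is available and that both sizes are given in the same unit.

```shell
#!/bin/sh
# Sketch only: print ORIGINAL/COMPRESSED as "N.NN:1".
# compression_ratio is a hypothetical helper name; both sizes must use the same unit.
compression_ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN {
        if (comp <= 0) {
            print "compressed size must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.2f:1\n", orig / comp
    }'
}

# The archivemount example: 279MB of images in a 143MB archive.
compression_ratio 279 143    # prints 1.95:1
```

The same helper can be fed the sizes reported by "du -sh" and "ls -sh" in the squashfs examples later in the article.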
Archivemount is an example of a compressed file system, albeit a "local" variation that focuses on a subdirectory. But the point is that you can save space, sometimes quite a bit, by using a compressed file system. An advantage of archivemount itself is that it can be used by users and doesn't need intervention by an administrator.
In addition to compressed file systems there are other techniques for compressing data to save space. One that people tend to forget is to archive old data into a compressed read-only archive that can be mounted as though it were a subdirectory tree.
Read-only Compressed File Systems
Live distributions contain a complete distribution that fits onto a single DVD despite having more data than a single DVD can hold. The trick is that they compress the files into an image and then mount that image as read-only. Using compressed images they can potentially save a great deal of space, which is precisely why they are used.
You can use compressed images to save space but they are typically read-only. However, you can use these file systems to great effect despite not being able to write to them. The simple reason is that a reasonable amount of data in a user's account hasn't been touched or used for a fairly long time (you can define "long time" many different ways).
What if you could compress all of this older data in an image and then mount the image so that it appeared like the data was still there, except that it is read-only? If the compression was reasonable you could save some serious capacity.
A great example of this is squashfs. Squashfs was originally targeted at embedded devices but it has some features that make it very suitable for other situations.
| Max file system size | 2^64 |
| Max file size | 16 EiB |
| Max number of files | unlimited |
| Max number of directories | unlimited |
| Sparse file support | yes |
Squashfs can create very large images (2^64) making it suitable for large systems (e.g. desktops and even servers). It compresses all data including metadata, ".", and ".." files. It compresses metadata and fragments into blocks for maximum compression. This can also be useful during a read, which triggers a decompress operation, because the block that is uncompressed contains other metadata and fragments, sort of like a read-ahead algorithm that results from the compression. This additional data is placed into a cache so that it may be used later (squashfs has its own cache).
Using squashfs is very easy. It comes with most modern distributions (it went into the 2.6.29 kernel) and you can install the squashfs tools using your package manager. For example, on my CentOS 6.4 system I just did "yum install squashfs-tools". I then created a squashfs file of a sub-directory in my home account. The directory, /home/laytonjb/Documents, holds a number of text files but it also has a number of images, PDFs, and other binary documents. I think this will give us an idea of the compression capability of squashfs. The first step was to create a squashfs file of the directory.
[root@test8 laytonjb]# time mksquashfs /home/laytonjb/Documents /squashfs/laytonjb/Documents.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /squashfs/laytonjb/Documents.sqsh, block size 131072.
[=============================================================================|] 6810/6810 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
compressed data, compressed metadata, compressed fragments
duplicates are removed
Filesystem size 12166.75 Kbytes (11.88 Mbytes)
28.49% of uncompressed filesystem size (42698.04 Kbytes)
Inode table size 50213 bytes (49.04 Kbytes)
22.77% of uncompressed inode table size (220527 bytes)
Directory table size 57193 bytes (55.85 Kbytes)
17.48% of uncompressed directory table size (327131 bytes)
Number of duplicate files found 115
Number of inodes 6854
Number of files 6682
Number of fragments 190
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 172
Number of ids (unique uids + gids) 1
Number of uids 1
laytonjb (500)
Number of gids 1
laytonjb (500)
real 0m6.080s
user 0m3.838s
sys 0m0.384s
There are a few things to notice in this output.
The first thing to notice is the first line of the output where the tool uses all 4 processors in the system. It will use all of the processors possible to create the file. Also notice a couple of lines down that it compressed 6,810 items using a data block size of 131,072 bytes. It also says that it will compress data, metadata, fragments, and that duplicates are removed.
Then notice that it says the file system size is about 11.88 MB. If you scan further in the output you will see that it found 115 duplicate files, 6,854 inodes, 190 fragments, 172 directories, and 1 id (unique uid + gid). Also notice that the command took about 6 seconds to complete. That is the "real" (wall-clock) value; the "user" and "sys" values are CPU time and should not be added to it.
I checked the size of the resulting squashfs file to see how much compression I achieved.
[laytonjb@test8 Documents]$ ls -lstarh /squashfs/laytonjb/Documents.sqsh
12M -rwx------. 1 root root 12M Oct 19 12:27 /squashfs/laytonjb/Documents.sqsh
The original directory used a total of 58MB. The compression ratio is about 4.83:1.
I then wanted to mount the squashfs file in place of the original directory. So I moved the original directory, "Documents", to "Documents.original" and then created the "Documents" directory again which will be the mount point for the squashfs file.
[laytonjb@test8 ~]$ mv Documents/ Documents.original
[laytonjb@test8 ~]$ mkdir Documents
Finally, I mounted the squashfs file and then checked if the data was actually there.
[root@test8 laytonjb]# mount -t squashfs /squashfs/laytonjb/Documents.sqsh /home/laytonjb/Documents -o loop
[root@test8 laytonjb]# mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/squashfs/laytonjb/Documents.sqsh on /home/laytonjb/Documents type squashfs (rw,loop=/dev/loop0)
All of the files were there (I checked them as a user rather than root but I didn't show the output because the output of "ls" is not that interesting).
As mentioned in a study, users may insist that their data be online all of the time, but in actuality there is a great deal of data that is accessed very infrequently. This data is rarely if ever used but the user insists that it be online. A simple way to reduce the storage requirements of this type of data is to copy it to a subdirectory, create a squashfs file from it and mount it, and then symlink the original file and directory locations to the squashfs mount point. It can sound a little complicated so let's walk through an example.
In my account I have a bin subdirectory that contains applications that I have built and installed into my account for various projects. I still use those binaries but I don't change them, so they are a perfect candidate for being put into a squashfs image. The process I will follow is the following:
- Copy the subdirectories from /home/laytonjb/bin/ to /home/laytonjb/.ARCHIVE_data. The directory /home/laytonjb/.ARCHIVE_data is where I store the data before creating the squashfs image.
- Create symlinks from subdirectories in /home/laytonjb/bin/ to /home/laytonjb/.ARCHIVE. /home/laytonjb/.ARCHIVE is the mount point for the squashfs image so you want the original file/directory locations to point to the mount point.
- Create the squashfs image and store it in /home/laytonjb/SQUASHFS/
- Mount the squash image to /home/laytonjb/.ARCHIVE/
- Check if all files are there
- Erase /home/laytonjb/.ARCHIVE_data
I hope these steps clarify my intent but to summarize I want to take old data, copy to a specific location, symlink the original file and directory locations to the squashfs image mount point, create the squashfs image and mount it.
The first step is to create a storage directory and a mount point. I've chosen to use directories that begin with "." so they are not visible to a classic "ls" but you could have chosen to use any directories you want.
[laytonjb@test8 ~]$ mkdir .ARCHIVE
[laytonjb@test8 ~]$ mkdir .ARCHIVE_data
The first directory is the squashfs mount point and the second one is where I store the data that goes into the squashfs file.
My bin subdirectory looks like the following.
[laytonjb@test8 ~]$ cd bin
[laytonjb@test8 bin]$ ls -s
total 28
4 hdf5-1.8.7-gcc-4.4.5  4 openmpi-1.4.5-gcc-4.4.5  4 open-mx-1.5.0-gcc-4.4.5  4 zlib-1.2.5-gcc-4.4.5
4 netcdf-1.4.3-gcc-4.4.5  4 openmpi-1.5.4-gcc-4.4.5  4 parallware
Now I want to move each of these directories to /home/laytonjb/.ARCHIVE_data and, in its place, create a symlink that points into /home/laytonjb/.ARCHIVE. An example of this is the following.
[laytonjb@test8 bin]$ mv hdf5-1.8.7-gcc-4.4.5/ ~/.ARCHIVE_data/
[laytonjb@test8 bin]$ ln -s ~/.ARCHIVE/hdf5-1.8.7-gcc-4.4.5 .
It's a pretty easy process that is very amenable to automation (cron job using bash or python).
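The move-and-symlink step could be automated along the following lines. This is a minimal sketch under stated assumptions, not the script I actually used: the function name stage_and_link and its three arguments (source directory, staging directory, squashfs mount point) are hypothetical, and it handles only the top-level entries of the source directory.

```shell
#!/bin/sh
# Sketch only: move every entry of SRC_DIR into STAGING and leave a symlink
# in its place that points at the same name under MOUNT_POINT.
# stage_and_link is a hypothetical helper name.
stage_and_link() {
    src_dir=$1
    staging=$2
    mount_point=$3

    for entry in "$src_dir"/*; do
        # Skip entries that are already symlinks (e.g. from a previous run).
        [ -L "$entry" ] && continue
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$entry" ] || continue

        name=$(basename "$entry")
        mv "$entry" "$staging/" || return 1
        ln -s "$mount_point/$name" "$src_dir/$name" || return 1
    done
}
```

With the directories from this article the call would be: stage_and_link ~/bin ~/.ARCHIVE_data ~/.ARCHIVE. The symlinks dangle until the squashfs image has been created and mounted on the mount point.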
When you are finished the bin subdirectory should be populated with symlinks.
[laytonjb@test8 bin]$ ls -s
total 0
0 hdf5-1.8.7-gcc-4.4.5  0 openmpi-1.4.5-gcc-4.4.5  0 open-mx-1.5.0-gcc-4.4.5  0 zlib-1.2.5-gcc-4.4.5
0 netcdf-1.4.3-gcc-4.4.5  0 openmpi-1.5.4-gcc-4.4.5  0 parallware
The squashfs image is then created.
[laytonjb@test8 ~]$ time mksquashfs /home/laytonjb/.ARCHIVE_data /home/laytonjb/SQUASHFS/ARCHIVE.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /home/laytonjb/SQUASHFS/ARCHIVE.sqsh, block size 131072.
[============================================================================/] 4274/4274 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
compressed data, compressed metadata, compressed fragments
duplicates are removed
Filesystem size 72740.63 Kbytes (71.04 Mbytes)
21.25% of uncompressed filesystem size (342305.09 Kbytes)
Inode table size 23311 bytes (22.76 Kbytes)
32.54% of uncompressed inode table size (71646 bytes)
Directory table size 19765 bytes (19.30 Kbytes)
43.35% of uncompressed directory table size (45599 bytes)
Number of duplicate files found 257
Number of inodes 1909
Number of files 1803
Number of fragments 128
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 106
Number of ids (unique uids + gids) 1
Number of uids 1
laytonjb (500)
Number of gids 1
laytonjb (500)
real 0m20.382s
user 1m19.682s
sys 0m0.559s
Notice that the creation time was very fast: about 20 seconds of wall-clock time (the 1m19s "user" figure is CPU time summed across the four processors). Of the 1,803 files, 257 were duplicates. The image is then mounted on /home/laytonjb/.ARCHIVE by the user, using squashfuse.
[laytonjb@test8 ~]$ squashfuse /home/laytonjb/SQUASHFS/ARCHIVE.sqsh /home/laytonjb/.ARCHIVE
[laytonjb@test8 ~]$ mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
squashfuse on /home/laytonjb/.ARCHIVE type fuse.squashfuse (rw,nosuid,nodev,user=laytonjb)
We can check if the files are there pretty easily.
[laytonjb@test8 ~]$ cd bin
[laytonjb@test8 bin]$ cd openmpi-1.5.4-gcc-4.4.5/
[laytonjb@test8 openmpi-1.5.4-gcc-4.4.5]$ ls -s
total 0
0 bin  0 etc  0 include  0 lib  0 share
The files are definitely there so we know the process was successful.
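Listing one directory is only a spot check. Before erasing the staging copy, a stricter check is to compare the whole staging tree against the mounted image. The following is a minimal sketch, assuming the staging directory still exists and the image is mounted; the function name verify_archive is hypothetical, and it relies on "diff -r -q", which GNU diff (among others) provides.

```shell
#!/bin/sh
# Sketch only: compare the staging tree with the mounted image, file by file.
# verify_archive is a hypothetical helper name.
verify_archive() {
    staging=$1
    mount_point=$2

    # diff -r walks both trees; -q reports only whether files differ.
    if diff -r -q "$staging" "$mount_point" >/dev/null; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}
```

With the directories from this article the call would be: verify_archive ~/.ARCHIVE_data ~/.ARCHIVE. Only erase the staging directory if it reports "identical".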
The goal was to reduce the amount of space the files used so let's check if this was successfully achieved.
[laytonjb@test8 ~]$ cd .ARCHIVE_data
[laytonjb@test8 .ARCHIVE_data]$ du -sh
339M .
[laytonjb@test8 .ARCHIVE_data]$ ls -sh ~/SQUASHFS/ARCHIVE.sqsh
72M /home/laytonjb/SQUASHFS/ARCHIVE.sqsh
The compression ratio is about 4.7:1 which I believe is pretty good.
Data Storage Squeeze
Considering that the world is creating more data and that we want to keep most or all of it, we need to store it. One way to help ourselves is to use efficient storage mechanisms such as compressed file systems. There are several options available, ranging from typical file systems such as ZFS or btrfs to user space file systems built on FUSE. One example of the latter, archivemount, was examined in this article.
The advantage of archivemount is that it can be controlled by users and doesn't need administrator intervention. Also, when the archive is unmounted the compressed archive is updated and saved to the underlying file system. However, the disadvantage of using archivemount is that no updates to the underlying compressed archive happen until it is unmounted. If anything happens to the archive while it is mounted then there is the definite possibility of losing data.
If you don't want to use a compressing file system you can use read-only compressed images to achieve some of the same results. I bet that users have files that are old (using your definition of old) and haven't been accessed in a long time. It's very simple to compress these little-used files using squashfs and use symlinks from the original file/directory location to the compressed image. This is a terribly simple method to save capacity, and it can easily be scripted to look for older files, create the symlinks and the compressed image, and mount it.
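Looking for older files is the easiest part to script. Here is a minimal sketch using find; the function name list_old_files and the 365-day threshold are hypothetical choices. It tests modification time (-mtime) rather than access time, because access times depend on mount options such as noatime or relatime and may not be reliable on your system.

```shell
#!/bin/sh
# Sketch only: list regular files under DIR not modified in more than DAYS days.
# list_old_files is a hypothetical helper name; pick your own definition of "old".
list_old_files() {
    dir=$1
    days=$2
    find "$dir" -type f -mtime +"$days"
}

# Example: files under the home directory untouched for more than a year.
# list_old_files "$HOME" 365
```

The resulting list is only a set of candidates; which of them actually go into a read-only image is still a decision for the user or the administrator.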
Hopefully you've realized there are many ways to save capacity. With tight budgets you need to make the best use of what you have. You have to be frugal. Go forth and compress!