Data Storage: How to Save Space Frugally - Page 2



The mounted archive can now be treated like any subdirectory, so you can "cd" into it, create subdirectories with "mkdir", delete directories, and so on.

[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 32
32 jeffy.out.txt
[laytonjb@test8 ARCHIVE]$ mkdir NEW
[laytonjb@test8 ARCHIVE]$ cd NEW
[laytonjb@test8 NEW]$ echo "this is a new file" >> file.txt
[laytonjb@test8 NEW]$ ls -s
total 1
1 file.txt
[laytonjb@test8 NEW]$ cd ..
[laytonjb@test8 ARCHIVE]$ ls -s
total 36
32 jeffy.out.txt   4 NEW

As a further test, I copied another subdirectory into the archive. This subdirectory contains the images I use for desktop backgrounds (about 279MB of images).

[laytonjb@test8 ~]$ cd BG.original/
[laytonjb@test8 BG.original]$ du -sh
279M     .

Copying the directory into the archive:

[laytonjb@test8 ~]$ cp -r BG.original ARCHIVE/
[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 40
4 BG.original  32 jeffy.out.txt   4 NEW

At this point the archive itself has not been updated, so if the mounted archive crashed for some reason, you would lose all of the changes you had made. This is one of the limitations of archivemount: the actual archive is not updated until it is unmounted.

[laytonjb@test8 ~]$ fusermount -u ARCHIVE
[laytonjb@test8 ~]$ mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[laytonjb@test8 ~]$ ls -sh NOTES.tar.bz2
143M NOTES.tar.bz2

Note that the user can unmount the archive using the fusermount command.

To see if it actually worked, I examined the size of the compressed archive, which is now 143MB (remember, it started out as 12KB). Also notice that the archive is much smaller than the BG.original directory that was copied into it, which is 279MB in size. The compression was therefore almost 2:1, which is not bad considering that the directory mostly contained images.
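A quick sanity check on that ratio, using the sizes reported above:

```shell
# Compression ratio: 279MB directory (du -sh) vs. 143MB archive (ls -sh).
# Sizes are the ones from the output above; your data will differ.
awk 'BEGIN { printf "ratio = %.2f:1\n", 279 / 143 }'
# prints: ratio = 1.95:1
```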

Archivemount is an example of a compressed file system, albeit a "local" variation that focuses on a single subdirectory. The point is that you can save space, sometimes quite a bit, by using a compressed file system. An advantage of archivemount itself is that it can be used by ordinary users and doesn't require intervention by an administrator.

In addition to compressed file systems, there are other techniques for compressing data to save space. One that people tend to forget is archiving old data into a compressed, read-only archive that can be mounted as though it were a subdirectory tree.

Read-only Compressed File Systems

Live distributions fit a complete distribution onto a single DVD despite having more data than a single DVD can hold. The trick is that they compress the files into an image and then mount that image read-only. Compressed images can save a great deal of space, which is precisely why they are used.

You can use compressed images to save space, but they are typically read-only. However, you can still use these file systems to great effect despite not being able to write to them. The simple reason is that a reasonable amount of the data in a user's account hasn't been touched or used for a fairly long time (you can define "long time" in many different ways).

What if you could compress all of this older data into an image and then mount the image so that the data appeared to still be there, except read-only? If the compression were reasonable, you could save some serious capacity.
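Identifying that stale data can be as simple as a find sweep. A sketch; the path and the 365-day threshold are illustrative, not from the article:

```shell
# List files that haven't been modified in over a year -- candidates
# for moving into a read-only compressed image. Adjust the path and
# the age threshold (-mtime +365 means "more than 365 days ago").
find "$HOME" -type f -mtime +365
```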

A great example of this is squashfs. Squashfs was originally targeted at embedded devices, but it has some features that make it very suitable for other situations.

Table 1: SquashFS features

  Max file system size:      2^64
  Max file size:             16 EiB
  Max number of files:       unlimited
  Max number of directories: unlimited
  Metadata compression:      yes
  Sparse file support:       yes
  xattr support:             yes
  Compression algorithms:    gzip, LZMA (2.6.34), LZO (2.6.34), LZMA2 (2.6.38)

Squashfs can create very large images (up to 2^64), making it suitable for large systems (e.g., desktops and even servers). It compresses all data, including metadata and the "." and ".." entries. It packs metadata and fragments together into blocks for maximum compression. This can also be useful during a read, which triggers a decompress operation, because the block being uncompressed contains other metadata and fragments as well, sort of like a read-ahead algorithm that falls out of the compression. This additional data is placed into a cache so that it may be used later (squashfs has its own cache).

Using squashfs is very easy. It comes with most modern distributions (it went into the 2.6.29 kernel), and you can install the squashfs tools with your package manager. For example, on my CentOS 6.4 system I just ran "yum install squashfs-tools". I then created a squashfs file of a subdirectory in my home account. The directory, /home/laytonjb/Documents, holds a number of text files, but it also has a number of images, PDFs, and other binary documents, which should give us an idea of the compression capability of squashfs. The first step was to create a squashfs file of the directory.

[root@test8 laytonjb]# time mksquashfs /home/laytonjb/Documents /squashfs/laytonjb/Documents.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /squashfs/laytonjb/Documents.sqsh, block size 131072.
[=============================================================================|] 6810/6810 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
compressed data, compressed metadata, compressed fragments
duplicates are removed
Filesystem size 12166.75 Kbytes (11.88 Mbytes)
28.49% of uncompressed filesystem size (42698.04 Kbytes)
Inode table size 50213 bytes (49.04 Kbytes)
22.77% of uncompressed inode table size (220527 bytes)
Directory table size 57193 bytes (55.85 Kbytes)
17.48% of uncompressed directory table size (327131 bytes)
Number of duplicate files found 115
Number of inodes 6854
Number of files 6682
Number of fragments 190
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 172
Number of ids (unique uids + gids) 1
Number of uids 1
laytonjb (500)
Number of gids 1
laytonjb (500)

real     0m6.080s
user     0m3.838s
sys      0m0.384s

There are a few things to notice in this output.

The first line of the output shows that the tool used all 4 processors in the system; it will use as many processors as it can to create the file. A couple of lines down, notice that it compressed 6,810 items using a data block size of 131,072 bytes, and that it compresses data, metadata, and fragments and removes duplicates.
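Once the image exists, the kernel's squashfs driver can mount it read-only through a loop device. A sketch, assuming root privileges and the image path used above; the /mnt/documents mount point is my own choice, not from the article:

```shell
# Mount the compressed image read-only via a loop device (requires root).
# The image path matches the mksquashfs example above; the mount point
# /mnt/documents is illustrative.
mkdir -p /mnt/documents
mount -t squashfs -o loop,ro /squashfs/laytonjb/Documents.sqsh /mnt/documents

# The contents now appear as an ordinary (read-only) directory tree:
ls /mnt/documents

# Unmount when finished:
umount /mnt/documents
```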
