Data Storage: How to Save Space Frugally - Page 2
The mounted archive can now be treated as any sub-directory so you can "cd" into it, create sub-directories using "mkdir", delete directories, etc.
[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 32
32 jeffy.out.txt
[laytonjb@test8 ARCHIVE]$ mkdir NEW
[laytonjb@test8 ARCHIVE]$ cd NEW
[laytonjb@test8 NEW]$ echo "this is a new file" >> file.txt
[laytonjb@test8 NEW]$ ls -s
total 1
1 file.txt
[laytonjb@test8 NEW]$ cd ..
[laytonjb@test8 ARCHIVE]$ ls -s
total 36
32 jeffy.out.txt
 4 NEW
As a further test, I copied another subdirectory into the archive. The subdirectory contains the images I use for backgrounds (about 279MB worth of images).
[laytonjb@test8 ~]$ cd BG.original/
[laytonjb@test8 BG.original]$ du -sh
279M .
Copying the directory into the archive:
[laytonjb@test8 ~]$ cp -r BG.original ARCHIVE/
[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 40
 4 BG.original
32 jeffy.out.txt
 4 NEW
At this point the archive has not been updated, so if, for some reason, the mounted archive crashed, you would lose all of the changes you have made to the archive. This is one of the limitations of archivemount: the actual archive is not updated until it is unmounted.
[laytonjb@test8 ~]$ fusermount -u ARCHIVE
[laytonjb@test8 ~]$ mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[laytonjb@test8 ~]$ ls -sh NOTES.tar.bz2
143M NOTES.tar.bz2
Note that the user can unmount the archive using the fusermount command.
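Because the archive file is only rewritten at unmount time, it can be worth confirming that the copied files actually landed in it. The following is a minimal sketch, not part of archivemount itself. The function name archive_contains is made up for illustration, and it assumes GNU tar, which detects bzip2 or gzip compression automatically when listing an archive file. Member names may carry a leading "./" depending on how the archive was written, so check the raw "tar -tf" listing if the prefix does not match.

```shell
# Succeed if at least one member name in the archive starts with the prefix.
# $1 = archive file, $2 = member-name prefix (for example "BG.original/")
archive_contains() {
    # tar -tf prints one member name per line without extracting anything.
    # awk reads the whole listing and reports through its exit status.
    tar -tf "$1" | awk -v p="$2" '
        index($0, p) == 1 { found = 1 }
        END { exit !found }'
}

# Possible use after "fusermount -u ARCHIVE" (names taken from the text):
# archive_contains NOTES.tar.bz2 BG.original/ && echo "copy is in the archive"
```

The check only reads the archive listing, so it is safe to run on the unmounted file as often as you like.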
To see if it actually worked, I examined the size of the compressed archive, which is now 143MB (remember it started out as 12KB). Also notice that the archive is much smaller than the directory BG.original that was copied to the mounted archive. The directory is 279MB in size. Therefore the compression was almost 2:1, which is not bad considering that the directory mostly had images.
Archivemount is an example of a compressed file system, albeit a "local" variation that focuses on a subdirectory. But the point is that you can save space, sometimes quite a bit, by using a compressed file system. An advantage of archivemount itself is that it can be used by users and doesn't need intervention by an administrator.
In addition to compressed file systems there are other techniques for compressing data to save space. One that people tend to forget is to archive old data into a compressed read-only archive that can be mounted as though it were a subdirectory tree.
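The "almost 2:1" figure is just the original size divided by the compressed size. As a small sketch of that arithmetic (the function name compression_ratio is hypothetical, and it ignores the few kilobytes that were already in the archive):

```shell
# Print original:compressed as a ratio with two decimal places.
# $1 = original size, $2 = compressed size, both in the same unit.
compression_ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN {
        if (comp <= 0) exit 1          # avoid dividing by zero
        printf "%.2f\n", orig / comp
    }'
}

# Figures from the text: a 279MB directory stored in a 143MB archive.
# compression_ratio 279 143    # prints 1.95
```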
Read-only Compressed File Systems
Live distributions contain a complete distribution that fits onto a single DVD despite having more data than a single DVD can hold. The trick is that they compress the files into an image and then mount that image as read-only. Using compressed images they can potentially save a great deal of space, which is precisely why they are used.
You can use compressed images to save space but they are typically read-only. However, you can use these file systems to great effect despite not being able to write to them. The simple reason is that a reasonable amount of data in a user's account hasn't been touched or used for a fairly long time (you can define "long time" many different ways).
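One way to pin down "long time" is to ask how much data has not been modified in a given number of days. The sketch below is only an illustration: the function names list_old_files and old_data_kb are made up, and it uses modification time rather than access time on the assumption that access times are often not updated reliably (for example on file systems mounted with noatime or relatime). Swap -mtime for -atime if access times can be trusted on your system.

```shell
# Print regular files under directory $1 whose contents were last modified
# more than $2 days ago (find rounds file ages down to whole days).
list_old_files() {
    find "$1" -type f -mtime +"$2"
}

# Sum the disk usage of those files, in KiB.
old_data_kb() {
    find "$1" -type f -mtime +"$2" -exec du -k {} + |
        awk '{ total += $1 } END { print total + 0 }'
}

# Hypothetical usage: how much of a home directory is older than a year?
# old_data_kb /home/laytonjb 365
```

The total gives a rough upper bound on what could be moved into a compressed image. The actual savings depend on how well the data compresses.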
What if you could compress all of this older data in an image and then mount the image so that it appeared like the data was still there, except that it is read-only? If the compression was reasonable you could save some serious capacity.
A great example of this is squashfs. Squashfs was originally targeted at embedded devices but it has some features that make it very suitable for other situations.
| Max file system size | 2^64 bytes |
| Max file size | 16 EiB |
| Max number of files | unlimited |
| Max number of directories | unlimited |
| Sparse file support | yes |
Squashfs can create very large images (up to 2^64 bytes), making it suitable for large systems (e.g. desktops and even servers). It compresses all data, including metadata and the "." and ".." files. It packs metadata and fragments into blocks for maximum compression. This can also be useful during a read, which triggers a decompress operation, because the block that is uncompressed contains other metadata and fragments, sort of like a read-ahead algorithm that results from the compression. This additional data is placed into a cache so that it may be used later (squashfs has its own cache).
Using squashfs is very easy. It comes with most modern distributions (it went into the 2.6.29 kernel) and you can install the squashfs tools using your package manager. For example, on my CentOS 6.4 system I just did "yum install squashfs-tools". I then created a squashfs file of a sub-directory in my home account. The directory, /home/laytonjb/Documents, holds a number of text files but it also has a number of images, PDFs, and other binary documents. I think this will give us an idea of the compression capability of squashfs. The first step was to create a squashfs file of the directory.
[root@test8 laytonjb]# time mksquashfs /home/laytonjb/Documents /squashfs/laytonjb/Documents.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /squashfs/laytonjb/Documents.sqsh, block size 131072.
[=============================================================================|] 6810/6810 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
        compressed data, compressed metadata, compressed fragments
        duplicates are removed
Filesystem size 12166.75 Kbytes (11.88 Mbytes)
        28.49% of uncompressed filesystem size (42698.04 Kbytes)
Inode table size 50213 bytes (49.04 Kbytes)
        22.77% of uncompressed inode table size (220527 bytes)
Directory table size 57193 bytes (55.85 Kbytes)
        17.48% of uncompressed directory table size (327131 bytes)
Number of duplicate files found 115
Number of inodes 6854
Number of files 6682
Number of fragments 190
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 172
Number of ids (unique uids + gids) 1
Number of uids 1
        laytonjb (500)
Number of gids 1
        laytonjb (500)

real 0m6.080s
user 0m3.838s
sys 0m0.384s
There are a few things to notice in this output.
The first thing to notice is the first line of the output where the tool uses all 4 processors in the system. It will use all of the processors possible to create the file. Also notice a couple of lines down that it compressed 6,810 items using a data block size of 131,072 bytes. It also says that it will compress data, metadata, fragments, and that duplicates are removed.
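The summary that mksquashfs prints also reports the compressed and uncompressed file system sizes, and those two numbers can be pulled out with a short script if you want to track savings across several images. This is a rough sketch that assumes the output format shown above. The function name squashfs_savings and the log file name in the usage comment are hypothetical.

```shell
# Read mksquashfs summary text on stdin and print the space saved in Kbytes
# and the image size as a percentage of the uncompressed size.
squashfs_savings() {
    awk '
        # Example line: Filesystem size 12166.75 Kbytes (11.88 Mbytes)
        /^Filesystem size/ { compressed = $3 + 0 }
        # Example line: 28.49% of uncompressed filesystem size (42698.04 Kbytes)
        /of uncompressed filesystem size/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\(/) { v = $i; sub(/^\(/, "", v); uncompressed = v + 0 }
        }
        END {
            if (uncompressed <= 0) exit 1   # summary lines were not found
            printf "saved_kb=%.2f image_pct=%.2f\n",
                   uncompressed - compressed, 100 * compressed / uncompressed
        }'
}

# Hypothetical usage, with the mksquashfs output saved to a file first:
# squashfs_savings < mksquashfs.log
```

For the run above this works out to roughly 30,531 Kbytes saved, with the image at 28.49% of the original size, matching the percentage mksquashfs itself reports.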