Data Storage: How to Save Space Frugally - Page 2


The mounted archive can now be treated like any other subdirectory: you can "cd" into it, create subdirectories using "mkdir", delete directories, and so on.

[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 32
32 jeffy.out.txt
[laytonjb@test8 ARCHIVE]$ mkdir NEW
[laytonjb@test8 ARCHIVE]$ cd NEW
[laytonjb@test8 NEW]$ echo "this is a new file" >> file.txt
[laytonjb@test8 NEW]$ ls -s
total 1
1 file.txt
[laytonjb@test8 NEW]$ cd ..
[laytonjb@test8 ARCHIVE]$ ls -s
total 36
32 jeffy.out.txt   4 NEW

As a further test, I copied another subdirectory into the archive. The subdirectory contains the images I use for backgrounds, about 279MB worth of images.

[laytonjb@test8 ~]$ cd BG.original/
[laytonjb@test8 BG.original]$ du -sh
279M     .

Copying the directory into the archive:

[laytonjb@test8 ~]$ cp -r BG.original ARCHIVE/
[laytonjb@test8 ~]$ cd ARCHIVE
[laytonjb@test8 ARCHIVE]$ ls -s
total 40
4 BG.original  32 jeffy.out.txt   4 NEW

At this point the archive itself has not been updated, so if the mounted archive crashed for some reason, you would lose all of the changes you have made. This is one of the limitations of archivemount: the actual archive is not updated until it is unmounted.

[laytonjb@test8 ~]$ fusermount -u ARCHIVE
[laytonjb@test8 ~]$ mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext2 (rw)
/dev/sda5 on /home type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

[laytonjb@test8 ~]$ ls -sh NOTES.tar.bz2
143M NOTES.tar.bz2

Note that the user can unmount the archive using the fusermount command.
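Because changes only reach the archive at unmount time, it can help to wrap the mount, the work, and the unmount into one step so that the unmount is not forgotten. The following is a minimal sketch, not part of archivemount itself: the helper name with_archive and the override variables ARCHIVE_MOUNT and ARCHIVE_UMOUNT are hypothetical and exist only for illustration. It does not protect against a crash of the mount while the command is running.

```shell
# Sketch only: with_archive is a hypothetical helper, not an archivemount feature.
# Usage: with_archive ARCHIVE MOUNTPOINT COMMAND [ARGS...]
# ARCHIVE_MOUNT and ARCHIVE_UMOUNT are assumed override variables; by default
# the helper calls "archivemount" and "fusermount -u" as in the session above.
with_archive() {
    wa_archive=$1
    wa_mountpoint=$2
    shift 2

    mkdir -p "$wa_mountpoint" || return 1

    # Mount the archive on the mount point (left unquoted so a default such
    # as "fusermount -u" splits into command and option).
    ${ARCHIVE_MOUNT:-archivemount} "$wa_archive" "$wa_mountpoint" || return 1

    # Run the requested command and remember its exit status.
    wa_status=0
    "$@" || wa_status=$?

    # Always unmount, even if the command failed, so changes are written back.
    ${ARCHIVE_UMOUNT:-fusermount -u} "$wa_mountpoint" || return 1

    return "$wa_status"
}

# Example (paths are illustrative):
# with_archive NOTES.tar.bz2 ARCHIVE cp -r BG.original ARCHIVE/
```

The helper returns the exit status of the wrapped command, so a failed copy is still visible to the caller after the unmount.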

To see if it actually worked, I examined the size of the compressed archive, which is now 143MB (remember that it started out as 12KB). Also notice that the archive is much smaller than the directory BG.original that was copied into the mounted archive. The directory is 279MB in size, so the compression was almost 2:1, which is not bad considering that the directory mostly held images.
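The "almost 2:1" figure can be checked with a quick calculation. This is a small sketch using the two sizes reported above; it ignores the roughly 12KB that was already in the archive, which is negligible at this scale. The variable names are only illustrative.

```shell
# Sizes in MB taken from the du and ls output above.
original_mb=279
archive_mb=143

# Shell arithmetic is integer-only, so awk does the floating-point division.
ratio=$(LC_ALL=C awk -v o="$original_mb" -v a="$archive_mb" \
    'BEGIN { printf "%.2f", o / a }')
saved_pct=$(LC_ALL=C awk -v o="$original_mb" -v a="$archive_mb" \
    'BEGIN { printf "%.1f", (1 - a / o) * 100 }')

echo "compression ratio: ${ratio}:1"
echo "space saved: ${saved_pct}%"
```

With these inputs the ratio comes out at 1.95:1, or a little under half of the original space saved.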

Archivemount is an example of a compressed file system, albeit a "local" variation that focuses on a subdirectory. The point is that you can save space, sometimes quite a bit, by using a compressed file system. An advantage of archivemount is that it can be used by ordinary users and doesn't need intervention by an administrator.

In addition to compressed file systems, there are other techniques for compressing data to save space. One that people tend to forget is to archive old data into a compressed read-only archive that can be mounted as though it were a subdirectory tree.

Read-only Compressed File Systems

Live distributions contain a complete distribution that fits onto a single DVD despite having more data than a single DVD can hold. The trick is that they compress the files into an image and then mount that image read-only. Compressed images can save a great deal of space, which is precisely why they are used.

You can use compressed images to save space, but they are typically read-only. However, you can use these file systems to great effect despite not being able to write to them. The simple reason is that a reasonable amount of the data in a user's account hasn't been touched or used for a fairly long time (you can define "long time" in many different ways).

What if you could compress all of this older data into an image and then mount the image so that it appeared as though the data were still there, except that it is read-only? If the compression were reasonable, you could save some serious capacity.
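To get a feel for how much "older" data there is, you can list files that have not changed in a long time. The sketch below is one possible approach, with a hypothetical helper name (list_stale_files). It uses modification time rather than access time, on the assumption that access times are often not updated reliably, depending on mount options. The 365-day threshold in the usage comment is only an example of one way to define "long time".

```shell
# list_stale_files DIR DAYS
# Print regular files under DIR whose contents were last modified more than
# DAYS days ago. Hypothetical helper name; uses mtime, not atime.
list_stale_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Example (path and threshold are illustrative):
# list_stale_files "$HOME/Documents" 365
```

The resulting list is only a set of candidates; whether a directory is a good fit for a read-only image is still a judgment call.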

A great example of this is squashfs. Squashfs was originally targeted at embedded devices but it has some features that make it very suitable for other situations.

Table 1: SquashFS features
Feature                      Value
Max file system size         2^64 bytes
Max file size                16 EiB
Max number of files          unlimited
Max number of directories    unlimited
Metadata compression?        yes
Sparse file support          yes
Exportable?                  yes
xattr support?               yes
Compression algorithms       gzip, LZMA (2.6.34), LZO (2.6.34), LZMA2 (2.6.38)

Squashfs can create very large images (up to 2^64 bytes), making it suitable for large systems (e.g. desktops and even servers). It compresses all data, including metadata and the "." and ".." entries. It packs metadata and fragments into blocks for maximum compression. This can also be useful during a read that triggers a decompress operation, because the block that is uncompressed contains other metadata and fragments, sort of like a read-ahead that results from the compression. This additional data is placed into a cache so that it can be used later (squashfs has its own cache).

Using squashfs is very easy. It comes with most modern distributions (it went into the 2.6.29 kernel) and you can install the squashfs tools using your package manager. For example, on my CentOS 6.4 system I just did "yum install squashfs-tools". I then created a squashfs file of a subdirectory in my home account. The directory, /home/laytonjb/Documents, holds a number of text files, but it also has a number of images, PDFs, and other binary documents. I think this will give us an idea of the compression capability of squashfs. The first step was to create a squashfs file of the directory.

[root@test8 laytonjb]# time mksquashfs /home/laytonjb/Documents /squashfs/laytonjb/Documents.sqsh
Parallel mksquashfs: Using 4 processors
Creating 4.0 filesystem on /squashfs/laytonjb/Documents.sqsh, block size 131072.
[=============================================================================|] 6810/6810 100%
Exportable Squashfs 4.0 filesystem, data block size 131072
compressed data, compressed metadata, compressed fragments
duplicates are removed
Filesystem size 12166.75 Kbytes (11.88 Mbytes)
28.49% of uncompressed filesystem size (42698.04 Kbytes)
Inode table size 50213 bytes (49.04 Kbytes)
22.77% of uncompressed inode table size (220527 bytes)
Directory table size 57193 bytes (55.85 Kbytes)
17.48% of uncompressed directory table size (327131 bytes)
Number of duplicate files found 115
Number of inodes 6854
Number of files 6682
Number of fragments 190
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 172
Number of ids (unique uids + gids) 1
Number of uids 1
laytonjb (500)
Number of gids 1
laytonjb (500)

real     0m6.080s
user     0m3.838s
sys      0m0.384s

There are a few things to notice in this output.

The first thing to notice is the first line of the output, where the tool reports that it is using all 4 processors in the system. It will use as many processors as it can to create the file. Also notice a couple of lines down that it processed 6,810 items using a data block size of 131,072 bytes. It also reports that data, metadata, and fragments are compressed and that duplicates are removed.
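The percentages and counts in the mksquashfs summary can be cross-checked from the raw numbers it prints. The sketch below recomputes them with a small hypothetical helper (pct); every input value is copied from the output above.

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE with two decimals.
# Hypothetical helper; awk handles the floating-point arithmetic.
pct() {
    LC_ALL=C awk -v part="$1" -v whole="$2" \
        'BEGIN { printf "%.2f", part / whole * 100 }'
}

# Values copied from the mksquashfs output above.
data_pct=$(pct 12166.75 42698.04)   # file system size, Kbytes
inode_pct=$(pct 50213 220527)       # inode table, bytes
dir_pct=$(pct 57193 327131)         # directory table, bytes

# The inode count should equal files plus directories.
inode_sum=$((6682 + 172))

echo "data:            ${data_pct}% of uncompressed size"
echo "inode table:     ${inode_pct}% of uncompressed size"
echo "directory table: ${dir_pct}% of uncompressed size"
echo "files + directories = ${inode_sum} inodes"
```

The recomputed values agree with what the tool reported: 28.49%, 22.77%, and 17.48%, with 6854 inodes. In other words, the image is roughly 3.5 times smaller than the original directory.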

