Interpretation of Results
After first examining the results, I like to compare the ages to better understand what's happening in the file system.
Recall that ctime, the change time, reports the last time a change was made to the file, whether to the data or to the metadata. The mtime of a file, on the other hand, changes only when the data itself is changed (not the metadata). I like to use both times when examining the ages of files.
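All three timestamps come straight from a stat() call. As a minimal sketch in Python (the file name is just a throwaway example), the ages in days can be computed like this:

```python
import os
import time

# Create an example file so the script is self-contained; in practice
# you would point "path" at any file in the scanned file system.
path = "example.txt"
with open(path, "w") as f:
    f.write("sample data\n")

st = os.stat(path)
now = time.time()

mtime_age = (now - st.st_mtime) / 86400.0  # days since the data last changed
ctime_age = (now - st.st_ctime) / 86400.0  # days since data OR metadata last changed
atime_age = (now - st.st_atime) / 86400.0  # days since the file was last accessed

print(f"mtime age: {mtime_age:.1f} days")
print(f"ctime age: {ctime_age:.1f} days")
print(f"atime age: {atime_age:.1f} days")
```

For a freshly created file all three ages are essentially zero; on real data they diverge in the ways discussed below.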
For the example, the vast majority of the files have an mtime age between 365 and 504 days (1 to 1.5 years). But there are a few files, 2,854, with mtime ages between 28 and 56 days. And finally, there are some files between 2,920 and 3,650 days old (8 to 10 years). One has to wonder whether a number of these files couldn't be archived or moved to slower storage.
After examining the mtime age data, I like to examine the ctime age data. By comparing the difference between the ctime age and the mtime age, we can get an idea of how often and how quickly file metadata changes. If you measure this information over time, it can help you see the pace of metadata changes.
For the example, the vast majority of the files have a ctime age of 365 to 504 days (1 to 1.5 years). Additionally, if we look at the ctime-mtime age differences, we see that almost all of the files show a difference of 0 to 1 day (1,339 of the 54,036 files have a ctime age that exactly matches the mtime age). This tells me that there is little changing of file metadata (i.e., permissions, etc.). Consequently, I don't think there is much worry that this user is thrashing metadata.
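A comparison like this can be sketched with a short script that walks a directory tree and buckets each file by the difference between its ctime and mtime. The function name and the one-day bucket granularity are my own illustrative choices, not taken from any particular scan tool:

```python
import os
from collections import Counter

def ctime_mtime_diff_days(root):
    """Walk a directory tree and bucket each file by the difference
    between its ctime and mtime, in whole days."""
    buckets = Counter()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            diff_days = int(abs(st.st_ctime - st.st_mtime) // 86400)
            buckets[diff_days] += 1
    return buckets

# Example: scan the current directory. A large count in the 0-day bucket
# means most files' metadata has not changed since the data last changed.
for days, count in sorted(ctime_mtime_diff_days(".").items()):
    print(f"{days:4d} days: {count} files")
```

Running this periodically and comparing the bucket counts over time gives the "pace of metadata changes" mentioned above.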
The atime age information can tell us about file access age. For example, if a file has a very old atime age, then the file hasn't been accessed for some time. Maybe files like this are good candidates for archiving to reduce the capacity used. To avoid this, users may try to use the "touch" command to update the time stamps on the file. By default, touch sets both the atime and the mtime to the current time, and the kernel then updates the ctime as a side effect of that change.
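This behavior is easy to verify with a small experiment. The sketch below uses Python's os.utime with times=None, which does the same thing as running touch on the file: it sets atime and mtime to "now," and the kernel bumps ctime as a consequence of that change.

```python
import os
import time

# Create a test file, wait a moment, "touch" it, and compare timestamps.
path = "touch_test.txt"
with open(path, "w") as f:
    f.write("data\n")

before = os.stat(path)
time.sleep(1.1)             # make sure the timestamps can visibly differ
os.utime(path, times=None)  # equivalent to: touch touch_test.txt
after = os.stat(path)

print("atime changed:", after.st_atime > before.st_atime)
print("mtime changed:", after.st_mtime > before.st_mtime)
print("ctime changed:", after.st_ctime > before.st_ctime)
```

All three lines print True on a typical Linux file system, which is exactly why a touched file evades any archiving policy based on these timestamps.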
For the example scan, about one third of the files have an atime age in the 168 to 252 day range (6 to 9 months), and about 58 percent of the files are between 365 and 504 days (1 to 1.5 years). It appears that a reasonable number of files are being accessed, but over half likely are not, or at least are not having their atime changed. I think this is most likely expected behavior in a user's home directory.
The file size distribution for the scan is also very interesting. The average file size is 1.22 MB, but the files range in size from virtually 0 KB to 1.964 GB. The standard deviation of the file sizes is 28.7 MB, about 20 times the average, indicating a large variation in file size. This is also supported by examining the distribution of file sizes: about 32 percent of the files are 1 KB or smaller, but the distribution has a reasonable "tail" into the MB range.
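Statistics like these are straightforward to gather during a scan. Here is a minimal sketch (the function name is illustrative; a real tool would also build the full size histogram rather than just the summary numbers):

```python
import math
import os

def size_stats(root):
    """Collect file sizes under root and return the count, mean, and
    (population) standard deviation, all in bytes."""
    sizes = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.stat(os.path.join(dirpath, name)).st_size)
            except OSError:
                continue  # skip files that vanish mid-scan
    n = len(sizes)
    if n == 0:
        return 0, 0.0, 0.0
    mean = sum(sizes) / n
    var = sum((s - mean) ** 2 for s in sizes) / n
    return n, mean, math.sqrt(var)

n, mean, std = size_stats(".")
print(f"{n} files, mean {mean / 1e6:.2f} MB, std dev {std / 1e6:.2f} MB")
```

A standard deviation many times larger than the mean, as in the example scan, is a quick numeric hint that the distribution is heavily skewed toward small files with a long tail of large ones.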
This data and its interpretation are just a snapshot in time of the state of the files (metadata about the metadata). There is some interesting information in the analysis: a large number of files are between 1 and 1.5 years old and might be good candidates for archiving, if possible. Plus, there are a large number of smaller files on the file system, but a few fairly large ones too (up to 1.9 GB). Small-file access tends to really beat on the metadata performance of a file system, but in this particular case the vast majority of files have not been accessed in over a year, so metadata performance is probably not a key issue at this time.
Paying attention to what's happening with the files on your storage system is a key task of administrators. Monitoring and understanding how and when the files on the storage are created, used, modified and removed by all of the users helps you understand how the storage is being used.
From this, one can get an idea of trends over time. How much capacity is being used? Who are the big users of the storage? Is there much old data on the storage? And ultimately, when and how much storage will I need in the future? The overall theme is really metadata about metadata.
In this article I've tried to illustrate that it's possible to write simple tools to gather information about the state of the file system. But, more importantly, I hope I've illustrated what kind of information is useful to gather and how you can use that information to begin to understand what is happening on the file system.