Test Plan for Linux File System Fsck Testing



Building on Henry's Problem Statement, this article will present the test plan for performing fsck tests on Linux file systems. The goal is to test fairly large file systems that might be encountered on large systems to determine the status of file system check (fsck) performance. We ask and appreciate your feedback on the test plan.

Introduction to Fsck Testing

Testing storage systems, or any aspect of IT systems, is not an easy task. Proper benchmarks take careful planning, testing and hardware. Even when we try to be careful, it is easy to forget, omit (either by design or by accident), or misconfigure systems and benchmarks. The results are then less useful and may not meet the original requirements. Henry and I often call these Slimy Benchmarking Tricks (SBTs). The end result is that good tests or benchmarks are difficult to do well. Perhaps as a consequence, much of the benchmarking we see today is of very poor quality, to the degree that it is virtually useless and, more often than not, merely entertaining (and sometimes frustrating).

Even if the benchmarks are done well, there is still the problem of correlating the benchmarks/tests to your application workload. This is true for computing-oriented benchmarks, such as taking something like H.264 encoding tests and determining how the benchmarks correlate to your weather modeling applications. This is also true for storage benchmarks. How do Postmark results correlate to an MPI-IO application that is doing astrophysics simulation? Or how do IOR results correlate to my database performance? The answer is as simple as it is nebulous--it depends.

There is no magic formula that tells you how to correlate benchmarks to real application workloads and, more specifically, to your application workload. The best predictor of your application workload's performance is, believe it or not, your application workload. However, it's not always possible to test your workload against storage solutions that vary in hardware, networking, file systems, file system tuning, clients, OS and so on. This is why we rely on benchmarks or tests as an indicator of how our workload might perform. Typically, this means you have to take these benchmarks, run them on your existing systems, and compare their trends to the trends of your application workloads.

For example, you could take your existing systems and run a variety of benchmarks/tests against them--IOR, IOzone, Postmark and so on--and run your workloads on the same systems. Then, you can compare the two sets of results and look for correlation. This might tell you which benchmark(s) track closest to your application, indicating which benchmark/test you should focus on when you look for data about new hardware or new file systems. But this task isn't easy and it takes time--time we usually don't have. However, to effectively use benchmarks and tests we need to understand this correlation and how it affects us. Otherwise it's just marketing information.
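One simple way to "look for correlation" is a Pearson correlation coefficient between a benchmark's results and your workload's results across the same set of systems. The sketch below is only an illustration under that assumption; the function name `pearson` is ours, and other measures (rank correlation, for instance) may suit your data better.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series.

    xs: benchmark results, one value per system.
    ys: application workload results on the same systems, in the same order.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

A value near 1 suggests the benchmark tracks the workload across your systems; a value near 0 suggests it tells you little. With only a handful of systems, treat the number as a rough hint rather than proof.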

Keeping these ideas in mind, our goal is to examine the fsck (file system check) performance of Linux file systems by filling them with dummy data and then executing the appropriate file system check. This article describes the approach and the details we will use in running these tests. Fortunately, in our case, the benchmark/test is fairly simple--fsck wall clock time--so this should make our lives, and yours, a bit easier.

The following sections go over the details of the testing. Please read them carefully, and we encourage your feedback on the test plan with suggestions/comments.

Benchmark/Test Process

I've written elsewhere about benchmarking storage systems. We will try to adhere to the tenets presented in that article and be as transparent as possible. However, if you have any questions or comments, we encourage you to post them.

The basic plan for the testing only has three steps:

  1. Create the file system
  2. Fill the file system
  3. Run fsck and time how long it takes

That's pretty much it--not too complicated at this level, but the devil is always in the details.
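Since the metric is wall clock time, the timing step can be sketched as a small wrapper around whatever check command is run. This is a minimal illustration, not the harness we will use; `time_command` is a name we made up, and the example call times a harmless stand-in command rather than a real file system check.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    # check=False: checkers such as fsck may return non-zero codes that
    # report what they found rather than a failure to run.
    result = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed


if __name__ == "__main__":
    # Stand-in for the real check command: start Python and exit immediately.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, wall clock {seconds:.2f} s")
```

Recording the exit code alongside the time matters because a run that aborts early would otherwise look like a very fast check.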

Since our goal is to understand the fsck time for current Linux file systems, we will run several tests to develop an understanding of how fsck performance scales with the number of files and the file system size. We'll test both XFS and ext4 with three file counts: 1) 100 million files, 2) 50 million files and 3) 10 million files.

According to the Red Hat documents Henry previously mentioned, XFS is supported up to 100TB. The testing hardware we have access to limits the total file system to about 80TB, formatted (more on that in the next section). We'll also test at 40TB (half that size). For ext4, the same Red Hat document says that only a 16TB file system is currently supported. To avoid running up against any unforeseen problems, we'll test with a 10TB file system and a 5TB file system.

An fsck for ext4 at both of these file system sizes should not take a long time to run, since there are very few spindles and a large number of files. Consequently, we will run these tests last.

Overall, the fsck tests will be run on the following combinations:

  1. 80TB XFS file system
    • 100 million files
    • 50 million files
    • 10 million files

  2. 40TB XFS file system
    • 100 million files
    • 50 million files
    • 10 million files

  3. 10TB ext4 file system
    • 100 million files
    • 50 million files
    • 10 million files

  4. 5TB ext4 file system
    • 100 million files
    • 50 million files
    • 10 million files
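The combinations above form a small test matrix: four file systems times three file counts, or twelve runs. As a rough sketch, they could be enumerated as follows; the names `TEST_PLAN`, `FILE_COUNTS` and `build_matrix` are ours and purely illustrative.

```python
from itertools import product

# File system type -> file system sizes in TB, as listed in the plan.
TEST_PLAN = {
    "xfs": [80, 40],
    "ext4": [10, 5],
}

# Number of files to create for each file system size.
FILE_COUNTS = [100_000_000, 50_000_000, 10_000_000]


def build_matrix():
    """Return every (fs_type, size_tb, n_files) combination in plan order."""
    runs = []
    for fs_type, sizes in TEST_PLAN.items():
        for size_tb, n_files in product(sizes, FILE_COUNTS):
            runs.append((fs_type, size_tb, n_files))
    return runs


if __name__ == "__main__":
    for fs_type, size_tb, n_files in build_matrix():
        print(f"{size_tb}TB {fs_type}: {n_files:,} files")
```

Keeping the matrix in one place makes it harder to skip a combination by accident, and makes the order of the runs explicit.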

For each of these combinations, the same three basic steps will be followed:

  1. Create the file system
  2. Fill the file system
  3. Run fsck and time the results

For ext4, we will use fsck.ext4 to execute the file system check. For XFS, we will use xfs_repair. (Note: xfs_check just walks the file system and doesn't do repairs. We want to run the tests with the same command an admin would use, which is xfs_repair.)
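The choice of checker per file system can be captured in a small helper. This is a sketch only: the function name `check_command` and the device path in the usage note are hypothetical, and the ext4 options (`-f` to force a check even when the file system is marked clean, `-y` to answer yes to all prompts) are our assumption of reasonable settings, not options stated in this plan.

```python
def check_command(fs_type, device):
    """Return the argument list for checking `device`, based on fs_type."""
    if fs_type == "ext4":
        # Assumed options: -f forces a full check, -y answers yes to prompts
        # so the run does not stall waiting for input.
        return ["fsck.ext4", "-f", "-y", device]
    if fs_type == "xfs":
        # xfs_repair is used rather than xfs_check, as described above.
        return ["xfs_repair", device]
    raise ValueError(f"no check command defined for file system type: {fs_type}")
```

For example, `check_command("xfs", "/dev/sdX1")` returns `["xfs_repair", "/dev/sdX1"]`, where `/dev/sdX1` is a placeholder. Both tools expect the file system to be unmounted before they run.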

One of the key pieces in the testing is how to fill the file system. The tool we will be using is called fs_mark. It was developed by Ric Wheeler (now at Red Hat) to test file systems. fs_mark tests various aspects of file system performance, which is interesting but not the focus of this test. However, in running its tests, fs_mark conveniently fills the file system with files, which is exactly what is needed.
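To give a feel for what a fill run might look like, here is a hedged sketch that builds an fs_mark command line. It uses only the `-d` (target directory), `-n` (number of files), `-s` (file size in bytes) and `-t` (threads) options, and it assumes that `-n` is counted per thread; verify both points against the help output of your fs_mark version. The function name, mount point and file size are illustrative, not the values we will use.

```python
def fs_mark_command(target_dir, total_files, file_size_bytes, threads=1):
    """Build an fs_mark argument list that creates total_files files.

    Assumption: fs_mark's -n is a per-thread count, so the total is
    divided evenly across the threads.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if total_files % threads != 0:
        raise ValueError("total_files must divide evenly across threads")
    files_per_thread = total_files // threads
    return [
        "fs_mark",
        "-d", target_dir,             # directory on the file system under test
        "-n", str(files_per_thread),  # files to create (assumed per thread)
        "-s", str(file_size_bytes),   # size of each file in bytes
        "-t", str(threads),           # number of threads creating files
    ]
```

For example, `fs_mark_command("/mnt/test", 100_000_000, 4096, threads=10)` asks each of 10 threads for 10 million files of 4096 bytes. For fsck testing the files must remain on disk after the run, so check the tool's help for the option that controls whether created files are kept.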
