Fixing SSD Performance Degradation, Part 2 Page 2



The benchmarks in this article are designed to explore the impact of performance enhancement techniques on SSD performance over time. Specifically, I want to perform an initial set of benchmarks on an SSD, then torture it with some intensive I/O workloads, and then rerun the benchmarks and examine the differences. Before starting the test I do not know what the differences in performance will be, but I do anticipate seeing some, if for no other reason than the impact of the file system and any fragmentation caused by the testing.

The highlights of the system used in the testing are below:

  • GigaByte MAA78GM-US2H motherboard
  • An AMD Phenom II X4 920 CPU
  • 8GB of memory (DDR2-800)
  • Linux 2.6.34 kernel (with bcache patches only)
  • The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
  • /home is on a Seagate ST1360827AS
  • Two drives used for storing scratch data are Seagate ST3500641AS-RK with 16 MB cache each. These are /dev/sdb and /dev/sdc.
  • Single Intel X25-E SLC disk (64GB) connected to a single SATA port (this is presented as /dev/sdd).

I used CentOS 5.4 on this system with my own kernel, 2.6.34 plus some patches (referred to as 2.6.34+ in the rest of this article). The 2.6.34 kernel was selected because it supports the TRIM command, and ext4 is used as the file system because it supports TRIM as well. The details of creating the ext4 file system are important because they are tailored for SSDs; they are discussed in the following sub-section.


Building ext4

While researching options for building ext4 on an SSD, I found a blog by the primary maintainer of ext4. Theodore Ts'o's blog discusses how he formatted ext4 for an Intel SSD. The first step was to partition the SSD so that partitions are aligned on 128KB boundaries (following Theodore's advice). This is accomplished with the common fdisk command:

[root@test64 ~]# fdisk -H 224 -S 56 /dev/sdd

where the -H option is the number of "heads" and the -S option is the number of sectors per track. Don't forget that fdisk still thinks of everything as a spinning disk, so these options may not seem to make sense for an SSD, but aligning the partitions on 128KB boundaries is important for best performance. As recommended by Theodore, I used the following command to create the ext4 file system.

[root@test64 ~]# mke2fs -t ext4 -E stripe-width=32,resize=500G /dev/sdd1

The first option, "stripe-width=32", was recommended as a way to improve performance. The second option, "resize=500G", reduces wasted space by limiting the space reserved for online resizing to what is needed to grow the file system up to 500GB. Notice that I let mke2fs select the journal size and place the journal on the SSD.
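The 128KB alignment behind these two commands can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 512-byte sectors, a 4KB ext4 block size, and partitions that start on fdisk cylinder boundaries, none of which is stated explicitly above. The 255/63 geometry is included purely for comparison as the traditional default.

```python
# Rough check of the 128KB alignment implied by the fdisk and mke2fs options.
SECTOR_BYTES = 512          # assumption: 512-byte logical sectors
BLOCK_BYTES = 4096          # assumption: 4KB ext4 block size
ALIGN_BYTES = 128 * 1024    # the 128KB boundary used in the article


def cylinder_bytes(heads, sectors_per_track, sector_bytes=SECTOR_BYTES):
    """Size in bytes of one fdisk 'cylinder' for the given geometry."""
    return heads * sectors_per_track * sector_bytes


def is_aligned(heads, sectors_per_track, align_bytes=ALIGN_BYTES):
    """True if every cylinder boundary is a multiple of align_bytes."""
    return cylinder_bytes(heads, sectors_per_track) % align_bytes == 0


# Geometry from the article: fdisk -H 224 -S 56
size = cylinder_bytes(224, 56)
print(size, size // ALIGN_BYTES, is_aligned(224, 56))  # 6422528 49 True

# Traditional default geometry, shown only for comparison
print(is_aligned(255, 63))  # False

# stripe-width=32 is counted in file system blocks
print(32 * BLOCK_BYTES == ALIGN_BYTES)  # True
```

With 224 heads and 56 sectors per track, each cylinder is exactly 49 x 128KB, so any partition that begins on a cylinder boundary is 128KB-aligned. Under the 4KB block assumption, a stripe width of 32 blocks is also 128KB.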



I chose to test three aspects of the SSD (and file system):

  1. Throughput
  2. IOPS
  3. Metadata

The first test stresses the bandwidth capability of the SSD, while the second test stresses its ability to service I/O requests as quickly as possible (an often underestimated aspect of storage performance). The third test is more focused on the file system, but it also measures the performance of the underlying storage device because the device has to service the file system's requests as quickly as possible. Consequently, it is closer to IOPS as a performance measure, but it also gives some insight into how the file system performs on the storage device, which in turn tells us more about the SSD itself.
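Throughput and IOPS are linked through the size of each I/O request, which is why the same device can look very different in the two tests. The following sketch shows the conversion; the function names and the numbers are made up for illustration and are not measurements from this system.

```python
def throughput_to_iops(throughput_mb_s, record_kb):
    """I/O operations per second needed to sustain a throughput at a record size."""
    return throughput_mb_s * 1024.0 / record_kb


def iops_to_throughput_mb_s(iops, record_kb):
    """Throughput in MB/s delivered by a given IOPS rate at a record size."""
    return iops * record_kb / 1024.0


# Hypothetical device sustaining 200 MB/s:
print(throughput_to_iops(200, 1024))  # large 1MB records -> 200.0 operations/s
print(throughput_to_iops(200, 4))     # small 4KB records -> 51200.0 operations/s
```

A throughput test with large records therefore needs only a few hundred operations per second, while matching that bandwidth with 4KB records would need tens of thousands, which is what an IOPS test probes.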


The benchmarks selected are IOzone for measuring both throughput and IOPS and metarates for measuring metadata performance.


IOzone is one of the most popular throughput benchmarks, partly because it is open-source and written in very plain ANSI C (not an insult but a compliment) and, perhaps more importantly, because it tests different I/O patterns, which very few benchmarks actually do. It is capable of single-threaded, multi-threaded, and multi-client testing. The basic concept of IOzone is to break up a file of a given size into records. Records are written or read in some fashion until the file size is reached. Using this concept, IOzone has a number of tests that can be performed. The tests used in this article are:

  • Write
    This is a fairly simple test that simulates writing to a new file. Because of the need to create new metadata for the file, many times the writing of a new file can be slower than rewriting to an existing file. The file is written using records of a specific length (either specified by the user or chosen automatically by IOzone) until the total file length has been reached.

  • Re-write
    This test is similar to the write test but measures the performance of writing to a file that already exists. Since the file already exists and the metadata is present, it is commonly expected for the re-write performance to be greater than the write performance. This particular test opens the file, puts the file pointer at the beginning of the file, and then writes to the open file descriptor using records of a specified length until the total file size is reached. Then it closes the file which updates the metadata.

  • Read
    This test reads an existing file. It reads the entire file, one record at a time.

  • Re-read
    This test reads a file that was recently read. This test is useful because operating systems and file systems will maintain parts of a recently read file in cache. Consequently, re-read performance should be better than read performance because of the cache effects. However, sometimes the cache effect can be mitigated by making the file much larger than the amount of memory in the system.

  • Random Read
    This test reads a file with the accesses being made to random locations within the file. The reads are done in record units until the total reads are the size of the file. The performance of this test is impacted by many factors including the OS cache(s), the number of disks and their configuration, disk seek latency, and disk cache among others.

  • Random Write
    The random write test measures the performance when writing a file with the accesses being made to random locations within the file. The file is opened to the total file size and then the data is written in record-sized units to random locations within the file.

  • Backwards Read
    This is a unique file system test that reads a file backwards. There are several applications, notably MSC Nastran, that read files backwards. Some file systems and even operating systems can detect this type of access pattern and enhance the performance of the access. In this test a file is opened and the file pointer is moved 1 record forward and then the file is read backward one record. Then the file pointer is moved 2 records backward in the file, and the process continues.

  • Record Rewrite
    This test measures the performance when writing and re-writing a particular spot within a file. The test is interesting because it can highlight "hot spot" capabilities within a file system and/or an OS. If the spot is small enough to fit into the various caches (CPU data cache, TLB, OS cache, file system cache, etc.), then the performance will be very good.

  • Strided Read
    This test reads a file in what is called a strided manner. For example, you could read at a file offset of zero for a length of 4 Kbytes, then seek 200 Kbytes forward, then read for 4 Kbytes, then seek 200 Kbytes, and so on. The constant pattern is important and the "distance" between the reads is called the stride (in this case it is 200 Kbytes). This access pattern is used by many applications that are reading certain data structures. This test can highlight interesting issues in file systems and storage because the stride could cause the data to miss any striping in a RAID configuration, resulting in poor performance.

  • fwrite
    This test measures the performance of writing a file using the library function "fwrite()". It is a binary stream function (examine the man pages on your system to learn more). Equally important, the routine performs a buffered write operation, and the buffer is in user space (i.e. not part of the system caches). The test fills a user-space buffer of one record length and then writes it to the file, repeating until the entire file is created. This test is similar to the "write" test in that it creates a new file, possibly stressing the metadata performance.

  • frewrite
    This test is similar to the "re-write" test but uses the fwrite() library function. Ideally the performance should be better than "fwrite" because it uses an existing file, so the metadata performance is not stressed in this case.

  • fread
    This is a test that uses the fread() library function to read a file. It opens a file, and reads it in record lengths into a buffer that is in user space. This continues until the entire file is read.

  • freread
    This test is similar to the "reread" test but uses the "fread()" library function. It reads a recently read file which may allow file system or OS cache buffers to be used, improving performance.
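To make the access patterns in the list above more concrete, here is a small sketch that generates the file offsets each pattern would touch. It is not IOzone's implementation. It assumes the file size is a multiple of the record size, follows the strided example literally (a fixed gap between the end of one read and the start of the next), and simplifies the random tests to a shuffled visit of every record.

```python
import random


def sequential_offsets(file_size, record_size):
    """Read/write pattern: records visited from the start of the file to the end."""
    return list(range(0, file_size, record_size))


def backward_offsets(file_size, record_size):
    """Backwards read pattern: records visited from the last one to the first."""
    return list(range(file_size - record_size, -1, -record_size))


def strided_offsets(file_size, record_size, gap):
    """Strided read pattern: read one record, skip `gap` bytes, read again."""
    step = record_size + gap
    return list(range(0, file_size - record_size + 1, step))


def random_offsets(file_size, record_size, seed=0):
    """Random pattern (simplified): every record once, in shuffled order."""
    offsets = sequential_offsets(file_size, record_size)
    random.Random(seed).shuffle(offsets)
    return offsets


# Offsets in KB for a 1024KB file read in 4KB records with a 200KB gap
print(strided_offsets(1024, 4, 200))   # [0, 204, 408, 612, 816, 1020]
print(backward_offsets(16, 4))         # [12, 8, 4, 0]
print(random_offsets(16, 4))
```

Feeding these offsets to seek-and-read calls against a test file is one way to reproduce the patterns by hand, although IOzone itself remains the better tool for actual measurements.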

There are other options that can be tested, but for this exploration only the previously mentioned tests will be examined. However, even this list of tests is fairly extensive and covers a large number of application access patterns that you are likely to see (but not all of them).
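The distinction between the "write" and "fwrite" tests described earlier comes down to where the data is buffered. As a rough analogy only, Python's os.write() issues one system call per record, while a file object opened with a buffer size collects records in a user-space buffer first, much as fwrite() does in C. The function names and the 64KB buffer size below are assumptions chosen for illustration.

```python
import os
import tempfile


def write_unbuffered(path, record, count):
    """Analogous to the write test: one write() system call per record."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            # For regular files the whole record is normally written at once.
            os.write(fd, record)
    finally:
        os.close(fd)


def write_buffered(path, record, count, buffer_size=64 * 1024):
    """Analogous to the fwrite test: records pass through a user-space buffer."""
    with open(path, "wb", buffering=buffer_size) as f:
        for _ in range(count):
            f.write(record)


record = b"x" * 4096  # one 4KB record
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "unbuffered.dat")
    b = os.path.join(tmp, "buffered.dat")
    write_unbuffered(a, record, 256)
    write_buffered(b, record, 256)
    print(os.path.getsize(a), os.path.getsize(b))  # 1048576 1048576
```

Both functions produce the same file. What differs is how many system calls reach the kernel, which is the effect the fwrite and frewrite tests are meant to expose.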

