Download the authoritative guide: Enterprise Data Storage 2018: Optimizing Your Storage Infrastructure
As mentioned, if the 8 bytes are not added, this will cause poor performance. For each write request the system will have to read-modify write. Read-modify-write happens when the requests do not begin and end on 512 byte boundaries. Each time data is written, the system will have to read the data into the system buffers, and then the system will write the data from the user space to the system buffers such that it is written on 512 byte boundaries, and then data is written from the system buffers to the device.
The rule is that all I/O must be done on physical hardware boundaries -- you either provide it to the hardware, or it must be done in software. This will happen for each write, which significantly increases system overhead and dramatically reduces I/O performance.
So for programs using fopen/fread/fwrite, making larger I/O requests is easy if you have access to the source code. The real question is how to determine the optimal library buffer size. My rule of thumb for sequential I/O is to make the library buffer size at least 4 times the size of the fread(3) and/or fwrite(3) request size. If you can afford the memory usage, the library buffer size should be a much larger multiple in the range of 512 KB to 16 MB.
Determining the correct size has a great deal to do with the rest of the I/O path including the operating system, file system, volume manager, and storage hardware. Ascertaining the exact optimal value will have to wait until a later article, but making it large will immediately improve performance over using the default. Of course, this only works for files for which you are performing a great deal of I/O.
If you have an application which does a large amount of random I/O, making the buffer larger than the I/O request will hurt performance, as you will read data into the buffer that you will not end up using. The only time making the buffer larger than the request is helpful is when you can fit the entire file into the library buffer. Sometimes for older applications where memory is at a premium, the whole file could be placed into memory by using a large library buffer.
The real issue is that you will need to match the application I/O efficiency with the amount of storage necessary. If you have an application requirement of 190 MB/sec for reading and writing and the application makes 1 KB random I/O requests, the amount of hardware needed to support that requirement is much greater (likely 10x greater) than if the application makes 16 MB sequential I/O requests.
What I plan to address over the next months is the whole I/O path and issues with performance and tuning for the server hardware, operating system, file system/volume manager, HBA, FC devices (such as tape and disk), and applications (such as databases).