Storage Focus: Databases and Storage Architecture
Databases typically need operating system features such as shared memory and semaphores. They can also often take advantage of large amounts of memory in the computer, which is usually done by changing tunable parameters within the database.
In many operating systems, the maximum I/O request size is limited to 128 KB or 256 KB unless the default is changed. This can hurt I/O performance, because a large transfer must then be split into more requests, each of which has to be completed by both the operating system and the storage.
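To make the cost concrete, the following minimal Python sketch counts the requests needed for one large transfer under different request-size caps. The 64 MB transfer size and the 1 MB and 4 MB caps are illustrative assumptions, not defaults of any particular operating system.

```python
KB = 1024
MB = 1024 * KB


def requests_needed(transfer_bytes: int, max_request_bytes: int) -> int:
    """Requests needed to move transfer_bytes when each I/O request
    is capped at max_request_bytes (integer ceiling division)."""
    return -(-transfer_bytes // max_request_bytes)


if __name__ == "__main__":
    transfer = 64 * MB  # hypothetical 64 MB sequential scan
    for cap in (128 * KB, 256 * KB, 1 * MB, 4 * MB):
        print(f"max request {cap // KB:>5} KB -> "
              f"{requests_needed(transfer, cap):>4} requests")
```

For a 64 MB transfer this prints 512 requests at 128 KB, 256 at 256 KB, 64 at 1 MB and 16 at 4 MB; every one of those requests has to pass through both the operating system and the storage.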
File System and/or Volume Manager
Choosing the ideal volume manager and file system settings for each component of the database is one of the most important architectural decisions. Each type of I/O may call for different settings. Consider the following types of I/O:
- Long and short block sequential
- Long and short block random
- Long and short block multiple streams
- All reads
- All writes
No single file system with a single set of settings will perform well on all of these types of I/O. In fact, I’m willing to bet that no file system with a single set of tunable parameters handles any two of them so well that tuning changes could not improve it.
The two key architectural decisions are:
- Which volume manager and file system are best for the type of I/O that will be performed
- Which tunable parameters are best for that file system and volume manager
A few years ago I was working with a database that was not scaling for a number of reasons, but I believed the primary one was that the RAID cache was not being used efficiently for the index searches. The RAID cache read hit rate was less than 20%, and many of the reads were what I call randomly sequential: sequential reads for a number of requests, then a random skip increment, and then more sequential reads.
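The randomly sequential pattern can be modeled with a small generator. This is a minimal Python sketch; block-level granularity, a forward-only skip, and the names run_length and max_skip are illustrative assumptions, not measurements of the workload described here.

```python
import random
from typing import Iterator


def randomly_sequential(run_length: int, max_skip: int, n_runs: int,
                        seed: int = 0) -> Iterator[int]:
    """Yield block numbers in a 'randomly sequential' pattern:
    run_length consecutive blocks, then a random forward skip of
    1..max_skip blocks, then the next run, and so on."""
    rng = random.Random(seed)  # seeded so the pattern is reproducible
    block = 0
    for _ in range(n_runs):
        for _ in range(run_length):
            yield block
            block += 1
        block += rng.randint(1, max_skip)  # the random skip increment


if __name__ == "__main__":
    print(list(randomly_sequential(run_length=4, max_skip=20, n_runs=3)))
```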
After reviewing the volume manager settings, I realized where the problem was. Each file system was striped across 32 LUNs of 8 GB each, with a 32 KB stripe size that matched the RAID allocation. Each index file was 2 GB.
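Like many RAID caches, this one would read ahead the third block only after it had seen two sequential block reads on the same LUN, which is a common algorithm. Because the file system striped data across all 32 LUNs in 32 KB units, each LUN saw only every 32nd block of a sequential stream. Therefore, 32 KB × 32 LUNs × 2, or 2 MB, of data had to be read sequentially before the next I/O would be found in cache.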
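The 2 MB figure follows from how a striped volume maps offsets onto LUNs. The Python sketch below assumes plain RAID-0-style striping with the file starting on a stripe boundary; the function names and the trigger_reads parameter are hypothetical, with trigger_reads standing in for the controller's two-sequential-reads rule.

```python
from typing import Tuple

KB = 1024
MB = 1024 * KB


def stripe_location(volume_offset: int, stripe_unit: int,
                    n_luns: int) -> Tuple[int, int]:
    """Map a byte offset in a simply striped (RAID-0 style) volume to
    (lun_index, byte_offset_on_that_lun)."""
    unit = volume_offset // stripe_unit   # which stripe unit overall
    lun = unit % n_luns                   # units rotate across the LUNs
    offset_on_lun = (unit // n_luns) * stripe_unit + volume_offset % stripe_unit
    return lun, offset_on_lun


def bytes_before_readahead(stripe_unit: int, n_luns: int,
                           trigger_reads: int) -> int:
    """Sequential bytes the host must read before every LUN has seen
    trigger_reads back-to-back stripe units and starts prefetching."""
    return stripe_unit * n_luns * trigger_reads


if __name__ == "__main__":
    # Values from the case above: 32 KB stripe, 32 LUNs, readahead after 2 reads.
    print(bytes_before_readahead(32 * KB, 32, 2) // KB, "KB")
    # Consecutive 32 KB units land on consecutive LUNs; LUN 0 is not
    # touched again until 32 * 32 KB = 1 MB further into the volume.
    for off in (0, 32 * KB, 31 * 32 * KB, 32 * 32 * KB):
        print(off // KB, "KB ->", stripe_location(off, 32 * KB, 32))
```

With the values from this case the first line prints 2048 KB (2 MB), and the mapping shows that LUN 0 receives its second 32 KB unit only after 1 MB of the volume has been traversed.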
No wonder the RAID cache was used so poorly. The customer was told they had two choices for improving performance: first, set the volume manager stripe size to 2 GB so that each index file would be allocated sequentially; or second, switch to a file system that allocates data round-robin instead of striping it. With round-robin allocation, each open system call is assigned to a different LUN, and all of the data for the file being opened is allocated on that LUN.
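A toy simulation shows why round-robin allocation helps a randomly sequential workload. This Python sketch is only an illustration under stated assumptions: 32 KB blocks equal to the stripe unit, a single index file being read, a per-LUN readahead that prefetches the next block after two back-to-back sequential reads, and an unbounded cache with no competing I/O. The workload parameters (16-block runs, skips of 1 to 64 blocks) and all function names are hypothetical, and the model is not expected to reproduce the hit rates measured in this case.

```python
import random
from typing import Callable, Dict, Iterator, List, Set, Tuple

# (file_block, n_luns) -> (lun, lun_block)
Layout = Callable[[int, int], Tuple[int, int]]


def randomly_sequential(run_length: int, max_skip: int, n_runs: int,
                        seed: int = 0) -> Iterator[int]:
    """File block numbers: runs of consecutive blocks, random forward skips between."""
    rng = random.Random(seed)
    block = 0
    for _ in range(n_runs):
        for _ in range(run_length):
            yield block
            block += 1
        block += rng.randint(1, max_skip)


def striped(block: int, n_luns: int) -> Tuple[int, int]:
    """Stripe unit of one block: consecutive file blocks rotate across all LUNs."""
    return block % n_luns, block // n_luns


def round_robin(block: int, n_luns: int, file_index: int = 0) -> Tuple[int, int]:
    """Whole file on one LUN chosen by open order (assumed to start at block 0 there)."""
    return file_index % n_luns, block


def read_hit_rate(pattern: List[int], layout: Layout, n_luns: int) -> float:
    """Replay reads against a per-LUN readahead cache that prefetches the
    next block only after two back-to-back sequential reads on that LUN."""
    last: Dict[int, int] = {}             # LUN -> last LUN-local block read
    streak: Dict[int, int] = {}           # LUN -> current sequential streak length
    cache: Set[Tuple[int, int]] = set()   # prefetched (LUN, block); never evicted
    hits = 0
    for block in pattern:
        lun, lun_block = layout(block, n_luns)
        if (lun, lun_block) in cache:
            hits += 1
        if last.get(lun) == lun_block - 1:
            streak[lun] += 1
        else:
            streak[lun] = 1
        last[lun] = lun_block
        if streak[lun] >= 2:
            cache.add((lun, lun_block + 1))  # read ahead the next block
    return hits / len(pattern)


if __name__ == "__main__":
    N_LUNS = 32
    # Hypothetical workload: 16-block (512 KB) runs, skips of 1..64 blocks;
    # stays well inside the 65,536 blocks of a 2 GB file.
    pattern = list(randomly_sequential(run_length=16, max_skip=64, n_runs=200))
    for name, layout in (("striped", striped), ("round-robin", round_robin)):
        print(f"{name:>12}: {read_hit_rate(pattern, layout, N_LUNS):.1%} read hits")
```

Under these assumptions the round-robin layout scores exactly (16 - 2)/16 = 87.5%, because only the first two reads of each run miss. In the striped layout each 16-block run touches 16 different LUNs once each, so a LUN rarely sees the back-to-back reads it needs and the hit rate is much lower. For a purely sequential read of 2,048 blocks, the striped layout misses exactly its first 64 blocks (2 MB) before readahead covers every LUN, consistent with the arithmetic in the text.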
When we tested the round-robin allocation method with the RAID read cache, the read hit rate went from 20% to 80%, and performance exceeded the customer’s requirements at that time.