All too often I have seen the process of determining what storage and configuration settings should be assigned to a database become a completely disjointed process. The database architect is typically a database expert and knows little about the low-level details of storage configuration. Combine this with the fact that the storage administrators and architects often know little about how the database will use the underlying storage and what the requirements and best configuration are for the database, index files, the log files, and of course, the file system and volume manager.
This typically results in an inefficient use of storage resources that can increase the overall cost and reduce the performance to a point where you might not be able to meet your requirements. Add to this the ever looming budget and the fact that management needs things to work efficiently with the available budget. This article will address some of the issues that DBAs and storage architecture must work together on when addressing the architecture problems.
The hardware and software requirements for end-to-end storage architecture for most databases are:
- Control files
- Table space
- Index files
- Redo logs
- Operating System
- File system and volume manager (this may not be relevant if the database runs on a raw device)
- Host bus adapters (HBAs)
- Storage hardware
Each of these areas has multiple components, features, and functions that can have a significant impact on the overall performance.
The database application itself has multiple features and functions that must be taken into consideration. The components for Oracle are:
- Control files – These are files that record the physical structure of the database and are required for the database to be active
- Table Space – The actual data from the database with the various rows and columns of the database
- Index Files/Space – Indexes are not required structures in Oracle but are almost always used for large databases, as they allow search operations to run significantly faster when looking for something in the database
- Redo logs – The active database requests that allow you to rebuild and restart the database after a crash (these logs are similar in nature to the file system log)
Because they almost always have different types of access patterns, each of these file types could be stored in a different file system with tuning options. Other databases have similar types of files that need to be taken into consideration in similar ways.
Most databases recommend multiple control files to ensure reliability. The control files are not written or read that often, but you need to make sure that the files are located on different RAID sets and preferably on different RAID controllers.
This is often the largest amount of data in the database. The table space can be accessed with larger I/O requests when reading large tables on columns. Depending on the sizes and frequency of the updates, the table space can often be placed on larger RAID-5 with striping to achieve improved performance and density over RAID-1.
In many databases, index files are the most often accessed data. Searching index files can be very IOPS (Input/Output operations per second) intensive. In addition, sometimes databases are re-indexed, which is computationally intensive and also requires significant I/O bandwidth. The index space can be very large depending on the database and the types of searches required — in general, the index file sizes are 2 GB based on the history of UNIX file sizes.
The redo log files are filled with records that allow you to back out from changes made to your database. These are called redo records. The redo records are used in a circular buffer. As the redo logs are often small I/Os, RAID-1 is a good choice. As two or more redo logs are required, it is often a good idea to place the log files on different RAID-1 volumes.
Databases typically need operating system features and functions such as shared memory and semaphores. Additionally, databases can often take advantage of large memory within the computer. This is usually accomplished by changing tunable parameters within the database.
In many operating systems, I/O request sizes are limited to 256 KB or 128 KB without changes, which can impact I/O performance given that more requests to both the storage and the operating system will have to be completed.
File System and/or Volume Manager
Determining what the ideal volume manager and file system settings should be for each of the components of the database is one of the most important architectural decisions. For each type of I/O you might want to have different settings. Consider the following types of I/O:
- Long and short block sequential
- Long and short block random
- Long and short block multiple streams
- All reads
- All writes
No single file system with a single set of settings will perform well on all of these types of I/O, and I’m willing to bet that no file system with a single set of tunable parameters will do well on any two of these, or could not be improved with tuning changes.
The two key areas that must be architected are determining:
- What is the best volume manager and file system for the type of I/O that will be conducted
- What are the best tunable parameters for that file system and volume manager
A few years ago I was working with a database that was not scaling due to a number of factors, but I believed the primary reason was that the RAID cache was not being used efficiently for the index searches. The RAID hit rate was less than 20% on reads, and many of the reads were what I term randomly sequential (read sequential data for a number of requests, then a random skip increment, and then more sequential reads).
After reviewing the volume manager settings I realized where was the problem. Each file system had 32 LUNs, each with 8 GB. The stripe setting on the file system was 32 KB, matching the RAID allocation. Each of the index files was 2 GB.
Given how the RAID cache worked, you would have to have two sequential block reads before you would readahead the third block, which is a common algorithm. Therefore, you would need 32 KB*32 LUNs*2, or 2 MB of sequentially read data, before the next I/O would be in cache.
No wonder the RAID cache usage was so poor. The customer was told they had two choices to improve performance. One, allocate with the volume manager stripes of 2 GB so that each of the index files would be allocated sequentially; or two, get a different file system that would round-robin instead of stripe data. With round-robining, each open system call will be allocated to a different LUN, and all of the data for the file that was opened will be allocated on that LUN.
When we tested this configuration using the round-robin allocation method and the read cache, the hit rate went from 20% to 80% and the performance exceeded the requirements (for the customer at that time).
Host Bus Adapters (HBAs)
Even a $2,000 HBA can make a significant impact on the performance of a large database. There are two areas of concern for HBAs:
- The number of outstanding I/O requests
- The largest request that can be made
Most HBAs set the default number of outstanding requests to 16 in the driver software. This limits the number of commands sent to the RAID devices. Given that with many disk drives and even with random I/O, 16 outstanding requests might not be enough to fully utilize the storage.
Many operating systems and device drivers limit the size of the I/O requests to values less than the request needed to read from or write to the table space. The limits within the device driver should be changed to allow larger requests. Setting this is of course different on every device drive and OS, and just to make things more interesting, it often changes.
Storage hardware is likely to be one of the most important parts for architecting a system for databases. You might want to have a number of different LUNs for the different types of I/O that will be occurring within the database. For example, you often have:
- High bandwidth requirements (64 KB) for the redo log files, and I/O to the redo logs is mostly writes
- High bandwidth, small block random (8 KB) I/O for the index searches, and in many cases I/O to the indexes is mostly reads
- Large block I/O (256 KB) for the table space, and typically I/O to the table space is mostly reads
As you can see, one size does not and cannot fit all of the requirements. Therefore you need to match the:
- RAID level with the typical access type for reads/writes
- Stripe width with the request size
- Bandwidth requirements with the RAID level and request size
- Cache policies with the type of I/O being done
While this is seemingly no easy task, if you break it down to the base level issues, the problem is not all that hard.
Depending on the size of your redo logs and the amount of bandwidth, you may initially think you require a RAID-5 stripe. This really depends, though, as most 10K RPM disks transfer data on the outer cylinders at 69 MB/sec and the inner cylinders at 39 MB/sec, and 15K RPM disks are even faster. When you add to this the size of the RAID cache, you may not need to use RAID-5. It really depends on:
- Bandwidth requirements – Megabytes per second of log data
- Size of the log – Can it fit in cache?
- Your RAID's speed
The key information missing here needs to be collected by you. You need to determine from looking at the system with various databases and system tools whether or not the performance of the redo logs is limiting the performance and scaling of the database, and if so, what the I/O requirements of your redo logs are.
Architecture for index files is reasonably simple. If you need to go fast, then use RAID-1 with a small stripe value with high performance 15K disks. Since index files are small block reads and are often random I/O, this will be by far the fastest way to go.
Depending on the size of the tables and how they are accessed and searched, RAID-1 is sometimes a better choice, while at other times RAID-5 is the best choice. The key is determining what the I/O request size to the table space is going to be. The size of the requests is often dependent on tunable parameters within the database.
Many books and documents have been written about tuning databases on many different operating systems. Here are a few that I have read and found useful:
Configuring and Tuning Databases on the Solaris Platform by Allan N. Packer (Author), Sun Microsystems Press (Author), Publisher: Prentice Hall (December 15, 2001), ISBN: 0130834173
Oracle9i Performance Tuning Tips & Techniques by Richard J. Niemiec, Publisher: McGraw-Hill Osborne Media (May 12, 2003), ISBN: 0072224738
Creating a Self-Tuning Oracle Database: Automating Oracle9i Dynamic SGA Performance (Oracle In-Focus series) by Donald K. Burleson, Publisher: Rampant TechPress (August 1, 2003), ISBN: 0972751327
Architecting for databases is just like any other application. You need to determine the I/O patterns from the application to the file system/volume manager to the HBA to the RAID, all while keeping in mind the performance requirements and cost issues. Tuning is a bit harder given the complexity of the application, but many tools exist for looking at the data to help you understand the underlying I/O issues.