When a new technology has evolved rapidly, it is easy to find the topic confusing–and RAID storage is one of these potentially perplexing technologies. However, selecting a RAID disk array is much like choosing any other product. You just need to ask the right questions so you can examine the plethora of products and select those few that best suit your needs.
Shopping for a RAID disk array can be simple, once you know what you really need and understand what the technology can offer you. RAID disk arrays provide data storage with a high degree of operational availability and performance, depending on the feature set available from various vendors. All RAID disk arrays will provide you with some level of storage, reliability, performance, and service for different prices. Your job is to determine which is the best fit for you. In this article, I’ll walk you through the decision process for choosing a RAID disk array.
Storage
All RAID disk arrays provide storage, but so does any old bunch of disks. Clearly, anyone can pile boxes on boxes to meet your total storage requirements. However, you need to consider these questions as you shop: How well does the storage scale from single units to racks of storage? Do all the disks use a limited number of host interfaces? Do all the disks use a very small number of controllers? These are sources of potential data bottlenecks. Does the vendor offer a rack cabinet? Who provides uninterruptible power supplies (UPSs), and are they adequate? Does the system require software drivers that must be updated constantly, or is the product a truly plug-and-play open system? Is the product scalable as your needs grow? Can it be moved to another platform and operating system easily and without added expense?
RAID reliability
Most vendors will provide you with an implementation of RAID in which one extra disk is used for parity information to recreate lost data in the event of a single disk failure. You should avoid any vendor that does not meet this minimum criterion. Given that most vendors will offer similar capabilities on this most basic issue, you should ask more detailed questions: How long does the controller take to rebuild the data? Does the controller permit an automatic hot-spare replacement? How easy is it to monitor and control the status of the array? Do you need to be at the system to monitor and control it, or can you operate remotely–and if so, how?
RAID modes of operation
RAID levels
RAID levels 1 through 5 were defined in the original University of California, Berkeley RAID paper; RAID 0 and RAID 6 were added to the taxonomy later. RAID 2, 4, and 6 are rarely seen in commercial products. RAID 0 is merely disk striping, which has some performance advantages but stores no parity information and thus does not offer true RAID data protection. RAID 1 offers complete duplication of data, and this 100 percent data redundancy provides the best protection, but it is too expensive for most applications. RAID 3 and RAID 5 each use one extra disk’s worth of capacity to store the parity information needed to recreate data in the event of a single disk failure. RAID 3 uses a dedicated parity disk and is typically faster for throughput-oriented applications, such as file transfer and other sequential applications. RAID 5 distributes the parity information across all disks in the array and is typically faster for transaction processing and other random access applications. These differences matter mostly in arrays that have little or no controller cache memory. In products with significant cache memory (64MB or more) on board the controller, performance will be markedly higher regardless of RAID mode.
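To make the parity idea concrete, here is a minimal Python sketch of how a lost block can be recreated from the surviving blocks plus parity. It assumes simple byte-wise XOR parity, the usual scheme for RAID 3 and RAID 5; the function names and the tiny three-disk stripe are hypothetical and do not describe any particular vendor's controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recreate the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical stripe: one block on each of three data disks, plus parity.
data = [b"RAID", b"DISK", b"DATA"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, the same routine that computes parity also rebuilds the missing block, which is why a single disk failure is recoverable but two simultaneous failures are not.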
Much has been written on the various RAID “levels.” However, the word levels is a misnomer–it would be better to refer to RAID modes. Each RAID mode is just a different (not necessarily better) means of operation. (For more information, see the sidebar “RAID levels.”)
You should insist that your RAID vendor support standard RAID levels including 1, 3, and 5. If the vendor has created additional RAID levels that are non-standard (that is, outside the range of RAID 0 through 6), be careful that you aren’t buying into a proprietary architecture. However, certain RAID levels are merely combinations of two other RAID levels. For example, in RAID 1+0 (also called RAID 10), multiple RAID 1 pairs are striped for faster access; and in RAID 15, two RAID 5 arrays are mirrored for added reliability. These combinations offer advantages over single RAID modes and are perfectly acceptable.
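When you compare quotes, it helps to translate each RAID mode into usable capacity. The sketch below is a simplified calculation under stated assumptions: identical disks, two-way mirroring for RAID 1 and RAID 10, and one disk's worth of parity for RAID 3 and RAID 5. The function name and the eight-disk, 18GB example are illustrative only.

```python
def usable_capacity_gb(mode, disks, disk_gb):
    """Usable capacity for a few common RAID modes (simplified)."""
    if mode == "0":            # striping only, no redundancy
        return disks * disk_gb
    if mode in ("1", "10"):    # two-way mirrored pairs: half the raw capacity
        if disks % 2:
            raise ValueError("mirroring needs an even number of disks")
        return disks // 2 * disk_gb
    if mode in ("3", "5"):     # one disk's worth of capacity holds parity
        if disks < 3:
            raise ValueError("parity modes need at least three disks")
        return (disks - 1) * disk_gb
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical array: eight 18GB drives.
for mode in ("0", "10", "5"):
    print(f"RAID {mode}: {usable_capacity_gb(mode, disks=8, disk_gb=18)}GB usable")
```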
Hardware fault tolerance
RAID vendors vary greatly in the degree of fault tolerance they provide. To maximize system availability, you need redundancy in the other system components that are most likely to fail, including power supplies and fans. (Disk controllers, unlike the mechanical components, are completely electronic and thus are the most reliable components. You can optionally look for redundancy here, too.) To the extent possible, these components should be hot-swappable so that they can be replaced while the system is running, further increasing total system availability.
You should also make sure that each individual enclosure has two AC power cords. Install one into a dedicated UPS and one into a room outlet (preferably a protected outlet) or second UPS. Any RAID disk array with cache memory on the RAID controller should also have a battery backup module for added protection.
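A quick calculation shows why redundant components are worth asking about. This is a rough sketch that assumes the redundant parts fail independently of one another, which real hardware only approximates; the 99 percent figure is a made-up example, not a vendor specification.

```python
HOURS_PER_YEAR = 24 * 365


def redundant_availability(single_availability, copies=2):
    """Availability of N redundant components, assuming independent failures."""
    return 1 - (1 - single_availability) ** copies


def downtime_hours_per_year(availability):
    """Expected hours per year that the component set is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical power supply that is available 99 percent of the time.
single = 0.99
paired = redundant_availability(single, copies=2)
print(f"one supply:  {downtime_hours_per_year(single):.1f} hours down per year")
print(f"two supplies: {downtime_hours_per_year(paired):.2f} hours down per year")
```

Hot-swappable parts improve on this further, because a failed unit can be replaced before its partner also fails.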
Quality construction
Quality construction is an often-overlooked parameter in today’s world of sleek-looking enclosure design. Don’t be fooled by the smooth, rounded corners of plastic disk enclosures: Only all-metal disk carriers offer the thermal conductivity needed for proper heat dissipation for today’s high-speed, higher-capacity disk drives, which often run hot and require proper operating environments. Proper cooling is critical to achieve long and reliable disk-drive life, and metal carriers simply do this better.
Metal carriers also shield the disk drives’ high-speed signals from stray radio frequency interference (RFI) that may occur in an office or computer room environment. For 18GB drives and high-speed 10,000rpm drives, it is also important to minimize drive vibration to avoid excessive disk errors and time-consuming retries. A quality vendor will have a new drive-mounting scheme to ensure reliable operation of 18GB and 10,000rpm disk drives–the vendor won’t just put these more sensitive drives into the same old plastic disk carriers that handled their 1, 2, 4, and 9GB drives.
Performance
Performance is a critical parameter for every server. After you spend all your money on a RAID disk array, is it going to give you the performance you need? Today, at competitive prices, you can get over 9,000 disk read/write operations per second for transaction-oriented applications and up to 35MBps actual sustained throughput for data transfer operations. If a vendor is not providing anything near these specs, all your applications will run needlessly slowly. Today, computers process data in near zero time; thus, all server applications run in proportion to the speed of the slowest device, which is the mechanical disk drive.
Ask vendors for the performance specs of their RAID disk arrays, and compare them. Don’t accept generalizations. If they are clueless or do not publish their specs, you can be assured that they do not measure up.
Service
You need to ask vendors a number of questions regarding service. Can your people service the unit, or must you rely only on outside service providers? How easy is it to replace disk drives, power supplies, fans, and controllers? Can any moderately skilled technician perform these component replacements? How quickly can the components be replaced? Are they hot-swap replacements, that is, can components be replaced while the system is still delivering data to users who are unaware of the problem? Does the vendor have on-site service available? Does it have an 800 hotline staffed by people who actually know the equipment? Is the service available on a 24×7 basis? Can you or a highly trained factory engineer dial in to your system from a remote site? Can your system automatically alert you via pager in the event of a warning or error condition?
Vendor
The vendor you select is as important as the product! Is the company merely a distributor of the product that may not know much about it, or does the product come factory direct from the people who know the equipment? Are you considering buying the storage from the server manufacturer because doing so is most convenient, or are you truly looking at your needs and choosing the right RAID disk array? Is the vendor completely committed to RAID disk array technology, or does it sell many other products that dilute its interest and expertise? Is it important to you to have the comfort that comes with a name brand (“no one ever got fired for buying IBM”), or do you just want the best product for your needs from a company that can stand behind the product?
How long has the company been in the business? Does it enable you to be self-sufficient in a crisis or leave you completely dependent? Does it use industry-standard components? Does it offer systems designed to open-systems standards, or has it managed to include proprietary components that lock you in to its architecture? Does it use the industry’s best disk drives or less expensive models with correspondingly lower quality and reliability?
Is the price fair, compared to other vendors’ offerings, or is it too high or too low? Do the company’s representatives in sales, sales support, and technical support seem to have the expertise needed to support the product? Do they exhibit genuine interest in providing solutions to your needs? Are they enthusiastic about their products and committed to showing you how much they can be of service, or is their zeal devoted to separating you from your budget as quickly and efficiently as possible?
References
You should ask the vendor for references from other customers. Ask the company’s customers how well they were treated before and after the sale. Did the company meet its promises? Did the product live up to its claims? Were the performance, reliability, and service delivered? How smoothly did the installation process go?
You can be sure that a company will treat you much the same as it treated others. Ask people if they are happy with their purchase and if they would buy from the company again.
Guarantees
When you’re comparing systems, vendors who are willing to guarantee performance in writing are more credible. If a vendor claims to offer a high-performance system that runs fast, see if it will back up its claim. A vendor that believes its claim will gladly guarantee the results you want, because it has confidence (based upon prior experience with other customers) that you will get the performance you’re paying for.
Try to be as specific as possible with respect to your most important application. For example, challenge the vendor to cut your month-end report times by 33 percent, to serve twice as many Web pages from your server, or to cut lengthy database inquiries from five seconds to two seconds. That way, you can test the RAID disk array immediately upon receipt and know right away if you received the advertised benefit. Such a guarantee also builds a case to justify the investment to management in the first place, and communicates clearly to the vendor what you expect in return for your hard-earned cash. It also filters out vendors who know they can’t really deliver. By being specific and getting a guarantee, you simplify the purchasing process and the management justification process, and you increase your odds of success while minimizing risk and conserving your valuable time.
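Once you have a specific guarantee, checking it is simple arithmetic. The sketch below assumes you have timed the same job before and after installation; the function names and the 90-minute report are hypothetical, while the five-second inquiry and the 33 percent target come from the examples above.

```python
def percent_reduction(before, after):
    """Percentage by which a run time dropped (positive means faster)."""
    return (before - after) / before * 100


def meets_guarantee(before, after, promised_reduction_pct):
    """True if the measured improvement is at least what was promised."""
    return percent_reduction(before, after) >= promised_reduction_pct


# Database inquiry cut from five seconds to two seconds.
print(f"inquiry time reduced by {percent_reduction(5, 2):.0f} percent")

# Hypothetical month-end report: 90 minutes before, 60 minutes after,
# measured against a promised 33 percent reduction.
print("guarantee met:", meets_guarantee(90, 60, 33))
```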
The importance of policies
How can an IT department put a lid on network storage? Organizations such as Houghton Mifflin and Staples have put in place an IT storage policy, which in some cases has become part of a corporate computer usage policy. An IT storage policy can define what is placed on the server, how much space employees are assigned on the server, and what kinds of housekeeping tasks employees will be asked to carry out if they exceed their space allotment.
A new class of storage management tools, called storage resource management (SRM), provides IT departments with the ammunition to justify a policy, to put the policy in place, and to carry it out painlessly. SRM encompasses central detailed monitoring, alerting, reporting, and tending of specific storage resources, such as disk partitions and files, and the data stored on them in a networked system.
In “Storage Management Software Expands Beyond Backup/HSM,” San Jose-based Dataquest Inc. forecasts that SRM products will catapult from $210 million in 1997 to a whopping $1.4 billion by 2002.
Despite preventive tools such as SRM, the need for any type of IT policy can come at the eleventh hour, causing angst among employees. Webster says that IT departments must present this policy using a rationale that employees can tolerate. “They need to understand that managing storage has nothing to do with the cost of the media. Instead, it’s about safeguarding their information and making it readily available to them without interruptions.”
Making a storage policy a bestseller
Lotus Notes servers reaching 95% capacity, together with pressure to contain corporate spending for 1999, propelled Mooney to put a Lotus Notes storage policy in place at Houghton Mifflin. “Ten percent of all corporate spending goes for storage, everything from buying equipment to maintaining it. Since we didn’t know what fueled storage growth, we didn’t have a systematic way to budget for storage.”
Meanwhile, systems administrators feared that several Lotus Notes servers could crash if stuffed beyond their limit. Employees complained of the snail’s pace of e-mail via Lotus Notes and the search capability within the Notes database.
A central group of IT systems administrators, assisted by local system administrators, found two consistent usage problems on Sun and IBM servers. First, employees in one division archived documents to Lotus Notes servers, not to the archive server. Second, editors were saving all versions of chapters, both as e-mail attachments and as documents.
Mooney proposed a policy that provided a practical measure for first freeing space on the clogged servers and then containing storage growth on all Notes servers. But, once wind of the policy got out, he says, “Editors I had never met called and said complying with a storage policy would bring them to a grinding halt.”
With the CEO’s support, Mooney organized a steering committee made up of business managers from each division. “We sent all employees an all-hands memo signed by the CEO,” he says. “Each department met to discuss the memo. The IT department, with assistance from the business managers, sent files to editorial groups throughout the company. They had to either delete the files or archive them on a desktop PC. The IT department also met with employees to look at their storage needs.”
Houghton Mifflin’s network storage policy, which became part of a corporate policy for computer and telephone use, gives employees 100MB of e-mail file space in Lotus Notes. If they exceed the space (as tracked by BMC’s Resolve Storage Resource Management suite), the IT department sends them a message. Senior executives who exceed their space get a telephone call from the IT department. The policy also outlines a variety of other storage procedures, such as where to store images.
A few weeks after putting the policy in place, Mooney says, usage on most of the Lotus Notes servers dropped to 70-80% of capacity. “We also stayed well within the margin for our budget as a result of not buying more servers.”
To date, Mooney says that about 300 employees have exceeded their space allotment. “Most of these are financial people who work with spreadsheets, as well as a lot of executives.” He says that employees have learned to manage their space because they now realize that storage costs money.
Keeping the NT islands from sinking
During the next three years, the number of Windows NT servers will grow by 85%, according to Farid Neema, a storage consultant with Peripheral Concepts Inc., of Santa Barbara, Calif. He adds that organizations with hundreds, if not thousands, of Windows NT servers will make good candidates for a corporate storage policy and big spenders on SRM tools: “Using these tools, systems administrators can set space limits and unobtrusively monitor at what rate that space is being used; who is using it and how much they are using; who has taken non-company liberties with their space; and what strategies to take to get it under control.”
With Aleutian Islands of disparate Windows NT servers, the IT department at Denver-based Qwest Communications Inc. (formerly U.S. West) prefers to cruise with a few servers before the IT storage policy sets sail across the company to all 50,000 employees. Larry Miner, an IT specialist at Qwest, currently tracks storage usage on 17 servers; each server supports about 1,000 employees. Miner’s recommendations will go to a corporate compliance group that will make the final decision on another piece of a corporate storage policy. The company’s policy specifies what types of files (such as music and executable programs) don’t belong on servers, and how long documents can be kept on the server before they must be archived.
Most of Miner’s work consists of determining the best way to set space allocations in a mature Windows NT environment where employees have mountains of data. He says, “We decided to assign space allotments to groups of individuals and monitor that space by their directories. Some groups may have 250MB per individual, while other groups have 500MB per individual.”
W. Quinn’s Quota Advisor allows Miner to set thresholds on specific space allotments per group by a number of attributes, including files, directories, and disk drives. Whenever someone comes close to exceeding their space allotment, they receive an on-screen, pop-up message from Quota Advisor telling them to remove some documents. Quota Advisor can be set up to padlock the write privileges of employees who exceed their quota and don’t free up space.
To make it easy for employees to manage their space and keep calls to the IT department at a minimum, Miner uses W. Quinn’s Disk Advisor in conjunction with Quota Advisor. Disk Advisor allows Miner to run a number of real-time canned reports, such as a listing of duplicate files across different directories, or a listing of files with certain extensions, such as .wav. He says, “If someone wants a larger directory space, we can run a report on their disk drive and point them to files they may want to delete or archive. We’ll soon be able to send them the Disk Advisor report as a Web page. This way they can automatically delete the files.”
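The two kinds of reports Miner describes are easy to picture in code. The following is a minimal Python sketch of the general idea, not Disk Advisor’s actual implementation or interface: it assumes duplicates are files with byte-identical contents (compared by SHA-256 hash), and both function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def files_with_extension(root, extension):
    """List files under root whose names end with the given extension."""
    matches = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(extension.lower()):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


def duplicate_files(root):
    """Group files under root that have identical contents."""
    by_digest = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # Reading each file whole keeps the sketch short; a real tool
            # would hash large files in chunks.
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `files_with_extension(home_dir, ".wav")` or `duplicate_files(home_dir)` on an employee’s directory (here `home_dir` is a placeholder path) yields the kind of list an administrator could pass along as cleanup candidates.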
Miner plans to use another W. Quinn product to put a padlock on the specific types of files that don’t belong on the server.
Meanwhile, large networking outsourcing organizations may soon realize the value-added service in helping their customers carry out a storage policy using SRM tools. For example, EDS Systemhouse (Manitoba, Canada) currently tracks storage usage for a customer with 10,000 employees.
Balancing storage space requires diplomacy
When it comes to storage policies, some organizations adhere to a friendly union of church and state. The corporate storage policy outlines what types of files employees can store on servers and desktops. Likewise, the IT department acts as a watchdog for the corporate policy, and at the same time, carries out its own space usage policy. That’s how Staples, the office superstore, fastens its storage policies. In fact, Staples’ corporate storage policy, which covers Internet use, says that employees must use their server directory for business documents only. As an adjunct to this policy, each IT group in Staples assigns employees in specific departments a flexible amount of personal space on the server. Either way, the IT departments use SRM tools to clip storage growth before it turns into a paper tiger.
Helen Flanagan, a Windows NT systems administrator at the corporate office of Framingham, Mass.-based Staples Inc., uses HighGround’s Web-based Storage Resource Manager across 75 servers to look at historical space reports by attributes such as servers, directories, partitions, and users. Like W. Quinn’s Quota Advisor, Storage Resource Manager allows her to set thresholds on a number of storage attributes, such as disks, partitions, and users, and she receives alerts if those attributes approach or exceed a threshold. She says, “Sometimes people will forget to delete log files or manuals. The alert allows you to talk with them about their storage.”
Flanagan adds that giving all employees the same amount of space seems suited for an environment where a lot of users perform one specific function. But for a growing environment, she says you need to identify and talk with individuals about their storage needs. “Perhaps someone has a different requirement than you initially considered. You have to accommodate any future storage needs they might have. These things are key.”
Locking the doors for space abusers
Even with a corporate storage policy and/or an IT storage policy, systems administrators may need to take out the SRM handcuffs and clamp them on space hogs. John Moeller, a Windows NT administrator at Agfa/Bayer Inc., of Charleston, S.C., has worked diligently to curb network storage abuses. At the company, a server crashed when an employee decided to back up an entire desktop database to the server. “When I came back from lunch, the server was down because someone had taken up 100MB of space.”
The company’s corporate computer policy outlines what types of information employees can store. Each IT group decides how it will oversee network storage. Moeller uses Astrum Software’s StorCast SAM, a space monitoring and alerting product similar to W. Quinn’s Quota Advisor, to make sure employees stay within their 40MB allotment. When they hit 20MB, StorCast automatically sends them a message about their space and the need to remove files. StorCast will continue to send notices. Upon reaching the 40MB limit, the employee won’t be able to store any more files on the disk because of the padlock placed by StorCast–the employee must free up space or call the IT department. Although this might sound like a drastic measure, Moeller says, “You can store a lot of spreadsheets and never get beyond 20MB. If you start downloading games, then you’ll quickly fill up your space.”
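The warn-then-lock scheme Moeller describes boils down to two thresholds. Here is a minimal Python sketch of that logic using the 20MB and 40MB figures from his example; it is an illustration of the policy only, not StorCast’s configuration or API, and the function name and sample users are hypothetical.

```python
WARN_MB = 20   # usage at which warning messages begin
LIMIT_MB = 40  # allotment at which write access is locked


def quota_action(used_mb, warn_mb=WARN_MB, limit_mb=LIMIT_MB):
    """Return 'ok', 'warn', or 'lock' for a user's current usage."""
    if used_mb >= limit_mb:
        return "lock"
    if used_mb >= warn_mb:
        return "warn"
    return "ok"


# Hypothetical usage figures, in megabytes.
usage = {"spreadsheet_user": 12, "heavy_user": 27, "game_downloader": 41}
for user, used in usage.items():
    print(f"{user}: {used}MB -> {quota_action(used)}")
```

Setting the warning level well below the hard limit gives employees time to clean up before they are locked out, which keeps calls to the IT department down.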
Network storage policies should travel beyond the limits of space
As you refine your policy, you’ll want to establish backup procedures for mobile employees or enlist an e-storage service for mobile backups. Decide what to do with files when employees leave the organization or transfer to a different department. Also, gather historical data about storage patterns for capacity planning and budgeting, and look at the feasibility of doing storage chargebacks to departments.
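Historical usage data becomes useful for capacity planning as soon as you project it forward. The sketch below is a deliberately simple estimate that assumes storage grows by the same amount each month; real growth is rarely that steady, and the function name and sample figures are hypothetical.

```python
def months_until_full(usage_history_gb, capacity_gb):
    """Estimate months until capacity is reached, assuming linear growth.

    usage_history_gb holds one usage sample per month, oldest first.
    Returns None when usage is flat or shrinking.
    """
    if len(usage_history_gb) < 2:
        raise ValueError("need at least two monthly samples")
    months_elapsed = len(usage_history_gb) - 1
    growth_per_month = (usage_history_gb[-1] - usage_history_gb[0]) / months_elapsed
    if growth_per_month <= 0:
        return None
    return (capacity_gb - usage_history_gb[-1]) / growth_per_month


# Hypothetical server: four monthly samples on a 100GB volume.
history = [40, 44, 48, 52]
print(f"about {months_until_full(history, 100):.0f} months of headroom left")
```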