Troubleshooting SAN Problems
In a claim that sounds like something out of an H.G. Wells novel, Virtual Instruments says it has addressed SAN invisibility.
"A SAN is basically a cloud that you can't see inside," said Mark Urdahl, president and CEO of Virtual Instruments, of Scotts Valley, Calif. "When you add in virtualization, you have a further layer of abstraction and complexity."
This leads to much wasted time, as storage administrators hunt around for the source of a problem. It is also why so many organizations overbuild their hardware infrastructure: they are compensating for an inability to predict where performance problems will arise.
"Our monitoring software helps you to see what is going on inside the SAN," said Urdahl.
As a company, Virtual Instruments was carved out of Finisar Corp. Its NetWisdom software already existed, but Finisar never really marketed it, said Urdahl.
Traditional storage resource management (SRM) goes only so far, he said. It optimizes the efficiency and speed with which drive space is utilized in a SAN. It adds automation to functions like data collection and storage, provisioning, forecasting of future needs, and maintenance of activity logs.
But while SRM products provide some management capabilities, they don't offer the level of diagnostic sophistication needed to prevent outages and slowdowns. NetWisdom gives such a view, said Urdahl, by monitoring I/O performance, bandwidth utilization and average I/O completion times. It also verifies that changes in hardware and configuration do not adversely affect key applications.
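To make the three metrics named above concrete, here is a minimal sketch of how average I/O completion time, IOPS and bandwidth utilization could be derived from a window of per-I/O measurements. It is not NetWisdom's method or data format; the `IoSample` record, the `summarize` function and the link-capacity parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IoSample:
    """One completed I/O (hypothetical record, not a NetWisdom format)."""
    bytes_transferred: int
    completion_ms: float


def summarize(samples, window_seconds, link_bytes_per_sec):
    """Return IOPS, average completion time and link utilization for one window."""
    if not samples:
        raise ValueError("need at least one sample")
    if window_seconds <= 0 or link_bytes_per_sec <= 0:
        raise ValueError("window and link capacity must be positive")

    total_bytes = sum(s.bytes_transferred for s in samples)
    avg_completion_ms = sum(s.completion_ms for s in samples) / len(samples)

    return {
        "iops": len(samples) / window_seconds,
        "avg_completion_ms": avg_completion_ms,
        # Fraction of the link's capacity consumed during the window.
        "utilization": total_bytes / (window_seconds * link_bytes_per_sec),
    }


if __name__ == "__main__":
    window = [IoSample(4096, 2.0), IoSample(8192, 4.0)]
    print(summarize(window, window_seconds=1.0, link_bytes_per_sec=122_880))
```

Tracking completion time alongside throughput matters because a link can look lightly used in MB/sec terms while individual I/Os are still completing slowly.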
Urdahl reports that 95 percent of the company's business is SAN monitoring. Most customers have storage volumes in the range of 100 TB to 5 PB, although some go as low as 40 TB. Typically, these environments have massive data growth coupled with a reduced headcount. Tools that take the time out of management and troubleshooting, therefore, are in high demand.
Banking on NetWisdom
A global financial institution is a user of NetWisdom in its North American operations. It has four data centers (three on the Eastern Seaboard and one in the West) and requires high volumes of I/O processing.
"Our two data warehouses can tax our storage subsystem at a rate of 40,000 to 50,000 IOPS for hours on end," said Ryan Perkowski, SAN manager at the financial giant. "Currently, we have 420 TB of storage, an amount which doubles every 11 months."
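The growth rate Perkowski quotes can be turned into a rough projection. The sketch below assumes steady exponential growth from the stated 420 TB with an 11-month doubling period; the function names are illustrative, and real growth is unlikely to be this smooth.

```python
import math


def projected_capacity_tb(months, start_tb=420.0, doubling_months=11.0):
    """Capacity after `months`, assuming it doubles every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


def months_until(target_tb, start_tb=420.0, doubling_months=11.0):
    """Months until capacity reaches `target_tb` under the same assumption."""
    if target_tb <= 0 or start_tb <= 0:
        raise ValueError("capacities must be positive")
    return doubling_months * math.log2(target_tb / start_tb)


if __name__ == "__main__":
    for m in (11, 22, 33):
        print(f"after {m} months: {projected_capacity_tb(m):.0f} TB")
    print(f"months to reach 1 PB (1000 TB): {months_until(1000):.1f}")
```

Under that assumption the footprint passes 840 TB after 11 months and 1,680 TB after 22, which helps explain why hardware purchases alone struggle to keep pace.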
The organization uses EMC (NYSE: EMC) storage arrays (Symmetrix DMX, Clariion and Centera) along with a combination of Cisco (NASDAQ: CSCO) and Brocade (NASDAQ: BRCD) switches and a large mainframe for transactions. Its server population is 60 percent AIX, 20 percent non-virtualized Windows, 15 percent virtualized Windows and Linux, and 5 percent of various other complexions. A virtual machine rollout is ongoing.
"We were suffering from over-subscription on our Cisco switches due to the heavy demand for throughput," said Perkowski. "Cisco offered no tool to look at the throughput of the SAN."
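Over-subscription is commonly expressed as the ratio of the bandwidth that attached devices could demand to the bandwidth available on the uplinks they share. The sketch below shows that arithmetic only; the port counts and speeds are made-up illustrative values, not figures from the institution's fabric.

```python
def oversubscription_ratio(device_port_gbps, uplink_port_gbps):
    """Ratio of total device-facing bandwidth to total uplink bandwidth."""
    uplink_total = sum(uplink_port_gbps)
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return sum(device_port_gbps) / uplink_total


if __name__ == "__main__":
    # Hypothetical example: twelve 4 Gbps host ports sharing two 4 Gbps uplinks.
    hosts = [4.0] * 12
    uplinks = [4.0] * 2
    print(f"{oversubscription_ratio(hosts, uplinks):.1f}:1")
```

A ratio like this only describes the design ceiling. Whether it actually hurts depends on how much traffic the hosts drive at the same time, which is why measured throughput is needed to diagnose it.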
Server latency was another issue, as well as demands for better performance on Oracle. Perkowski explained that a slow query on Oracle has a ripple effect across the IT infrastructure: Users tend to get fed up waiting and re-query, which only doubles the length of the queue and adds to further Oracle delays.
The company attempted to gain greater insight into its SAN-related slowdowns using Symmetrix Remote Data Facility (SRDF). But that didn't deliver the required information. Solution: throw lots of hardware at the problem. Perkowski said a steady stream of additional host bus adapters (HBAs), switches and servers failed to offer complete relief from throughput constraints.
"As we had no tool to look at throughput through the SAN, we didn't know the underlying cause," said Perkowski. "As my applications were expanding, they were starting to outgrow the hardware. We just didn't know where to split things off."
In a previous life, Perkowski worked for Finisar, so he knew about NetWisdom. His argument for the product is that network administrators have sniffers that report round-trip latency, whereas storage administrators have only the MB/sec rate, which he felt was inadequate for accurate diagnosis and troubleshooting.
"NetWisdom opens up fabric blindness," said Perkowski. "It helps us to maximize virtual hardware loads, ensure peak application performance, verifies vendor marketing claims and gives us unbiased answers to cut through vendor finger pointing."
He uses the Virtual Instruments product in several ways beyond performance tuning. IT, for example, was reluctant to load more than one mission-critical application on its AIX servers. When the company virtualized its AIX infrastructure, NetWisdom data gave Perkowski enough confidence to include two heavy-hitter databases on the same AIX box.
In addition, if a database has slowed down in the previous week, IT can now look back at historical data and find the cause. A database optimization project, which took 140 hours before, can now be done in eight hours. Further, the company had the green light to spend another $1.5 million on a storage hardware upgrade. NetWisdom found the underlying cause of the slowdown, which meant that the disk array purchase could be postponed.
"NetWisdom is the single most useful tool in my SAN tool bag and is the most used tool by my team," said Perkowski. "It has saved us countless hours of troubleshooting and given our whole department a new direction in improving performance."
Article courtesy of Enterprise IT Planet