Bringing Clarity to Cloud Storage
Virtualization and cloud computing do a fine job of increasing efficiency by abstracting the underlying physical resources. That abstraction, however, can make capacity planning a nightmare and make it hard to determine where a slowdown is occurring.
Just as an airline pilot needs an instrument panel to safely land in the fog, storage administrators need the right tools to monitor their virtual and cloud storage. Not all storage management products provide the needed insight.
“One of the key problems in virtual environments is the ‘many-to-many’ nature of the virtual abstractions on both server and storage side,” says David Wagner, Business Development Principal for TeamQuest Corporation.
“This leads to a huge problem in identifying what specific storage element is actually associated with any given VM (and when), complicating planning and performance troubleshooting. When one factors in that VMs are constantly changing or moving (e.g. DRS, VMotion, etc.) and that the actual physical LUNs that are members of a pool or volume are also changing or changeable, the manual approach becomes virtually unworkable.”
While there are numerous management tools that look at the servers or look at the storage, in virtual and cloud environments you need tools that can look at both and see how they interact.
To address this issue, TeamQuest has developed its Storage Capacity Management (SCM) solution, which provides a single-panel view of the performance and capacity of both SAN storage and the related parts of the infrastructure, including the application, the virtual server, the physical server and the storage system. This approach makes it easier to analyze, troubleshoot, manage and optimize virtualized cloud environments.
SCM currently supports IBM and EMC storage systems, with Hitachi support under development. “Because we have detailed data collection on both sides of the Server - Storage 'divide' now,” says Wagner, “we have the ability to automate analysis, as well as allow ad-hoc drill-down type analysis of performance and capacity across both, in the context of either Workload/VM/Service performance on the server side or Storage (pool, volume, LUN, etc.) performance.”
The TeamQuest SCM solution is based on two components – TeamQuest Surveyor automated capacity analysis software and TeamQuest CMIS (Capacity Management Information System). It uses the recently released TeamQuest Performance Indicator (TPI) – a 1-100 scale that presents a simple view of the health of computer systems based on their queuing behavior.
For every server, workload and VM, and for every measurement interval, SCM calculates service time versus queuing time, automatically determining where and when any queuing (wait time) for resources (CPU, IO) is occurring. Whenever it finds significant wait times, it automatically conducts further drill-down analysis.
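The service-versus-queuing comparison can be illustrated with a minimal Python sketch. This is not TeamQuest's implementation: the `Interval` record, its field names, and the 50 percent threshold are all assumptions made for illustration, and queuing time is taken simply as response time minus service time.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One measurement interval for one entity (hypothetical record layout)."""
    entity: str        # server, workload, or VM name
    resource: str      # "CPU" or "IO"
    start: str         # label for the interval, e.g. "09:00"
    service_s: float   # time spent actually being served
    response_s: float  # total time, including any waiting


def queuing_time(iv):
    """Wait time is assumed to be response time minus service time."""
    return max(iv.response_s - iv.service_s, 0.0)


def significant_waits(intervals, threshold=0.5):
    """Return intervals where waiting exceeds `threshold` of response time.

    The threshold is an illustrative assumption, not a documented SCM value.
    """
    flagged = []
    for iv in intervals:
        if iv.response_s <= 0:
            continue  # nothing measured in this interval
        if queuing_time(iv) / iv.response_s > threshold:
            flagged.append(iv)
    return flagged
```

Intervals returned by `significant_waits` are the candidates for the further drill-down analysis.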
That drill-down analysis includes looking at queuing on IO and then determining whether it is server-side or not. For example, if there is IO queuing during a specific time period and there is no corresponding peak in the IOs being done by the server or VM, then the waiting or queuing must be occurring off the box. In that case, SCM automatically drills down into the detailed SAN storage performance data and checks whether the queuing is at the front end or the back end. It does this in the context of the workload, server or VM that is queuing on IO, and at the specific time that the queuing is occurring.
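The decision logic in this drill-down can be sketched as a small Python function. The function name, the thresholds, and the `front_end_wait_s` and `back_end_wait_s` fields are hypothetical. The article does not say how SCM separates front-end from back-end queuing, so comparing two wait figures is purely an assumption.

```python
def locate_io_wait(io_wait_s, server_iops, baseline_iops, san_sample=None,
                   wait_threshold_s=0.01, peak_factor=1.5):
    """Classify where IO queuing is likely occurring for one interval.

    All thresholds and the san_sample keys are illustrative assumptions.
    """
    if io_wait_s < wait_threshold_s:
        return "no significant IO queuing"

    # A peak in the server's own IO rate suggests the wait is server-side.
    if server_iops >= peak_factor * baseline_iops:
        return "server-side"

    # Queuing with no matching IO peak: the wait must be off the box.
    if san_sample is None:
        return "off-box (no SAN data)"

    # Assumed rule: blame whichever side of the array shows more waiting.
    front = san_sample["front_end_wait_s"]
    back = san_sample["back_end_wait_s"]
    return "off-box: front end" if front >= back else "off-box: back end"
```

The `san_sample` argument would be the SAN data for the same interval in which the server or VM was queuing, which keeps the analysis in the context the article describes.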
The process can also go in reverse – from the storage back up the line to determine which servers, VMs, workloads or services are being affected. “Other solutions that only do storage, or only do server, don't have TPI to auto-detect any or all queuing, but also don't have any contextual ability to do analysis back and forth,” says Wagner.
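Working in both directions amounts to keeping an index from VMs to storage and another from storage back to VMs. The sketch below assumes the VM-to-LUN links are already available as simple pairs. The function names and identifiers are hypothetical and do not reflect how SCM stores this data.

```python
from collections import defaultdict


def build_indexes(links):
    """Build both directions of a many-to-many VM/LUN relationship.

    `links` is an iterable of (vm, lun) pairs (assumed input format).
    """
    vm_to_luns = defaultdict(set)
    lun_to_vms = defaultdict(set)
    for vm, lun in links:
        vm_to_luns[vm].add(lun)
        lun_to_vms[lun].add(vm)
    return vm_to_luns, lun_to_vms


def affected_vms(lun_to_vms, slow_luns):
    """Walk from slow storage elements back up to the VMs that use them."""
    affected = set()
    for lun in slow_luns:
        affected |= lun_to_vms.get(lun, set())
    return sorted(affected)
```

Given a list of LUNs showing poor performance, `affected_vms` returns the VMs to examine. The same lookup could be keyed by pool or volume instead.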
“Without SCM, they have to assemble ad-hoc ‘Tiger Teams’ to bring together their storage folks and their application and server folks, and both must manually -- using data extracted from the separate silo-tools -- try and correlate the data to determine potential causal linkages.”
Integration with Other Tools
There is a wide range of storage-only management tools available, but few, if any, model the server side of the stack. Users who already have one of these tools for their SAN can use a TeamQuest User Agent to mine performance data from it and bring that data into context with server performance and capacity data. Depending on the tool in use, the data might not be as rich as that provided by a full SCM solution.
“Because Surveyor's groups dynamically track changing membership,” says Wagner, “users can not only understand the linkage between VMs and specific storage, but continually track that over time and configuration change.”
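Tracking linkage over time and configuration change means recording when each VM-to-storage link was in effect, not just whether it exists now. The following is a minimal sketch under that assumption. The `Membership` record and the integer interval indexes are hypothetical and are not drawn from Surveyor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Membership:
    """A VM-to-LUN link with the period during which it held (hypothetical)."""
    vm: str
    lun: str
    start: int           # interval index when the link began
    end: Optional[int]   # interval index when it ended; None if still active


def luns_for_vm_at(history, vm, t):
    """Return the LUNs linked to `vm` during interval `t`."""
    return sorted({
        m.lun for m in history
        if m.vm == vm and m.start <= t and (m.end is None or t < m.end)
    })
```

With a history like this, a performance problem seen at a given time can be matched against the storage a VM was using then, even if the VM has since moved.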