Creating an Architecture for Streaming Data
Streaming data technology has become ubiquitous in recent years. Just about everyone from CNN.com to your local cable company uses it. On-demand video and music, news broadcasts and security camera feeds are just a few examples of streaming data.
Since you're likely to run into the technology at some point, it's probably worth a closer look. Let's say you want to set up a streaming server; what are the issues you'll face, and how do you go about determining what to buy and how to configure it?
Once again, we'll begin with requirements. The requirements for streaming data (video or audio) can be simple or complex, depending on a number of factors:
- The number of simultaneous users;
- The data rate each user requires and how it interacts with client-side caching; for example, MS Windows Media Player and RealPlayer both cache the stream and do not begin playing until enough data is cached to begin playback;
- Whether the data stream is real-time, cached or both: real-time data might require less of some system resources and more of others than playing back already captured data, and vice versa;
- Whether playback will be sequential: for non-real-time data, this will definitely affect storage and file system performance.
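The first two factors translate directly into numbers. As a minimal sketch, assuming a hypothetical site with 500 simultaneous users, each pulling a 300 kbit/s stream, and a client that caches 5 seconds of data before playback starts (all illustrative values, with decimal units), the aggregate server bandwidth and the per-client cache can be estimated like this:

```python
from dataclasses import dataclass


@dataclass
class StreamRequirements:
    """Hypothetical sizing inputs; the values used below are illustrative only."""
    simultaneous_users: int   # peak number of concurrent viewers
    bitrate_kbps: float       # per-stream rate in kilobits per second
    prebuffer_seconds: float  # seconds of media the client caches before playing


def aggregate_bandwidth_mbps(req: StreamRequirements) -> float:
    """Total outbound bandwidth (megabits/s) the server must sustain."""
    return req.simultaneous_users * req.bitrate_kbps / 1000.0


def prebuffer_kilobytes(req: StreamRequirements) -> float:
    """Data (kilobytes, 1 KB = 1000 bytes) each client needs before playback."""
    return req.bitrate_kbps * req.prebuffer_seconds / 8.0


req = StreamRequirements(simultaneous_users=500, bitrate_kbps=300, prebuffer_seconds=5)
print(aggregate_bandwidth_mbps(req))  # 150.0 Mbit/s leaving the server
print(prebuffer_kilobytes(req))       # 187.5 KB cached per client
```

The point of the exercise is that the user count and per-stream rate, not the file sizes, set the bandwidth the server has to deliver.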
Most common streaming applications fall into two distinct categories:
- High-performance playback and editing of high-definition streams. This work is very uncommon; the movie industry, especially animation, is a typical example; and
- Low-resolution playback or capture, the most common type of streaming application; security cameras are one of the newer areas that require data capture.
High-Performance Streaming Architecture
Developing an architecture for high-performance streaming is much harder because the requirements are much greater. The data rates needed for streaming I/O can exceed 30 MB/sec per stream, and multiple streams are often active simultaneously. Shared file systems are often used, and they require even more complex architectural analysis because multiple systems access the same storage. A number of shared file systems were actually developed in the late 1990s specifically for editing streaming video.
Even with 4Gb Fibre Channel and high-performance storage, the problem is still complex and requires careful attention to file system allocation and tuning, RAID tuning and a myriad of other data path tuning issues.
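A rough link count shows why only a handful of these streams fill a 4Gb Fibre Channel path. This sketch takes the 30 MB/sec figure above as a per-stream rate and assumes about 400 MB/s of usable payload per 4Gb FC link (the commonly quoted nominal figure; sustained throughput in practice is lower) and a hypothetical 70% utilization ceiling. It also assumes each stream travels over a single link.

```python
import math

# Assumption: ~400 MB/s usable payload per 4Gb Fibre Channel link (nominal figure).
FC4_LINK_MB_PER_SEC = 400.0


def streams_per_link(mb_per_stream: float, utilization: float = 0.7) -> int:
    """Whole streams one link can carry without exceeding the utilization ceiling."""
    return math.floor(FC4_LINK_MB_PER_SEC * utilization / mb_per_stream)


def links_needed(streams: int, mb_per_stream: float, utilization: float = 0.7) -> int:
    """Links required when each stream must fit entirely on one link."""
    per_link = streams_per_link(mb_per_stream, utilization)
    if per_link == 0:
        raise ValueError("a single stream exceeds the usable bandwidth of one link")
    return math.ceil(streams / per_link)


print(streams_per_link(30.0))   # 9 streams per link at 70% utilization
print(links_needed(10, 30.0))   # 2 links for ten 30 MB/sec streams
```

Even before RAID and file system effects are considered, ten editing streams already need more than one link's worth of bandwidth.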
Luckily for most sites, requirements such as these are uncommon, and the people who work on this type of problem have years of experience in this area. These environments are often made even more complex by the need to keep massive amounts of data under a hierarchical storage management (HSM) system.
If this is your type of environment, you have a great deal of hard work ahead of you to ensure that it can meet the performance requirements. Nothing on the storage horizon is going to make this any easier in the near future, because file systems and storage devices cannot exchange topological information about where the data and file system metadata are located. This can severely limit performance, especially when many large files (most systems of this type write only large files) are being written at the same time, given the file system fragmentation this inevitably causes.
The only good news for this type of architecture is that it is not that common except in places like Hollywood and other high-resolution visual environments. Personally, I think working on these types of systems is a lot of fun given the complexity of the problems and the large amount of hardware and specialized software needed to meet the operational requirements.
Developing an architecture for lower-performance/lower-resolution environments is much easier for a number of reasons: the data rates are much lower; most of the applications cache the data, so real-time streaming is not that important; and network latency is high while network performance is relatively low, so the network rather than the storage sets the pace.
Lower rates: For these environments, data rates tend to be measured in kilobytes per second rather than megabytes per second. A data rate three orders of magnitude lower simplifies the architecture enormously.
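Taking the 30 MB/sec high-performance figure and an illustrative low-resolution stream of 30 KB/sec, the ratio works out as follows (decimal units assumed):

```latex
\frac{30\ \text{MB/s}}{30\ \text{KB/s}}
  = \frac{30 \times 10^{6}\ \text{B/s}}{30 \times 10^{3}\ \text{B/s}}
  = 10^{3}
```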
Applications: The applications in these environments are often products like Windows Media Player and RealPlayer. These applications measure the incoming data rate and cache the data before beginning playback. If network performance drops, these products stop, wait until enough data has been cached again, and then resume playback.
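The stop-and-wait behavior can be modeled with a simple playback buffer. This is a simplified sketch, not how Windows Media Player or RealPlayer work internally: it assumes one-second ticks and a fixed restart threshold set by a hypothetical `prebuffer_seconds` parameter, and it tracks buffer fill rather than measuring the incoming rate directly.

```python
from typing import List


def simulate_playback(arrival_kbps: List[float], bitrate_kbps: float,
                      prebuffer_seconds: float) -> List[str]:
    """Return the player state ('buffering' or 'playing') for each one-second tick."""
    threshold = bitrate_kbps * prebuffer_seconds  # kilobits needed to (re)start playback
    buffered = 0.0
    playing = False
    states = []
    for arrived in arrival_kbps:
        buffered += arrived                 # data received from the network this second
        if not playing and buffered >= threshold:
            playing = True                  # enough cached: start or resume playback
        if playing:
            if buffered >= bitrate_kbps:
                buffered -= bitrate_kbps    # play one second of media
            else:
                playing = False             # buffer ran dry: stall and rebuffer
        states.append("playing" if playing else "buffering")
    return states


network = [400, 400, 100, 100, 100, 100, 400, 400]  # kbit/s arriving each second
print(simulate_playback(network, bitrate_kbps=300, prebuffer_seconds=2))
```

With a 300 kbit/s stream, a 2-second prebuffer and a network that sags to 100 kbit/s for four seconds, the player starts after one second of buffering, stalls once the cache drains, and resumes after refilling, which is the behavior users see as a "buffering" pause.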
Network latency and performance: In most of these environments, it is all about network latency and bandwidth. If you are streaming video from sites such as Yahoo, CNN and the like, the local bandwidth and latency at these sites are far better than what you have at home with high-speed cable. The fastest common home network connection I have seen is 5 Mbits/sec. This far exceeds the rate needed to play most video streams, but the latency between you and the video stream can be high, especially if what you want to watch is a hot news or sports item. The latency is caused by contention for the video stream and by limited bandwidth between the site where the data is stored and the outside world.
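Network speeds are quoted in bits and storage rates in bytes, which makes it easy to slip by a factor of eight. A small sketch comparing the 5 Mbits/sec connection above with a hypothetical 300 kbit/s low-resolution stream and with the 30 MB/sec high-performance figure (protocol overhead ignored):

```python
def mbit_to_mbyte(mbit_per_sec: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_sec / 8.0


def streams_that_fit(link_mbit_per_sec: float, stream_kbit_per_sec: float) -> int:
    """Whole streams of a given bitrate that fit on a link, ignoring overhead."""
    return int(link_mbit_per_sec * 1000 // stream_kbit_per_sec)


home_link = 5.0          # Mbit/s, the home connection cited in the text
low_res_stream = 300.0   # kbit/s, hypothetical low-resolution stream
hp_stream_mb = 30.0      # MB/s, the high-performance figure from the text

print(streams_that_fit(home_link, low_res_stream))  # 16 low-res streams fit
print(mbit_to_mbyte(home_link))                     # 0.625 MB/s
print(hp_stream_mb / mbit_to_mbyte(home_link))      # 48.0: one HD stream needs 48 such links
```

The home link has ample headroom for low-resolution video, so what the viewer notices is latency and contention rather than raw bandwidth.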
Choosing the Right Architecture
Obviously, since high- and low-performance streaming environments are so different, each needs a different architecture. Using a high-performance (HPC) architecture for a low-performance environment would be overkill, unless the low-performance environment is so large that an HPC architecture makes sense.
For the high-performance architecture, I would look at a 4Gb Fibre Channel environment, high-speed RAID controllers, PCI Express-based HBAs, a server with a great deal of memory bandwidth, and a high-performance file system with HSM capabilities.
A number of vendors develop solutions specifically for these types of environments. In the 1990s, both Apple and SGI dominated this market segment, but their dominance has waned with the commoditization of everything. Some companies will still pay big bucks to ensure that they can meet their requirements, because not getting a movie out on time can cost a lot of money.
Today, companies such as Quantum/ADIC, Sun and a myriad of others provide solutions in this market space. It's not a big market, but it is prestigious.
For low-performance needs, the first thing I would consider is NAS. Since the problem is all about low-resolution video data, NAS is often the best solution because it is easy to use, configure, maintain and manage. A number of vendors, such as Isilon and NetApp, have systems optimized for exactly this type of application.
Managing the delivery of streaming content is not that difficult, given the latency, applications and file sizes, because the playback applications already compensate for network latency. Over the Internet, streaming I/O is not that big a deal; its performance is usually a function not of the storage system but of the connection to the Internet. Most NAS devices can handle the requirement without much architectural work, and this is true even for delivering content within an intranet. The streams simply do not demand much bandwidth, so NAS technology can handle them.
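Whether the storage or the Internet connection is the limit can be checked with a back-of-the-envelope bottleneck calculation. In this sketch the NAS throughput (100 MB/s) and uplink speed (100 Mbit/s) are hypothetical inputs, not figures for any particular product, and protocol overhead and caching are ignored:

```python
from typing import Tuple


def max_concurrent_streams(stream_kbps: int, nas_mb_per_sec: int,
                           uplink_mbps: int) -> Tuple[int, str]:
    """Return (stream count, limiting component) for a simple NAS-plus-uplink setup."""
    nas_kbps = nas_mb_per_sec * 8 * 1000   # MB/s -> kbit/s (decimal units)
    uplink_kbps = uplink_mbps * 1000       # Mbit/s -> kbit/s
    limits = {
        "storage": nas_kbps // stream_kbps,
        "internet uplink": uplink_kbps // stream_kbps,
    }
    bottleneck = min(limits, key=limits.get)  # component that runs out first
    return limits[bottleneck], bottleneck


print(max_concurrent_streams(stream_kbps=300, nas_mb_per_sec=100, uplink_mbps=100))
# (333, 'internet uplink'): the NAS alone could feed 2666 streams
```

Under these assumptions the uplink saturates long before the NAS does, which is why modest NAS hardware is usually enough for delivery.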
On the other hand, editing and creating content is not that easy and requires careful attention to architectural planning and usually high-performance hardware and file systems. All of this might change over time as hardware gets faster and software gets more efficient, but for the time being, that's the way it is.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 26 years of experience in high-performance computing and storage.