No, it is not time for my yearly predictions (that happens next month), but this month, leading up to those predictions, I want to look at innovation in data storage in general. Given applications today and future applications, does innovation really matter, or is it in fact required for success?
Here are some of the challenges. At the basic technology level there is the interface question: SATA vs. SAS. But a more fundamental challenge is this: big companies with lots of products and support compared with smaller, newer, more flexible companies with innovative products and ideas.
I think the SATA vs. SAS question is a precursor of things to come. What seems to be left out of the vendor and product equation is the application. As usual, I will do my best not to make product value judgments (good or bad). When I do need to refer to vendors, it will be to talk about the old guard vs. the new vendors trying to take over, and to report on what has happened in the past.
So back to the question: is there innovation?
Innovation: SAS vs. SATA?
We can look at innovation from a number of places in the stack, but since I really like to start at the device and work my way back to the application, I will start with the disk drive interface: SAS vs. SATA.
It is not debatable that SAS is a more robust protocol that allows more error management at the drive, that SAS is a more efficient interface, and that it was historically the more expensive hard drive interface. These are historical facts. But the cost difference has not been the case for the last few years, with the advent of dual-interface nearline SAS/SATA enterprise 3.5 inch drives.
SAS is – and has been for a number of years – the drive interface of choice for the enterprise, replacing Fibre Channel. In the past, enterprise drives generally cost more and were built for moderate capacity but high performance. SATA was built for high capacity but had lower reliability in many areas of the protocol.
This all changed in about 2009, when vendors came out with drives using a single ASIC that supported both SAS and SATA, which marked the end of the Fibre Channel drive interface era. SAS currently supports 12 Gbits/sec per lane, while the SATA interface still only supports 6 Gbits/sec per lane. Of course, for a single disk drive it does not matter, nor even for about six 4 TB drives. But for anything beyond that, 12 Gbits/sec makes a difference.
So for your home, 6 Gbits/sec vs. 12 Gbits/sec makes no difference for disk drives and likely the same for SSDs. But it does make a big difference for the enterprise, as it reduces the number of chipsets and the connectivity on the back end of a large storage controller and reduces the complexity of the design, which in turn reduces cost.
It is not about the speed of the interface, but about how much more SAS bandwidth one is able to utilize compared to SATA. The latest 12 Gbits/sec offerings definitely provide more lanes and more bandwidth for their users and less complexity for storage design.
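To make the lane math concrete, here is a rough back-of-the-envelope sketch in Python. The numbers are my own illustrative assumptions (roughly 600 MB/sec usable per 6 Gbits/sec SATA lane, roughly 1200 MB/sec per 12 Gbits/sec SAS lane, and about 100 MB/sec of average delivered throughput per busy nearline drive under a mixed workload), not vendor figures.

```python
# Back-of-the-envelope: how many busy drives saturate a single lane?
# All numbers below are illustrative assumptions, not vendor specifications.

LANE_MB_PER_SEC = {
    "SATA 6 Gbit/s": 600,    # ~6 Gbit/s line rate, 8b/10b encoding -> ~600 MB/s usable
    "SAS 12 Gbit/s": 1200,   # ~12 Gbit/s line rate -> ~1200 MB/s usable
}

DRIVE_MB_PER_SEC = 100  # assumed average delivered throughput of a nearline 4 TB drive

for name, lane_rate in LANE_MB_PER_SEC.items():
    drives = lane_rate / DRIVE_MB_PER_SEC
    print(f"{name}: roughly {drives:.0f} busy drives saturate one lane")
```

With those assumptions a SATA lane tops out at around a half dozen busy drives, while a SAS lane carries roughly twice that before the controller designer has to add more chipsets and connections, which is exactly the cost and complexity argument above.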
The current plan for SATA is to move to 8 Gbits/sec sometime next year. It will be interesting to see how many disk drive vendors move in this direction, given that the performance of SAS will be 50 percent greater. And with 24 Gbits/sec SAS on the horizon (disk connectors supposedly available in 1Q14), I do not see why anyone would use SATA in the near future.
One of the things that still puzzles me is that Intel CPUs, which at one time were supposed to have SAS built into the CPU, still have SATA. This is going to be a problem in the future, as SAS will dominate the enterprise and SATA is going to be relegated to the low end of the market. Besides the performance and error control differences, SATA cannot support things like T10 Protection Information (PI)/Data Integrity Field (DIF), and it never will.
Conclusion: SATA is a dying technology for the enterprise and will not be used in the future. Anyone using it or claiming SATA is for the enterprise should be told, clearly, that it is not.
Innovation: Old Guard vs. New Guard?
The traditional enterprise storage companies, EMC, HDS, IBM, NetApp and the rest, are being challenged by new players like Caringo, Cleversafe, Crossroads, Kaminario, Nimbus Data, PureStorage, SanDisk, SpectraLogic, Violin Memory and others – vendors with both hardware-only and embedded hardware/software solutions.
Most of the new vendors are developing products in one of two areas, storage tiering or storage interfaces, or in both.
Storage tiering has been around for a long time. I remember back in 2002 there was a company named Cereva, whose assets were purchased by EMC. It had a product that placed heavily used data on the outer cylinders of the disk and less used data on the inner cylinders. On paper this made sense, while in practice (because of disk errors and remapping of blocks) it did not make much sense.
Fast forward a decade and, with flash drives and flash PCIe cards, vendors are building caching products that move blocks to flash devices based on usage. That sounds like a great idea for applications that are read intensive, but how do these products fit into a world where Seagate has moved the same technology into the disk drive itself? Who is going to own that business, Seagate or the new caching controller vendors? Does the world need an all-flash storage device for everything, or does the product just solve an important problem for some environments rather than a broad, market-wide one? Lots of questions. But the answer depends, I think, on the application.
Moving up the stack
Storage interfaces are changing also. The REST interface is getting lots of market traction and the fact that Seagate has announced a REST-based product codifies this change in the market.
I do not think that REST solves all of the world’s data storage problems. And I have ranted enough over the years about the lack of innovation and improvement in the POSIX standard. But POSIX interfaces for applications still solve problems that REST cannot.
For example, say you have a large dataset that you need to process. With REST you cannot start operating on it until the whole file has been transferred, while with POSIX you can start reading the file within your application right away.
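To make the contrast concrete, here is a minimal Python sketch of the two access patterns. The file path and URL are hypothetical, and the REST side shows the plain whole-object GET that the comparison refers to.

```python
import os
import urllib.request

# POSIX-style access: open the file and read just the byte range the
# application needs; processing can start before the rest is ever touched.
def read_slice_posix(path, offset, length):
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)  # read `length` bytes at `offset`
    finally:
        os.close(fd)

# Plain REST-style access: a simple GET transfers the whole object before
# the application gets to work on any of it.
def read_object_rest(url):
    with urllib.request.urlopen(url) as response:
        return response.read()

# Hypothetical dataset locations, for illustration only.
first_records = read_slice_posix("/data/big_dataset.bin", offset=0, length=65536)
whole_object = read_object_rest("http://objectstore.example.com/bucket/big_dataset.bin")
```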
REST and POSIX interfaces for storage solve two different problems. You do not want to fix your toilet with a hammer nor pound nails with a pipe wrench, and up until REST came on the scene there was only one tool in the toolbox. Now we have two, but do we need more?
I think the answer is likely yes, and a good example might be data analysis applications, which are going to need ways of ingesting data in real-time to allow decisions to be made in near real-time. Neither REST nor POSIX do this well, but is anyone working on this?
EMC helped start the world down the object path with the Centera product, but the open source world caught up to EMC. And it is very difficult for a closed software system, accessible only through its APIs, to compete with open source software.
Though it looks like the FAA is going to delay UAVs flying in the USA until after 2015, that does not mean the amount of video collected per day is not already in the petabytes. And this will likely drive either a new interface or a modification to an existing interface to better support the capture of that type of data.
I seriously doubt that POSIX will be modified, so that leaves changing the REST interface to better support video. Which brings me to my point: the newer vendors seem ready and able to use open source technology, while the old guard is slower to catch on to using these technologies.
Maybe it’s because the new vendors do not have the development staff and cannot afford a Not Invented Here (NIH) attitude, or maybe it is because they are closer to the market and can see the innovation coming. Either way, the new guard has always been more innovative than the old guard. Very few of the new guard grow up and become the old guard. NetApp might be one of the few examples over the last decade or so.
Data Storage and Change
The old guard is not without tricks up its collective sleeve. In particular, buyouts. If you go back to the 1990s and the dotcom boom, lots of the new guard companies got bought out and a number of them did not. Those that did not are most often not with us today. Many went bankrupt.
Historically, the old guard does not innovate, with few exceptions. And when the old guard does innovate, it often falls short in market perception, and NIH concerns about being locked into a single-vendor solution surface, no matter how good that solution is.
The storage landscape is being pushed to change from two directions. For the last few years we have heard about how flash is changing everything and will replace spinning disk. That clearly will not be happening, yet there is a need for a high-IOPS tier that is faster than 15K RPM 2.5 inch drives. What happens to software and hardware from the new guard now that Seagate has developed and is marketing an enterprise drive with flash right in the drive?
This Seagate technology enables the old guard to solve problems that they could not solve before, without making changes. The new guard sometimes lacks many of the features that make a product truly enterprise-class: things like detailed SMART management, proactive error management, T10 PI/DIF, replication, snapshots and other similar features.
Intel, with its move to put communications on the CPU, and Seagate, with flash on the drive and REST-interface drives, both seem to be changing what the storage future might be, and both seem to be trying to pressure the traditional players into adopting their new worldview.
In the 1990s, the old guard simply purchased what it wanted and needed to be successful with Fibre Channel, new file systems and interfaces. The old guard is doing the same today. Yet the differences are significant and striking, and they may or may not produce the same outcomes this time, as the market dynamics are changing. Note to self: potential big changes ahead.