Object Storage vs. File Storage
Recently, we’ve published a couple of stories on Object Storage, including an Object Storage Buying Guide and Top Tips for Object Storage. While these articles explained what object storage was and gave users guidance for the selection process, there still remain questions about when object storage should be chosen – and when it might be best to stick with good old file storage.
Scalability is the rallying cry of object-storage advocates. Shahbaz Ali, President and CEO of Tarmin, for example, argues that data-intensive organizations dealing with petabytes of capacity and continuously accelerating annual data growth should opt for object storage.
“The linear storage approach offered by object-based storage meets these massive ongoing scalability requirements,” said Ali.
In addition, as modern organizations become more geographically dispersed, demand grows for data access by a mobile workforce. Many of these users reach their data through applications that typically use a REST API over HTTP (a web-native interface well suited to online applications), which makes object-based technologies a better fit for online and cloud environments with a heavy mobile footprint.
On the other hand, Ali conceded that a file-based storage solution would work well for enterprises that are not geographically dispersed, that support many file types and that utilize applications that generate smaller capacities.
How Object Storage Differs from File Storage
Mark Goros, CEO and Co-founder, Caringo, took the time to explain the differences between file and object in more detail. In file storage, when a file is stored to disk it is split into thousands of pieces, each with its own address. When that file is needed, the user enters the server name, directory and file name, and the file system finds all of its pieces and reassembles them. Little if any metadata about the file is included.
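The file-side mechanics Goros describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real file system is implemented: the `ToyFileSystem` class, the 4-byte block size and the example path are all made up for illustration.

```python
# Toy model only: real file systems use far more elaborate on-disk structures.
BLOCK_SIZE = 4  # bytes per piece; tiny on purpose so the split is visible


class ToyFileSystem:
    def __init__(self):
        self.blocks = {}       # block address -> bytes stored at that address
        self.directory = {}    # "/dir/name" path -> ordered list of block addresses
        self.next_address = 0

    def write(self, path, data):
        """Split the data into fixed-size pieces, each with its own address."""
        addresses = []
        for start in range(0, len(data), BLOCK_SIZE):
            self.blocks[self.next_address] = data[start:start + BLOCK_SIZE]
            addresses.append(self.next_address)
            self.next_address += 1
        self.directory[path] = addresses

    def read(self, path):
        """Look up the path, then fetch and reassemble every piece in order."""
        return b"".join(self.blocks[a] for a in self.directory[path])


fs = ToyFileSystem()
fs.write("/reports/q1.txt", b"quarterly numbers")
print(fs.directory["/reports/q1.txt"])  # the addresses of the file's pieces
print(fs.read("/reports/q1.txt"))       # the reassembled file
```

Note that the only thing the caller gets back is the bytes; nothing about the file beyond its path travels with it.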
In object storage, that same file is stored as a single object complete with metadata, and is assigned an ID and stored as close to contiguously as possible. When content is needed all a user needs to do is present the ID to the system and the content will be fetched along with all the metadata, which can include security, authentication, etc. This can happen directly over the Web, eliminating the need for Web servers and load balancers.
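As a rough illustration of that model, the sketch below stores each item whole, alongside its metadata, and hands back an ID. It is a minimal in-memory stand-in, not any vendor's API: the `ToyObjectStore` class, its `put` and `get` methods and the metadata keys are hypothetical names chosen for this example.

```python
import uuid


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}  # object ID -> (data, metadata)

    def put(self, data, metadata):
        """Store the data as one object with its metadata; return its ID."""
        object_id = uuid.uuid4().hex
        self.objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        """Present the ID, get back the content along with all its metadata."""
        data, metadata = self.objects[object_id]
        return data, dict(metadata)


store = ToyObjectStore()
object_id = store.put(
    b"quarterly numbers",
    {"content-type": "text/plain", "owner": "finance"},
)
data, metadata = store.get(object_id)
print(object_id, data, metadata)
```

The caller never supplies a server name, directory or file name. The ID is the only handle, and where the bytes physically live is the store's business.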
Goros provided a valet parking analogy. With object storage, you drop off your car, and when you want it back all you need to do is present a ticket. You don’t care where or how your car is parked just as long as you get it back quickly in good condition. The focus is on fast access and convenience.
“The key differences are massive scale (10s to 100s of petabytes and billions of objects) and direct access over HTTP,” said Goros.
Massive scale is enabled by the lack of a traditional file system, by automation, and by continuous protection using replication and/or erasure coding. While file systems must be backed up, added Goros, object storage never needs backup because it is said to guarantee the integrity and availability of data.
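To give a feel for what erasure coding means, here is a minimal sketch of its simplest form: a single XOR parity fragment, the same idea used in RAID 5. Production object stores generally use stronger codes that survive several simultaneous losses, so treat the `encode` and `recover` helpers below as illustrative assumptions rather than any product's implementation.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # pad so the fragments line up
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments):
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing] = reduce(xor_bytes, survivors)
    return repaired


original = b"object storage!!"     # 16 bytes, so k=4 needs no padding
stored = encode(original, k=4)     # 4 data fragments + 1 parity fragment
stored[2] = None                   # simulate losing one drive or node
repaired = recover(stored)
print(b"".join(repaired[:4]) == original)
```

Five fragments protect four fragments' worth of data, an overhead of 25 percent, whereas keeping a full second replica would cost 100 percent.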
“File systems need to be managed, rebuilt and maintained while object systems do this automatically,” said Goros. “File systems run out of iNodes, and object systems have no iNodes to run out of. Further, file systems slow down as they get filled up.”
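The inode limit Goros mentions is easy to inspect on a POSIX system. The short sketch below uses Python's standard `os.statvfs` call; the `inode_usage` helper name is ours, and the numbers depend entirely on the machine and file system it runs on (some file systems allocate inodes dynamically and report zero for both values).

```python
import os


def inode_usage(path="/"):
    """Return (total, free) inode counts for the file system holding path."""
    stats = os.statvfs(path)  # POSIX only; not available on Windows
    return stats.f_files, stats.f_ffree


total, free = inode_usage("/")
print(f"inodes: {total} total, {free} free")
```

When the free count reaches zero, no new files can be created on that file system even if plenty of disk space remains.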
OK, so does he cut file-based storage any slack? He admitted that special-purpose file storage handles transaction processing well, as it is built specifically for this use case. Examples include airline reservation systems, banking transaction systems and stock trading systems.
“If an organization has legacy applications that need traditional storage protocols and have low latency requirements, file-based storage may still be the best option,” said Goros. “There will be scalability limits and data protection limits to deal with.”
John Dickinson, Director of Technology, SwiftStack, chose to lay out the differences between object and file storage by discussing the history of files. The computer first moved into the existing world of literal desktops and filing cabinets, and those ways of organizing paper became the prevailing metaphors for organizing data on computers.
“Storing data in files and folders are great ways to think about data when you are only using one computer in an office, and so we built ‘file systems’ that used ‘folders’ to organize our data,” said Dickinson. “However, technology has progressed, and we no longer use just one device, and we now expect that our data is shared and accessible across all our devices.”
That’s why, he said, the old approach of data as ‘documents’ stored in hanging file ‘folders’ breaks down when there is a need to have simultaneous access to data across many devices. Object storage, then, was designed to get rid of existing data organization paradigms like putting one folder in another or allowing only a single person to edit a file at a time. It treats everything as a piece of data, coequal with every other piece of data. This allows object storage systems to store and share data across millions of computers.
“With object storage, applications don’t have to spend time worrying about how to deal with file locking or hardware failures or capacity management,” said Dickinson. “Instead, applications can focus on their own functionality and treat the storage as a utility resource: send data to be saved, ask for data back later.”
Like other experts, he agreed that there are still use cases for file storage. For example, it’s good for small-scale local storage that needs to be exceptionally fast. Object storage, said Dickinson, generally doesn’t work too well in that situation because the object storage cluster is on the other side of a network connection.
Frequent Changes: File Storage Tops Object Storage
Another area of strength for file-based storage is frequently changing data. Lance Broell, Product Marketing Manager, Object Storage, DataDirect Networks, suggested that those seeking to understand the real differences between the two types of storage take a 30,000-foot view. From there, he said, you can see that file-based storage is well suited for data that changes frequently or may require concurrent access.
“Object storage is architected for distributed data with a low rate of change and is designed to scale both the number of files (objects) and storage capacity far beyond what is possible with file-based systems,” said Broell. “It’s ideally suited for massive storage of unstructured data, file sync and share, content distribution, backup and recovery or archival data.”