The Data Virtualization Market in 2022

Enterprise Storage Forum content and product recommendations are editorially independent. We may make money when you click on links to our partners.

Data virtualization is an approach to data management that focuses on creating virtual structures for ease of data retrieval and manipulation. It enables users to extract virtualized data through dashboards, portals, reports, and apps.

Users can access and manipulate data through a single, unified view without needing to know its technical details or physical location.
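To make the idea of a unified view more concrete, here is a minimal sketch of a virtual layer, assuming a simple design in which logical table names map to functions that read each source on demand. The class `VirtualCatalog`, its methods, and the sample sources are hypothetical illustrations, not any vendor's API. The caller only uses logical names, and because nothing is copied, a change at a source is visible on the next query.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class VirtualCatalog:
    """Hypothetical virtual layer that maps logical table names to source readers."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, reader: Callable[[], Iterable[Row]]) -> None:
        # Only the reader function is stored; no data is copied into the catalog.
        self._sources[name] = reader

    def query(self, name: str, where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        # The underlying source is read at query time, so results reflect its current state.
        return [row for row in self._sources[name]() if where(row)]


# Three stand-in sources in different formats (illustrative data only).
crm_csv = "customer_id,name\n1,Acme\n2,Globex\n"                      # e.g. a CSV export; values arrive as strings
orders_json = '[{"order_id": 10, "customer_id": 1, "total": 250.0}]'  # e.g. an API response
inventory = [{"sku": "A1", "qty": 5}]                                 # e.g. an application table

catalog = VirtualCatalog()
catalog.register("customers", lambda: csv.DictReader(io.StringIO(crm_csv)))
catalog.register("orders", lambda: json.loads(orders_json))
catalog.register("inventory", lambda: inventory)

# The caller uses logical names only, without knowing each source's format or location.
acme = catalog.query("customers", lambda row: row["name"] == "Acme")

# A change at the source is visible on the next query because nothing was replicated.
inventory.append({"sku": "B2", "qty": 0})
print(acme, len(catalog.query("inventory")))
```

Real products add query optimization, security, and caching on top of this basic pattern; the sketch only shows the separation between logical names and physical sources.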

See below to learn all about the global data virtualization segment:

Data virtualization market

The data virtualization market was estimated at $2.3 billion in 2020. It’s expected to grow at a compound annual growth rate (CAGR) of 17% over the forecast period from 2020 to 2027, reaching $7.2 billion by 2027.

The stand-alone software segment of the market is expected to reach $1.4 billion by 2027, with a CAGR of 17%. The application tool segment was estimated at $429 million in 2020 and is expected to reach $1.2 billion by 2027, at a CAGR of 15%.
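The growth figures above follow the standard compound-growth relationship, CAGR = (end value / start value)^(1 / years) − 1. The short sketch below, using hypothetical helper names `cagr` and `project`, checks the quoted dollar figures against that formula for the seven years from 2020 to 2027. The implied rates come out somewhat above the quoted 17% and 15%; a gap this size can come from rounding of the published dollar amounts, but the report’s exact inputs are not given here.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted above, in billions of US dollars, 2020 -> 2027 (7 years).
overall = cagr(2.3, 7.2, 7)
app_tools = cagr(0.429, 1.2, 7)

print(f"Overall market implied CAGR: {overall:.1%}")
print(f"Application tools implied CAGR: {app_tools:.1%}")
print(f"$2.3B compounded at exactly 17% for 7 years: ${project(2.3, 0.17, 7):.2f}B")
```

Run as written, the implied rates print as roughly 17.7% and 15.8%, and $2.3 billion compounded at exactly 17% reaches about $6.90 billion.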

Regionally, the data virtualization market is divided as follows:

  • The U.S. market was estimated at $680 million in 2020, with a 29% share
  • The Chinese market is forecast to maintain a CAGR of 17%, reaching $1.3 billion in 2027, with an 18% share
  • Japan and Canada are forecast to grow at CAGRs of 16% and 15%, respectively, over the period 2020 to 2027
  • Within Europe, Germany has one of the highest estimated CAGRs at 12%
  • The Asia-Pacific market, led by Australia, South Korea, and India, is forecast to reach $838 million by 2027

By industry vertical, the medical care sector is forecast to have one of the highest growth rates by 2023.

Other notable verticals include:

  • Banking, financial services, and insurance (BFSI)
  • IT
  • Telecommunications
  • Manufacturing
  • Government

Data virtualization solutions give users, often employees and in-house data scientists, access to varying amounts of data through a common platform, in whatever format they prefer and regardless of where the data is stored on the server.

With the growth of digitization, Internet of Things (IoT) networks, connected devices, and data generation in general, the total amount of data produced by a single organization has increased drastically.

Data virtualization features

Data virtualization can take a variety of forms, each suited to different circumstances:

Data blending

Data blending combines data from two or more sources into a single dataset for access, analysis, and visualization. It supplements the primary data source with additional information from secondary data sources in a unified view. There are several ways to blend data, such as relationships, blends, and joins, which vary in efficiency, complexity, and flexibility.
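As an illustration, the sketch below blends a primary source with a secondary one on a shared field. It assumes left-join semantics, where every primary row is kept and enriched with matching secondary fields, and one secondary row per key. The helper `blend` and the sample sales and target data are hypothetical; blending tools differ in how they handle aggregation and unmatched rows.

```python
from typing import Dict, List, Optional


def blend(primary: List[dict], secondary: List[dict], key: str) -> List[dict]:
    """Hypothetical left-join style blend: every primary row is kept and enriched
    with the fields of the matching secondary row, if one exists.
    Assumes one secondary row per key; if keys repeat, the last row wins."""
    lookup: Dict[object, dict] = {row[key]: row for row in secondary}
    blended = []
    for row in primary:
        match: Optional[dict] = lookup.get(row[key])
        # Copy the secondary fields except the join key, which the primary row already has.
        extra = {k: v for k, v in (match or {}).items() if k != key}
        blended.append({**row, **extra})
    return blended


# Primary source: actual sales. Secondary source: regional targets (illustrative data).
sales = [
    {"region": "EU", "revenue": 120},
    {"region": "US", "revenue": 200},
    {"region": "APAC", "revenue": 80},
]
targets = [{"region": "EU", "target": 100}, {"region": "US", "target": 250}]

result = blend(sales, targets, key="region")
for row in result:
    print(row)
```

The APAC row has no matching target, so it passes through unchanged, which mirrors how the primary source drives the shape of the blended view.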

SQL virtualization

Data virtualization through SQL Server doesn’t require data to be moved from its original location. The data is virtualized in a SQL Server instance, which minimizes the need for extract, transform, and load (ETL) processing while still letting the data be queried in SQL Server. Often used with big data, this approach lets external data be combined with SQL data and made available through standard SQL queries.
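The general idea of querying data in place can be sketched with Python’s built-in `sqlite3` module. This is an analogy under stated assumptions, not SQL Server’s own feature: SQLite’s `ATTACH DATABASE` stands in for connecting an external source, and a view stands in for the virtualized table. The table, view, and database names (`orders`, `customers`, `crm`, `customer_revenue`) are hypothetical. The view stores no rows, so new data in a source table appears in the next query without any ETL step.

```python
import sqlite3

# The main connection plays the role of the virtualization engine; a second,
# separate database is attached to it instead of being copied into it.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, total REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 2, 75.5), (12, 1, 30.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# A TEMP view is used because SQLite only allows temporary views to reference
# tables that live in a different attached database.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS name, SUM(o.total) AS revenue
    FROM crm.customers AS c
    JOIN main.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
""")

query = "SELECT name, revenue FROM customer_revenue ORDER BY name"
before = conn.execute(query).fetchall()

# New rows in the source table show up immediately; the view stores no data.
conn.execute("INSERT INTO main.orders VALUES (13, 2, 24.5)")
after = conn.execute(query).fetchall()
print(before, after)
```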

Data services module

Data services modules are services included with data integration suites and data warehouses. They’re software functions that assign characteristics to data that it doesn’t already have, transforming it to be more available, resilient, and comprehensible for applications and users.
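What “assigning characteristics” can mean in practice is sketched below, assuming a service that wraps raw text records and adds typed values, a validity flag, and lineage information. The function `customer_data_service`, its field names, and the crude email check are hypothetical and purely illustrative; commercial data services modules expose their own, much richer interfaces.

```python
from datetime import date
from typing import Dict, Iterable, Iterator


def customer_data_service(raw_rows: Iterable[Dict[str, str]], source: str) -> Iterator[Dict[str, object]]:
    """Hypothetical data service that yields each raw record with characteristics
    it did not have before: typed values, a validity flag, and lineage."""
    for row in raw_rows:
        email = row.get("email", "").strip().lower()
        yield {
            "customer_id": int(row["customer_id"]),            # text -> integer
            "email": email,                                    # normalized spelling
            "signup_date": date.fromisoformat(row["signup"]),  # text -> date
            "email_valid": "@" in email,                       # crude illustrative check only
            "source": source,                                  # lineage: originating system
        }


raw = [{"customer_id": "7", "email": " Ann@Example.COM ", "signup": "2021-03-04"}]
clean = list(customer_data_service(raw, source="crm"))
print(clean[0])
```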

Cloud data services

Cloud data services are the cloud-based alternative to local database virtualization. These services are included in data virtualization solutions as SaaS packages, alongside on-premises tools used to access and manipulate data. Although they aren’t considered true data virtualization, cloud data services let users maintain compatibility with various cloud platforms and provide a wide selection of analytics tools and services.

“Because the data created by these sources is stored in diverse formats at numerous physical places, getting quick access to it has become a difficult task. … As a result, the requirement to manage and integrate data from several sources has become critical in order to achieve real-time data availability,” writes Verified Market Research.

“Furthermore, data virtualization solutions provide advantages, such as better data management, faster time to market, and higher data quality. … Most organizations in the data virtualization market are focusing on cloud-based solutions to lower the infrastructure costs associated with managing massive amounts of data.”

Data virtualization benefits

In data-centric environments, data virtualization can have positive effects on IT departments and data-based applications used on-premises as well as in the cloud.

A few notable benefits of data virtualization include:

  • Facilitates real-time data access
  • Enables data services provisioning
  • Doesn’t require data replication
  • Reduces cost for IT department and infrastructure
  • Accelerates value for business applications
  • Provides better insights
  • Improves data management efficiency

“Data virtualization addresses the data movement challenge by ensuring data remains at the source — yet is also available for consumption in real-time for consuming applications,” says Manish Mehndiratta, a member of the Forbes Technology Council.

“It goes beyond tiered views and delegable query execution to offer enterprise growth. Overall, implementing your own data virtualization approach will let you derive information faster.”

Data virtualization use cases

As a technique for simplifying data management and accessibility, data virtualization is flexible enough to be used by organizations across many industries:

Highmark Health

Highmark Health is a nonprofit national health and wellness organization based in Pittsburgh, Pennsylvania. It also operates several for-profit subsidiaries.

Highmark Health handles a share of the country’s sepsis cases, which average just over 700,000 annually, with mortality rates between 25% and 50%. Working with IBM and the Geisinger Health System, Highmark wanted to use data to predict sepsis.

In combination with IBM Watson AI, IBM Data Virtualization, and IBM Watson Studio, Highmark Health was able to identify patients at a higher risk of sepsis and prioritize them for urgent care.

“The (IBM) data science elite team wanted to show me that this was possible and that I could tell our stakeholders across the company that we were going to have this model ready to deploy and ready to go into the clinical systems,” says Curren Katz, director of data science, Highmark Health.

“We wanted our care managers, nurses, and doctors to be able to access the findings and incorporate that into their work and reach out to patients. I think it was within a couple of days that IBM came back with a deployed model, and I was kind of shocked.”

With IBM’s AI and data virtualization solutions, Highmark was able to eliminate data silos and cut AI development time from a year to six weeks.

Indiana University

Indiana University has eight locations, a faculty of 19,000, and a student body of over 114,000.

Indiana University needed to obtain timely data for decision-making. The university sought to reinvent its data management and storage in a way that makes data more accessible to faculty members and students.

After choosing Denodo, Indiana University started its Decision Support Initiative (DSI) program through Academic Metrics 360 (AM360), using data virtualization to obtain a 360-degree view of the academic center.

“We really wanted to try to focus our business intelligence efforts and data development work around the idea of Agile BI. We wanted to try to iteratively deliver value to the university,” says Dan Young, chief data architect, Indiana University.

“We decided that Denodo was going to be a good fit for us as we were trying to move forward in the Agile BI methodology.”

With Denodo, Indiana University built a logical data warehouse and facilitated access to data and information in a timely manner.

Anadolu Sigorta

Anadolu Sigorta is one of Turkey’s largest insurance companies, covering health, engineering, marine, automotive, fire, and home insurance. With over three million customers, it provides fast digital access to its services.

Anadolu Sigorta needed to solve storage challenges and shortages that caused problems when creating data requests. Working with Delphix, it virtualized its data systems and resolved its 5 TB storage shortage.

“With Delphix, we are now able to provide our test and solution development teams more accurate data with less time and less resources,” says Mehmet Abaci, CIO, Anadolu Sigorta.

“Also, thanks to Accuras professional services team, as they have always been there for us at every step of the process for consultancy, architectural design, and installation support, while enabling us to take advantage of Delphix in the shortest and most efficient way.”

In total, Anadolu saved over 250 TB of storage and cut its database provisioning time from five days to 10 minutes.

Data virtualization providers

Some of the leading providers of data virtualization services include:

  • Information Builders
  • Microsoft
  • OpenLink Software
  • IBM
  • Datometry
  • SAP
  • SAS Institute
  • Red Hat
  • Oracle
  • Denodo Technologies

Anina Ot
Anina Ot is a contributor to Enterprise Storage Forum and Datamation. She worked in online tech support before becoming a technology writer, and has authored more than 400 articles about cybersecurity, privacy, cloud computing, data science, and other topics. Anina is a digital nomad currently based in Turkey.
