Cache memory plays a key role in computing and data processing. After all, it’s a critical component all modern computer systems use to store data for fast and easy access. Everything from desktop PCs and data centers to cloud-based computing resources uses a small amount of fast static random-access memory (SRAM), called cache memory, that works alongside the central processing unit (CPU).
While the fast performance of a computer is oftentimes credited to its RAM power or processor, cache memory has a huge and direct impact on the overall performance quality of the device.
This article will help construct a deeper understanding of how cache memory works, its various types, and why it’s critical for the smooth operation of a computer system.
How Cache Memory Works
Some computers include an SSD cache memory, often called flash memory caching. It’s used to temporarily store data until permanent storage methods are able to handle it, which boosts your device’s performance.
The CPU is the component responsible for pulling and processing information, and it works considerably faster than the average RAM. Without a cache, the CPU would be forced to wait while it reads incoming instructions from the RAM, which reduces performance and speed.
Cache memory is the solution to prevent this from happening. This is accomplished by installing a small-capacity but fast SRAM close to the CPU.
Data or instructions at certain memory addresses in RAM are then copied into the cache memory temporarily, along with a record of their original addresses. As a result, the CPU doesn’t have to wait on the RAM, which is why caching is used to increase read performance.
However, since the cache memory of any given device is small in relation to its RAM, it can’t always hold all of the necessary data. Depending on whether the requested data is currently in the cache, a read results in what’s referred to as a “cache hit” or a “cache miss.”
Memory Cache Hit
When the CPU goes to read instructions off of the cache memory and finds the corresponding information, this is known as a cache hit. Since the cache memory is faster and closer in proximity to the CPU, it ends up being the one to provide the data and instructions to the CPU, allowing it to begin processing.
In the event of a cache hit, the cache memory acts as a high-speed intermediary and queue between the CPU and the main RAM. Data or instructions that need to be written back to the RAM must first go through the cache memory, which holds them until the RAM is ready to receive them. That way, the CPU isn’t slowed down waiting for the RAM’s response.
There are multiple ways this correspondence occurs depending on the cache memory’s write policy. One policy is known as “write-through.” This is the simplest and most straightforward approach, in which anything written on the cache memory is also written on the RAM.
A “write-back” policy does not immediately write data to the RAM when it is written to cache memory. Instead, anything written to the cache memory is marked as “dirty” for as long as it stays there.
This signals that the data is different from the original data or instruction pulled from the RAM. Only when it is removed from the cache memory will it then be written to RAM, replacing the original information.
Some intermediate cache memory write policies allow “dirty” information to be queued up and written back to the main RAM in batches. This is a more efficient approach compared to using multiple individual writes.
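The write policies described above can be sketched in a few lines of Python. This is a simplified software model, not how hardware implements them: the class names, the dictionary standing in for RAM, and the `flush` method used to illustrate batched write-back are all assumptions made for the example.

```python
class WriteThroughCache:
    """Every write updates the cache and the backing RAM at the same time."""

    def __init__(self, ram):
        self.ram = ram      # dict standing in for main memory (assumption)
        self.lines = {}     # address -> cached value

    def write(self, address, value):
        self.lines[address] = value
        self.ram[address] = value   # RAM is updated immediately


class WriteBackCache:
    """Writes stay in the cache, marked dirty, until the line is evicted."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses whose cached value differs from RAM

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)     # RAM is not touched yet

    def evict(self, address):
        value = self.lines.pop(address)
        if address in self.dirty:   # only dirty lines need a RAM write
            self.ram[address] = value
            self.dirty.discard(address)

    def flush(self):
        """Write all dirty lines back in one batch, keeping them cached."""
        for address in sorted(self.dirty):
            self.ram[address] = self.lines[address]
        self.dirty.clear()


# Write-through: RAM sees the new value right away.
ram_a = {0x10: 1}
wt = WriteThroughCache(ram_a)
wt.write(0x10, 99)

# Write-back: RAM keeps the old value until the line is evicted.
ram_b = {0x10: 1}
wb = WriteBackCache(ram_b)
wb.write(0x10, 99)
ram_before_evict = ram_b[0x10]
wb.evict(0x10)
ram_after_evict = ram_b[0x10]
```

In this sketch, `ram_before_evict` still holds the stale value while the line is dirty, and the RAM only catches up once the line is evicted or flushed.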
Memory Cache Miss
If the CPU goes to read information or instructions on the cache memory but is unable to find the required data and has to resort directly to the hard drive and RAM, this is known as a cache miss. This reduces the speed and efficiency of the device’s processing, as it now has to operate according to the speeds of the RAM and hard drive.
Afterward, when the required information or instructions are successfully retrieved from the RAM, they first get written to the cache memory before they’re sent off to be processed by the CPU.
That happens primarily because data or instructions that have been recently used by the CPU are likely to still be of importance and to be needed again shortly. Writing them to the cache memory saves the CPU from having to return to the RAM or hard drive a second time to retrieve the same data.
On rare occasions, some data types can be marked as non-cacheable. This is to prevent valuable cache memory space from being occupied by unnecessary data, even if the CPU has had to retrieve it from the RAM or hard drive.
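The read path on a hit and on a miss can be sketched as follows. This is a minimal model under stated assumptions: `SimpleCache` is a hypothetical name, a dictionary stands in for RAM, and the cache has unlimited capacity so that eviction can be ignored here.

```python
class SimpleCache:
    """Toy read cache that counts hits and misses."""

    def __init__(self, ram):
        self.ram = ram      # dict standing in for main memory (assumption)
        self.lines = {}     # address -> cached value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:       # cache hit: serve from the cache
            self.hits += 1
            return self.lines[address]
        self.misses += 1                # cache miss: fall back to RAM
        value = self.ram[address]
        self.lines[address] = value     # fill the cache for the next access
        return value


ram = {addr: addr * 2 for addr in range(8)}
cache = SimpleCache(ram)
first = cache.read(3)    # miss, fetched from RAM and cached
second = cache.read(3)   # hit, served from the cache
```

The first read of an address pays the cost of going to RAM, while the repeated read is served from the cache, which is the behavior the paragraphs above describe.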
Cache Memory ‘Eviction’
A cache memory’s storage capacity is minuscule compared to RAM and hard drives. While RAMs can range between 2GB and 64GB and hard drives reach 1TB to 2TB on average consumer devices, a cache memory’s capacity measures between 2KB and a few megabytes.
This stark difference in storage capacity means that cache memories sometimes fill up while the CPU still needs to pull in information. “Eviction” is the process of removing data from the cache memory to free up space for information that needs to be written there.
What data gets evicted is determined by a “replacement policy,” which tries to keep the information that is most likely to be needed again.
There are a number of possible replacement policies. One of the most common ones is a least recently used (LRU) policy. Based on this policy, if data or instructions have not been used recently, then they are less likely to be required in the immediate future than data or instructions that have been required more recently.
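A least recently used policy can be sketched with Python’s `collections.OrderedDict`, which remembers insertion order and lets an entry be moved to the end when it is accessed. The `LRUCache` class and its `get` and `put` methods are hypothetical names for this example, and hardware caches usually approximate LRU rather than tracking it exactly.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # oldest (least recently used) entry first

    def get(self, address):
        if address not in self.lines:
            return None                      # cache miss
        self.lines.move_to_end(address)      # mark as most recently used
        return self.lines[address]

    def put(self, address, value):
        """Store a value and return the evicted address, if any."""
        if address in self.lines:
            self.lines.move_to_end(address)
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            evicted_address, _ = self.lines.popitem(last=False)
            return evicted_address
        return None


cache = LRUCache(capacity=2)
cache.put("A", 1)
cache.put("B", 2)
cache.get("A")               # "A" becomes the most recently used entry
evicted = cache.put("C", 3)  # cache is full, so "B" is evicted
```

Because "A" was read after "B" was stored, "B" is the least recently used entry when "C" arrives and is the one that gets evicted.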
2 Types of Cache Memory
Cache memory is divided into two categories depending on its physical location and proximity to the device’s CPU.
- Primary Cache Memory: The primary cache memory, also known as the main cache memory, is the SRAM located on the same die as the CPU, which is as close as it can be installed. This is the type generally used in the storage and retrieval of information between the CPU and the RAM.
- Secondary Cache Memory: The secondary cache memory is the same hardware as the primary cache memory. However, it’s placed further away from the CPU, ensuring the existence of a backup SRAM that can be reached by the CPU whenever needed.
Despite being accurate terms to describe a cache memory based on its physical location in the system, neither is used nowadays. This is because modern cache memories can be manufactured small enough and with sufficient capacities to be placed on the same die as the CPU with no issue. Instead, modern cache memories are referred to by level.
3 Levels of Cache Memory
Modern computer systems have more than one piece of cache memory, and these caches vary in size and proximity to the processor cores and, therefore, in speed. These are known as cache levels.
The smallest and fastest cache memory is known as Level 1 cache, or L1 cache, and the next is the L2 cache, then L3. Most systems now have an L3 cache. Intel has also added L4 cache memory to some of its processors, including certain Skylake chips. However, it’s not as common.
Level 1 Cache
Level 1 cache is the fastest type of cache memory since it’s embedded directly into the CPU itself, but for that same reason, it’s highly restricted in size. It runs at the same clock speed as the CPU, making it an excellent buffer for the RAM when requesting and storing information and instructions.
L1 cache tends to be divided into two parts, one for instructions (L1i) and one for data (L1d). This is to support the different fetch bandwidths used by processors, as most software tends to require more cache for data than for instructions.
Many recent devices have an L1 cache capacity of 64KB per core: 32KB of L1i and 32KB of L1d. In a quad-core processor, this adds up to 256KB of L1 cache memory.
Level 2 Cache
Level 2 cache is oftentimes also located inside the CPU chip, just further away from the core than the L1 cache. L2 caches are considerably less expensive than their L1 counterparts and larger in capacity, ranging anywhere from 128KB to 8MB per core.
In some cases, L2 cache memories are implemented on a separate processing chip, also known as a co-processor.
Level 3 Cache
Level 3 cache memory, sometimes referred to as last-level cache (LLC), sits outside the individual processor cores but still in close proximity to them. It’s much larger than the L1 and L2 caches but is a bit slower.
Another difference is that L1 and L2 cache memories are typically exclusive to their processor core and are not shared. L3, on the other hand, is available to all cores. This allows it to play an important role in data sharing and inter-core communication and, in some designs, in sharing data with the graphics processing unit (GPU).
As for size, L3 cache on modern devices tends to range between 10MB and 64MB, depending on the device’s specifications.
What is Cache Mapping?
Since cache memories are incredibly fast and continue to get larger alongside the requirements of software computing processes, there needs to be a system for retrieving the needed information. Otherwise, the CPU could end up wasting time searching for the right instruction on the memory instead of actually processing it.
The processor knows the RAM memory address of the data or instruction that it wants to read. It has to search the memory cache to see if there is a reference to that RAM memory address in the memory cache, along with the associated data or instruction.
There are numerous approaches for mapping the data and instructions pulled from the RAM to cache memory, and they tend to prioritize some aspects over others. For instance, minimizing the search time tends to reduce the likelihood of a cache hit. Meanwhile, maximizing the chances of a cache hit also increases the average search time.
Depending on the level of compromise between search speed and hit rate, there are three types of cache mapping techniques.
Direct Cache Mapping
Direct cache memory mapping is the simplest and most straightforward mapping technique. With this approach, each memory block is assigned a specific line in the cache, as determined by its RAM address.
Based on this, the CPU only has to check this single line to see whether the needed information is available. However, if it’s not stored in that exact location, the CPU marks it as a cache miss and proceeds to pull the information directly from the RAM.
Direct mapping can be highly inefficient, especially on devices with higher specs and larger flows of data and instructions into the cache memory and CPU, because blocks that map to the same line keep evicting one another even when other lines are free.
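A direct-mapped cache can be sketched as below. The example assumes the line is chosen by taking the block number modulo the number of lines, which is a common scheme but not one the text above specifies. The class name and the dictionary standing in for RAM are also assumptions.

```python
class DirectMappedCache:
    """Each block number maps to exactly one cache line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # each entry: (block_number, data)

    def read(self, block_number, ram):
        """Return (data, hit) for the requested block."""
        index = block_number % self.num_lines   # the only line to check
        entry = self.lines[index]
        if entry is not None and entry[0] == block_number:
            return entry[1], True               # cache hit
        data = ram[block_number]                # cache miss: go to RAM
        self.lines[index] = (block_number, data)  # overwrite whatever was there
        return data, False


ram = {n: f"block-{n}" for n in range(16)}
cache = DirectMappedCache(num_lines=4)

# Blocks 1 and 5 both map to line 1, so they keep replacing each other.
conflicts = [cache.read(n, ram)[1] for n in (1, 5, 1, 5)]

# Reading the same block twice in a row hits on the second access.
repeat = [cache.read(n, ram)[1] for n in (2, 2)]
```

In this run every access in `conflicts` is a miss even though three of the four lines are empty, which illustrates why direct mapping can perform poorly when frequently used blocks share a line.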
Associative Cache Mapping
Associative cache mapping is the exact opposite of the direct approach. When pulling data and instructions from the RAM, any block can go into any line of the memory cache, randomizing its location. When the CPU searches for specific information, it has to check the entire cache memory to see if it contains what it’s looking for.
This approach yields a high rate of cache hits, and the CPU rarely resorts to retrieving information directly from the RAM. However, whatever time is saved by only communicating with the cache memory is wasted searching through all of its lines every time an instruction is needed.
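The trade-off can be made concrete with a small fully associative model. The class name, the first-in-first-out eviction, and the `comparisons` counter are assumptions added for illustration. The loop checks lines one at a time only to make the search cost visible, whereas hardware typically compares all lines in parallel at the cost of extra circuitry.

```python
class FullyAssociativeCache:
    """Any block may occupy any line, so a lookup may check every line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = []         # list of (block_number, data), oldest first
        self.comparisons = 0    # how many lines have been inspected in total

    def read(self, block_number, ram):
        """Return (data, hit) for the requested block."""
        for stored_block, data in self.lines:
            self.comparisons += 1
            if stored_block == block_number:
                return data, True               # cache hit
        data = ram[block_number]                # cache miss: go to RAM
        if len(self.lines) >= self.num_lines:
            self.lines.pop(0)                   # FIFO eviction (assumption)
        self.lines.append((block_number, data))
        return data, False


ram = {n: f"block-{n}" for n in range(16)}
cache = FullyAssociativeCache(num_lines=4)
results = [cache.read(n, ram)[1] for n in (1, 5, 1, 5)]
```

With four lines available, blocks 1 and 5 can both stay cached, so the repeated reads hit. The price is that each lookup may have to inspect several lines before it finds a match or gives up.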
Set-Associative Cache Mapping
Set-associative cache mapping is a way to compromise between direct and associative cache mapping, aiming to maximize events of a cache hit whilst minimizing the average search time per request. For this to work, each data block pulled from the RAM is only allowed to be mapped to a limited number of different cache memory blocks.
This is also known as N-way set-associative cache mapping. For each block of data or instructions, there are N cache blocks where it can be mapped and later found by the CPU.
A 2-way set-associative mapping system allows data from the RAM to be placed in one of two places in the cache memory. Compared with direct mapping, the cache hit likelihood increases, but the average search time doubles, as the CPU needs to check twice as many potential blocks.
A 4-way set-associative mapping system gives the RAM four potential mapping blocks, an 8-way mapping system provides eight potential mapping blocks, and a 16-way mapping system offers 16 variations. The higher the value of N, the higher the chances of a cache hit but the longer the average data block recovery time for the CPU will be.
The value of N is adjusted depending on the device and what it’s going to be used for, opting for a balanced ratio of time to cache hits.
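An N-way set-associative cache can be sketched by combining the two earlier ideas: the block number selects a set, and the block may then occupy any of the N ways inside that set. The modulo set selection, the LRU eviction within a set, and all names below are assumptions made for this example.

```python
class SetAssociativeCache:
    """N-way set-associative cache with LRU eviction inside each set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of (block_number, data), least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def read(self, block_number, ram):
        """Return (data, hit) for the requested block."""
        cache_set = self.sets[block_number % self.num_sets]
        for position, (stored_block, data) in enumerate(cache_set):
            if stored_block == block_number:
                # Hit: move the entry to the most recently used position.
                cache_set.append(cache_set.pop(position))
                return data, True
        data = ram[block_number]        # miss: go to RAM
        if len(cache_set) >= self.ways:
            cache_set.pop(0)            # evict the least recently used way
        cache_set.append((block_number, data))
        return data, False


ram = {n: f"block-{n}" for n in range(16)}
cache = SetAssociativeCache(num_sets=2, ways=2)

# Blocks 1, 5, and 9 all map to set 1, which only has room for two of them.
results = [cache.read(n, ram)[1] for n in (1, 5, 1, 5, 9, 1)]
```

Blocks 1 and 5 can coexist in the same set, so their repeated reads hit, but once block 9 arrives the set overflows and block 1 is evicted. At most two entries are checked per lookup, regardless of how many sets the cache has.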
3 Examples of How Cache Memory is Used
The use of cache memory isn’t only essential for on-premises and personal devices. It’s also used heavily in data center servers and in cloud computing offerings. Here are a few examples highlighting cache memory solutions.
Beat
Beat is a ride-hailing application based in Greece with over 700,000 drivers and 22 million active users globally. Founded in 2011, it’s now the fastest-growing app in Latin America, mainly in Argentina, Chile, Colombia, Peru, and Mexico.
During its hypergrowth period, Beat’s application started experiencing outages due to bottlenecks in the data delivery system. It had already been working with AWS and using Amazon ElastiCache, but an upgrade in configuration was desperately needed.
“We could split the traffic among as many instances as we liked, basically scaling horizontally, which we couldn’t do in the previous solution,” said Antonis Zissimos, senior engineering manager at Beat. “Migrating to the newer cluster mode and using newer standard Redis libraries enabled us to meet our scaling needs and reduce the number of operations our engineers had to do.”
In less than two weeks, Beat was able to reset ElastiCache and reduce load per node by 25–30%, cut computing costs by 90%, and eliminate 90% of staff time spent on the caching layer.
Sanity
Sanity is a software service and a platform that helps developers better integrate content by treating it as structured data. With headquarters in the U.S. and Norway, its fast and flexible open-source editing environment, Sanity Studio, supports a fully customizable user interface (UI) and live-hosted data storage.
In order to support the real-time and fast delivery of content to the developers using its platform, Sanity needed to optimize its distribution. Working with Google Cloud, it turned to Cloud CDN to cache content in various servers located close to major end-user internet service providers (ISPs) globally.
“In the two years we’ve been using Google Kubernetes Engine, it has saved us plenty of headaches around scalability,” said Simen Svale Skogsrud, co-founder and CTO of Sanity. “It helps us to scale our operations to support our global users while handling issues that would otherwise wake us up at 3 a.m.”
Sanity is now able to process five times the traffic using Google Kubernetes Engine and support the accelerated distribution of data through an edge-cached infrastructure around the globe.
SitePro
SitePro is an automation software solution that offers real-time data capture at the source through an innovative end-to-end approach for the oil and gas industries. Based in the U.S., it’s an entirely internet-based solution that at one point controlled nearly half of the richest oil-producing area in the country.
To maintain control over the delicate operation, SitePro sought the help of Microsoft Azure, employing Azure Cache for Redis, and managed to boost its operations. Now, SitePro is in a position to start investing in green tech and more environmentally-friendly services and products.
“Azure Cache for Redis was the only thing that had the throughput we needed,” said Aaron Phillips, co-CEO at SitePro. “Between Azure Cosmos DB and Azure Cache for Redis, we’ve never had congestion. They scale like crazy.
“The scalability of Azure Cache for Redis played a major role in the speed at which we have been able to cross over into other industries.”
Thanks to Azure, SitePro was able to simplify its architecture and significantly improve data quality. It has now eliminated the gap in timing at no increase in costs.
Speed Up Device Processes With Cache Memory
Cache memory is a type of temporary storage hardware that allows the CPU to repeatedly retrieve information and instructions without having to resort to RAM or hard disk. In computers and servers, it’s built as close as physically possible to the processor cores to reduce processing times.
Cache memories are categorized by level into L1, L2, and L3 depending on their proximity to the CPU cores. Each core typically has its own L1 and L2 cache while the L3 cache is shared, and in general, the greater their capacity, the faster the device’s computing.
Speed also relies heavily on the cache memory’s mapping technique and whether it prioritizes search time or cache hit rate. There are three techniques: direct mapping, associative mapping, and set-associative mapping.