Shared memory is a technology that enables computer programs to simultaneously share memory resources for higher performance and fewer redundant data copies. Shared system memory can run on single processor systems, parallel multiprocessors, or clustered microprocessors. The technology is somewhat different for distributed systems, but shared memory can operate there as well.
Shared memory is not highly scalable, and data coherency can be an issue. But in the right environment and with cache coherency protocols running, shared memory offers more advantages than issues.
Shared memory is a class of Inter-Process Communication (IPC) technology, which improves communications between computer components.
What is the Shared Memory Process?
In its simplest form, shared memory is a low-level programming process on a single server that enables clients and servers to exchange data and instructions using main memory. Performance is much faster than using system services such as operating system data buffers.
For example, a client needs to exchange data with the server for modification and return. Without shared memory, both client and server use operating system buffers to accomplish the modification and exchange.
The client writes to an output file in the buffer, and the server writes the file to its workspace. When the server completes the modification, the process reverses. Each time this occurs, the system generates two reads and two writes between client and server.
With shared memory, the client writes its data directly into RAM and issues a semaphore value to flag server attention. The server accomplishes the modifications directly in main memory and alerts the client by changing the semaphore value. There is only one read and one write per communication, and the read/write is considerably faster than using system services.
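The semaphore handshake described above can be sketched in Python with the standard `multiprocessing.shared_memory` module. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `server` and `exchange`, the two semaphores, and the upper-casing "modification" are hypothetical, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def server(name, size, request_ready, reply_ready):
    """Hypothetical server: wait for the flag, modify the data in place, signal back."""
    shm = shared_memory.SharedMemory(name=name)  # attach to the existing segment
    try:
        request_ready.acquire()  # block until the client flags us
        # Modify the data directly in shared memory (upper-casing is a stand-in).
        shm.buf[:size] = bytes(shm.buf[:size]).upper()
        reply_ready.release()  # tell the client the result is ready
    finally:
        shm.close()  # detach; the creator removes the segment


def exchange(payload: bytes) -> bytes:
    """Hypothetical client: one write into shared memory, one read back."""
    ctx = mp.get_context("fork")  # assumption: fork is available (POSIX)
    request_ready = ctx.Semaphore(0)
    reply_ready = ctx.Semaphore(0)
    size = len(payload)  # must be greater than zero
    shm = shared_memory.SharedMemory(create=True, size=size)
    proc = ctx.Process(
        target=server, args=(shm.name, size, request_ready, reply_ready)
    )
    proc.start()
    try:
        shm.buf[:size] = payload  # the single write
        request_ready.release()  # flag the server
        if not reply_ready.acquire(timeout=10):
            raise TimeoutError("server did not reply")
        return bytes(shm.buf[:size])  # the single read
    finally:
        proc.join(timeout=10)
        shm.close()
        shm.unlink()  # remove the segment from the system
```

Calling `exchange(b"shared memory")` hands the payload to the server process and returns the bytes the server left in the segment. No copy passes through an operating system buffer in between.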
Shared Memory and Single Microprocessor General Flow
- Server starts.
- Server uses a system call to request a shared memory segment with a key, and stores the returned ID.
- Server issues another system call to attach shared memory to the server's address space.
- Server initializes the shared memory.
- Client starts.
- Client requests shared memory.
- Server issues unique memory ID to client.
- Client attaches shared memory ID to the address space and uses the memory.
- When complete, client detaches all shared memory segments and exits.
- Using two more system calls, server detaches and removes shared memory.
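The lifecycle in the list above can be sketched with Python's `multiprocessing.shared_memory` module. The mapping is approximate and should be read as an assumption: Python creates and attaches a segment in a single constructor call, uses a name rather than a numeric ID, and typically relies on POSIX shared memory rather than the System V calls the steps suggest. To stay short, the hypothetical `run_flow` function plays both roles in one process. In practice the client would be a separate process that receives the segment name from the server.

```python
from multiprocessing import shared_memory


def run_flow() -> bytes:
    """Walk through create, attach, use, detach, and remove in one process."""
    # Server: request a segment and attach it (one call in Python).
    server_shm = shared_memory.SharedMemory(create=True, size=16)
    try:
        # Server: initialize the shared memory.
        server_shm.buf[:5] = b"ready"

        # Client: attach to the same segment using the name the server hands out.
        client_shm = shared_memory.SharedMemory(name=server_shm.name)
        try:
            seen = bytes(client_shm.buf[:5])  # client uses the memory
        finally:
            client_shm.close()  # client detaches
        return seen
    finally:
        server_shm.close()   # server detaches ...
        server_shm.unlink()  # ... and removes the segment
```

The two calls at the end mirror the final step in the list: `close()` only detaches the caller's mapping, while `unlink()` asks the operating system to remove the segment itself.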
Multi-Processor Shared Memory
This simplified scheme works for single microprocessors, but memory sharing among multiple microprocessors is more complex, especially when each microprocessor has its own memory cache. Popular approaches include uniform memory access (UMA) and non-uniform memory access (NUMA). Distributed memory sharing is also possible, although it uses different sharing technology.
UMA: Shared memory in parallel computing environments
In parallel computing, multiprocessors use the same physical memory and access it in parallel, although the processors may have private memory caches as well. Shared memory accelerates parallel execution of large applications where processing time is critical.
NUMA: Shared memory in symmetric multiprocessor systems (SMP)
NUMA configures SMP to use shared memory. SMP is a clustered architecture that tightly couples multiple processors in a share-everything single server environment with a single OS. As each processor uses the same bus, intensive operations will slow down performance and increase latency.
NUMA replaces the single system bus by grouping CPU and memory resources into configurations called NUMA nodes. Multiple high-performing nodes operate efficiently within clusters, allowing each CPU to treat its assigned node as a local shared memory resource. This relieves the load on the bus by moving it to flexible, high-performance memory nodes.
Shared memory in distributed systems
Distributed shared memory uses a different technology but has the same result: separate computers share memory for better performance and scalability. Distributed shared memory enables separate computer systems to access each other’s memory by abstracting it from the server level into a logically shared address space.
The architecture can either separate memory and distribute the parts among the nodes and main memory, or can distribute all memory between the nodes. Distributed memory sharing uses either hardware (network interfaces and cache coherence circuits) or software. Unlike single or multiprocessor shared memory, distributed memory sharing scales efficiently and supports intensive processing tasks such as large complex databases.
Caution: Shared Memory Challenges
Shared memory programming is straightforward on a single CPU or clustered CPUs. All processors share the same view of data, and communication between them is very fast.
However, most multiprocessor systems assign individual cache memory to their processors in addition to main memory. Cache memory processing is considerably faster than using RAM, but can cause conflicts and data corruption if the same system is also using shared memory. There are three main issues for shared memory in cache memory architectures: degraded access times, data incoherence and false sharing.
Degraded access time
Several processors cause contention and performance slowdowns by accessing the same memory location at the same time. For this reason, non-distributed shared memory systems do not scale very efficiently over ten processors.
Data incoherence
Multiple processors with memory sharing typically have individual memory caches to speed up performance. In such a system, two or more processors may each hold a cached copy of the same memory location. Both processors modify the data without being aware of the other cache’s modifications, so data that should be identical (that is, coherent) becomes incoherent, which can lead to corruption when that data is written back to main memory.
Cache coherence protocols manage these conflicts by synchronizing data values across multiple caches. Whenever a cache propagates modified data back to the shared memory location, the data remains coherent. Cache coherence protects high-performance cache memory while supporting memory sharing.
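To make the difference concrete, here is a toy model of two private caches in front of one shared memory. It is only a sketch: real coherence protocols run in hardware and are far more involved. The `ToyCache` class, the `demo` function, and the write-through, invalidate-on-write policy are all assumptions chosen for brevity.

```python
class ToyCache:
    """A private cache in front of shared main memory (modelled as a dict)."""

    def __init__(self, memory, coherent):
        self.memory = memory      # shared main memory: address -> value
        self.coherent = coherent  # whether writes are synchronized with peers
        self.lines = {}           # this cache's private copies
        self.peers = []           # other caches attached to the same memory

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.coherent:
            self.memory[addr] = value       # propagate the new value
            for peer in self.peers:
                peer.lines.pop(addr, None)  # invalidate stale copies


def demo(coherent):
    """Return what cache B reads after cache A writes 42 to address 0."""
    memory = {0: 1}
    a = ToyCache(memory, coherent)
    b = ToyCache(memory, coherent)
    a.peers, b.peers = [b], [a]
    a.read(0)
    b.read(0)      # both caches now hold a copy of address 0
    a.write(0, 42)
    return b.read(0)
```

Without synchronization, `demo(False)` returns the stale value `1` that cache B fetched earlier. With it, `demo(True)` returns `42`, because B's copy was invalidated and reloaded from main memory.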
False sharing
This memory usage pattern degrades performance, and occurs in multiprocessor systems with shared memory and individual processor caches. Caching works by reading data from the requested memory location plus nearby locations as one block, or cache line. (A cache line is commonly 64 bytes.) The problem arises when processors access a shared block that contains modifiable data, or variables. Whether or not the processors actually use the same variable does not matter; when one processor writes to any part of the block, the other caches must reload their entire copies. The cache coherency protocol is unaware of activity within the block, so a processor that never touched the modified variable still bears the reload overhead. This generates bus traffic on every write to the shared block, degrading performance and wasting bandwidth.
Programming is the solution
“Cache padding” inserts meaningless bytes between a memory location and its neighbors, so that the 64-byte cache line holds only that data. Cache coherency still does the synchronization, but other caches are no longer forced to reload their blocks because of unrelated writes.
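The layout effect of padding can be illustrated with Python's `ctypes` module, which lays out structures the way a C compiler would. This sketch only shows where each field lands and does not measure performance. The 64-byte line size, the structure names, and the `line_index` helper are assumptions, and the offsets are relative to the start of the structure, so the result holds only if the structure itself begins on a cache-line boundary.

```python
import ctypes

CACHE_LINE = 64  # assumption: a common line size; check the target hardware


class PackedCounters(ctypes.Structure):
    """Two counters side by side: both fall into the same cache line."""
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("b", ctypes.c_uint64),
    ]


class PaddedCounters(ctypes.Structure):
    """Meaningless bytes after `a` push `b` into the next cache line."""
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def line_index(struct_type, field):
    """Cache line (counted from the start of the struct) holding `field`."""
    return getattr(struct_type, field).offset // CACHE_LINE
```

In the packed layout, `a` and `b` share line 0, so a write to one forces other caches to reload the other as well. In the padded layout, `line_index(PaddedCounters, "b")` is 1, so the two counters no longer share a line. The cost is the extra 56 bytes of memory per counter.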
Shared Memory Advantages
- Multiple applications share memory for more efficient processing.
- Efficiently passes data between programs to improve communications and efficiency.
- Works in single microprocessor systems, multiprocessor parallel or symmetric systems, and distributed servers.
- Avoids redundant data copies by managing shared data in main memory and in caches.
- Minimizes input/output (I/O) processes by enabling programs to access a single data copy already in memory.
- For programmers, the main advantage of shared memory is that there is no need to write explicit code for processor interaction and communication.
- Cache coherence protocols protect shared memory against data incoherence and performance slow-downs.