Refers to mechanisms and policies that restrict access to computer resources. An access control list (ACL), for example, specifies what operations different users can perform on specific files and directories.
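As a rough illustration of the ACL idea, the sketch below models an access control list as a mapping from a path to the operations each user may perform. The paths, user names, and the `is_allowed` helper are hypothetical and chosen only for this example; real operating systems store and evaluate ACLs in their own ways.

```python
# Hypothetical ACL: path -> {user -> set of permitted operations}.
ACL = {
    "/reports/q1.txt": {"alice": {"read", "write"}, "bob": {"read"}},
    "/reports": {"alice": {"read", "write", "list"}},
}


def is_allowed(user: str, operation: str, path: str) -> bool:
    """Return True only if the ACL explicitly grants the operation."""
    return operation in ACL.get(path, {}).get(user, set())


print(is_allowed("bob", "read", "/reports/q1.txt"))   # True
print(is_allowed("bob", "write", "/reports/q1.txt"))  # False
```

Anything not listed is denied, which is one common design choice for such a check.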
(1) A collection of wires through which data is transmitted from one part of a computer to another. You can think of a bus as a highway on which data travels within a computer. When used in reference to personal computers, the term bus usually refers to the internal bus. This is a bus that connects all the internal computer components to the CPU and main memory. There's also an expansion bus that enables expansion boards to access the CPU and memory.
All buses consist of two parts -- an address bus and a data bus. The data bus transfers actual data whereas the address bus transfers information about where the data should go.
The size of a bus, known as its width, is important because it determines how much data can be transmitted at one time. For example, a 16-bit bus can transmit 16 bits of data, whereas a 32-bit bus can transmit 32 bits of data.
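To make the effect of bus width concrete, the following sketch counts how many transfers an idealized bus would need to move a payload. The `transfers_needed` function is a hypothetical helper, and the model ignores addressing, control signals, and any other overhead.

```python
import math


def transfers_needed(payload_bits: int, bus_width_bits: int) -> int:
    """Number of transfers needed if each transfer moves one bus width of data."""
    return math.ceil(payload_bits / bus_width_bits)


print(transfers_needed(64, 16))  # 4
print(transfers_needed(64, 32))  # 2
```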
Every bus has a clock speed measured in MHz. A fast bus allows data to be transferred faster, which makes applications run faster. On PCs, the old ISA bus is being replaced by faster buses such as PCI.
Nearly all PCs made today include a local bus for data that requires especially fast transfer speeds, such as video data. The local bus is a high-speed pathway that connects directly to the processor.
Several different types of buses are used on Apple Macintosh computers. Older Macs use a bus called NuBus, but newer ones use PCI.
(2) In networking, a bus is a central cable that connects all devices on a local-area network (LAN). It is also called the backbone.
Pronounced cash, a special high-speed storage mechanism. It can be either a reserved section of main memory or an independent high-speed storage device. Two types of caching are commonly used in personal computers: memory caching and disk caching.
A memory cache, sometimes called a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper dynamic RAM (DRAM) used for main memory. Memory caching is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM.
Some memory caches are built into the architecture of microprocessors. The Intel 80486 microprocessor, for example, contains an 8K memory cache, and the Pentium has a 16K cache. Such internal caches are often called Level 1 (L1) caches. Most modern PCs also come with external cache memory, called Level 2 (L2) caches. These caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM but they are much larger.
Disk caching works under the same principle as memory caching, but instead of using high-speed SRAM, a disk cache uses conventional main memory. The most recently accessed data from the disk (as well as adjacent sectors) is stored in a memory buffer. When a program needs to access data from the disk, it first checks the disk cache to see if the data is there. Disk caching can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disk.
When data is found in the cache, it is called a cache hit, and the effectiveness of a cache is judged by its hit rate. Many cache systems use a technique known as smart caching, in which the system can recognize certain types of frequently used data. The strategies for determining which information should be kept in the cache constitute some of the more interesting problems in computer science.
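The hit rate can be illustrated with a small simulation. The sketch below assumes a least-recently-used (LRU) replacement policy, which is only one of many possible strategies and is not prescribed by the definition above; the `hit_rate` function and the access trace are made up for the example.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Simulate an LRU cache and return the fraction of accesses that were hits."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used key
            cache[key] = True
    return hits / len(accesses) if accesses else 0.0


trace = ["a", "b", "a", "c", "a", "b", "d", "a"]
print(hit_rate(trace, capacity=2))  # 0.25
print(hit_rate(trace, capacity=3))  # 0.5
```

On this trace, a larger cache raises the hit rate because frequently reused items are evicted less often.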
In communications, the term channel refers to a communications path between two computers or devices. It can refer to the physical medium (the wires) or to a set of properties that distinguishes one channel from another.
A simple error-detection scheme in which each transmitted message is accompanied by a numerical value based on the number of set bits in the message. The receiving station then applies the same formula to the message and checks to make sure the accompanying numerical value is the same. If not, the receiver can assume that the message has been garbled.
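A minimal sketch of such a scheme, taking the "numerical value" to be simply the count of set bits in the message (real protocols use a variety of formulas). The function names `checksum` and `verify` are hypothetical.

```python
def checksum(message: bytes) -> int:
    """Count the set (1) bits across every byte of the message."""
    return sum(bin(byte).count("1") for byte in message)


def verify(message: bytes, received_checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the one sent."""
    return checksum(message) == received_checksum


sent = b"hi"
value = checksum(sent)
print(value)                 # 7

# Flip the lowest bit of the last byte to simulate a garbled message.
garbled = bytearray(sent)
garbled[-1] ^= 0x01
print(verify(sent, value))            # True
print(verify(bytes(garbled), value))  # False
```

A plain bit count cannot catch every error: if one bit is cleared and another is set, the count is unchanged and the garbled message passes the check.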
Refers to the validity of data. Data integrity can be compromised in a number of ways:
There are many ways to minimize these threats to data integrity. These include:
(Also spelled datamart) A database, or collection of databases, designed to help managers make strategic decisions about their business. Whereas a data warehouse combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses.
A hot buzzword for a class of database applications that look for hidden patterns in a group of data. For example, data mining software can help retail companies find customers with common interests. The term is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data.
A collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.
Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data.
The term data warehousing generally refers to combining many different databases across an entire enterprise. Contrast with data mart.
A technique in which data is written to two duplicate disks simultaneously. This way if one of the disk drives fails, the system can instantly switch to the other disk without any loss of data or service. Disk mirroring is used commonly in on-line database systems where it's critical that the data be accessible at all times.
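The idea can be sketched with two in-memory dictionaries standing in for the duplicate disks. The `MirroredDisk` class and its methods are hypothetical; a real implementation lives in a disk controller or operating system driver.

```python
class MirroredDisk:
    """Toy model of disk mirroring: every write goes to both copies."""

    def __init__(self):
        self.disks = [{}, {}]          # two duplicate "disks" (block -> data)
        self.failed = [False, False]   # failure flag per disk

    def write(self, block: int, data: bytes) -> None:
        # Write the same data to every disk that is still working.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Read from the first working disk; fall over to the mirror if needed.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise IOError("all mirrors have failed")


array = MirroredDisk()
array.write(0, b"payroll")
array.failed[0] = True    # simulate a drive failure
print(array.read(0))      # b'payroll'
```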
A technique for spreading data over multiple disk drives. Disk striping can speed up operations that retrieve data from disk storage. The computer system breaks a body of data into units and spreads these units across the available disks. Systems that implement disk striping generally allow the user to select the data unit size or stripe width.
Disk striping is available in two types. Single user striping uses relatively large data units, and improves performance on a single-user workstation by allowing parallel transfers from different disks. Multi-user striping uses smaller data units and improves performance in a multi-user environment by allowing simultaneous (or overlapping) read operations on multiple disk drives.
Disk striping stores each data unit in only one place and does not offer protection from disk failure.
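A simplified sketch of striping, assuming units are distributed round-robin across the disks (a common layout, though not the only one). The `stripe` and `unstripe` functions are hypothetical helpers, and each "disk" is just a list of byte strings.

```python
def stripe(data: bytes, num_disks: int, unit_size: int) -> list:
    """Split data into fixed-size units and spread them round-robin over the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), unit_size):
        unit_index = offset // unit_size
        disks[unit_index % num_disks].append(data[offset:offset + unit_size])
    return disks


def unstripe(disks: list) -> bytes:
    """Reassemble the original data by reading one unit from each disk in turn."""
    units = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                units.append(disk[row])
    return b"".join(units)


layout = stripe(b"ABCDEFGHIJ", num_disks=3, unit_size=2)
print(layout)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(layout))  # b'ABCDEFGHIJ'
```

Because each unit exists on exactly one disk, losing any single disk in this layout loses part of the data.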
The ability of a system to respond gracefully to an unexpected hardware or software failure. There are many levels of fault tolerance, the lowest being the ability to continue operation in the event of a power failure. Many fault-tolerant computer systems mirror all operations -- that is, every operation is performed on two or more duplicate systems, so if one fails the other can take over.
A serial data transfer architecture developed by a consortium of computer and mass storage device manufacturers and now being standardized by ANSI. The most prominent Fibre Channel standard is Fibre Channel Arbitrated Loop (FC-AL).
FC-AL was designed for new mass storage devices and other peripheral devices that require very high bandwidth. Using optical fiber to connect devices, FC-AL supports full-duplex data transfer rates of 100MBps. FC-AL is compatible with, and is expected to eventually replace, SCSI for high-performance storage systems.
The ability to add devices to and remove devices from a computer while the computer is running and have the operating system automatically recognize the change. Two new external bus standards -- Universal Serial Bus (USB) and IEEE 1394 -- support hot plugging. This is also a feature of PCMCIA.
Hot plugging is also called hot swapping.
A common connection point for devices in a network. Hubs are commonly used to connect segments of a LAN. A hub contains multiple ports. When a packet arrives at one port, it is copied to the other ports so that all segments of the LAN can see all packets.
A passive hub serves simply as a conduit for the data, enabling it to go from one device (or segment) to another. So-called intelligent hubs include additional features that enable an administrator to monitor the traffic passing through the hub and to configure each port in the hub. Intelligent hubs are also called manageable hubs.
A third type of hub, called a switching hub, actually reads the destination address of each packet and then forwards the packet to the correct port.
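The difference between a plain hub and a switching hub can be sketched as two forwarding functions. The function names, port numbers, and address table are hypothetical, and the fallback of copying to all other ports when the destination is unknown is an assumption of this sketch rather than something stated above.

```python
def hub_forward(in_port: int, ports: list) -> list:
    """A plain hub copies the packet to every port except the one it arrived on."""
    return [port for port in ports if port != in_port]


def switch_forward(in_port: int, dest_addr: str, table: dict, ports: list) -> list:
    """A switching hub looks up the destination address and picks a single port."""
    if dest_addr in table:
        return [table[dest_addr]]
    # Assumption: with no table entry, behave like a plain hub.
    return hub_forward(in_port, ports)


ports = [1, 2, 3, 4]
table = {"aa:bb": 3}  # hypothetical address-to-port table
print(hub_forward(1, ports))                      # [2, 3, 4]
print(switch_forward(1, "aa:bb", table, ports))   # [3]
```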
JBOD (Just a Bunch Of Disks)
Short for Just a Bunch Of Disks. Used to refer to hard disks that aren't configured according to RAID -- a subsystem of disk drives that improves performance and fault tolerance.
network attached storage (NAS)
A network-attached storage (NAS) device is a server that is dedicated to nothing more than file sharing. NAS does not provide any of the activities that a server in a server-centric system typically provides, such as e-mail, authentication or file management. NAS allows more hard disk storage space to be added to a network that already utilizes servers without shutting them down for maintenance and upgrades. With a NAS device, storage is not an integral part of the server. Instead, in this storage-centric design, the server still handles all of the processing of data but a NAS device delivers the data to the user. A NAS device does not need to be located within the server but can exist anywhere in a LAN and can be made up of multiple networked NAS devices.
A technology that uses glass (or plastic) threads (fibers) to transmit data. A fiber optic cable consists of a bundle of glass threads, each of which is capable of transmitting messages modulated onto light waves.
Fiber optics has several advantages over traditional metal communications lines:
- Fiber optic cables are less susceptible than metal cables to interference.
- Data can be transmitted digitally (the natural form for computer data) rather than as an analog signal.
The main disadvantage of fiber optics is that the cables are expensive to install. In addition, they are more fragile than wire and are difficult to split.
Fiber optics is a particularly popular technology for local-area networks. In addition, telephone companies are steadily replacing traditional telephone lines with fiber optic cables. In the future, almost all communications will employ fiber optics.
2 to the 50th power (1,125,899,906,842,624) bytes. A petabyte is equal to 1,024 terabytes.
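The figures can be checked with a few lines of arithmetic. The constant names are chosen for this example, and a terabyte is taken as 2 to the 40th power bytes.

```python
TERABYTE = 2 ** 40   # bytes
PETABYTE = 2 ** 50   # bytes

print(PETABYTE)              # 1125899906842624
print(PETABYTE // TERABYTE)  # 1024
```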
Short for Redundant Array of Independent (or Inexpensive) Disks, a category of disk drives that employ two or more drives in combination for fault tolerance and performance. RAID disk drives are used frequently on servers but aren't generally necessary for personal computers.
There are a number of different RAID levels. The three most common are 0, 3, and 5:
A Storage Area Network (SAN) is a high-speed subnetwork of shared storage devices. A storage device is a machine that contains nothing but a disk or disks for storing data.
A SAN's architecture works in a way that makes all storage devices available to all servers on a LAN or WAN. As more storage devices are added to a SAN, they too will be accessible from any server in the larger network. In this case, the server merely acts as a pathway between the end user and the stored data.
Because stored data does not reside directly on any of a network's servers, server power is utilized for business applications, and network capacity is released to the end user.
A popular buzzword that refers to how well a hardware or software system can adapt to increased demands. For example, a scalable network system would be one that can start with just a few nodes but can easily expand to thousands of nodes. Scalability can be a very important feature because it means that you can invest in a system with confidence you won't outgrow it.
Acronym for small computer system interface. Pronounced "scuzzy," SCSI is a parallel interface standard used by many servers for attaching peripheral devices to computers.
SCSI interfaces provide for faster data transmission rates (up to 80 megabytes per second) than standard serial and parallel ports. In addition, you can attach many devices to a single SCSI port, so that SCSI is really an I/O bus rather than simply an interface.
Although SCSI is an ANSI standard, there are many variations of it, so two SCSI interfaces may be incompatible. For example, SCSI supports several types of connectors.
The following varieties of SCSI are currently implemented:
Wide Ultra2 SCSI: Uses a 16-bit bus and supports data rates of 80 MBps.
2 to the 40th power (1,099,511,627,776) bytes. This is approximately 1 trillion bytes.
The amount of data transferred from one place to another or processed in a specified amount of time. Data transfer rates for disk drives and networks are measured in terms of throughput. Typically, throughputs are measured in Kbps, Mbps and Gbps.
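As a worked example, throughput is the amount of data divided by the time taken. The sketch below assumes 1 Mbps means 1,000,000 bits per second; the `throughput_mbps` function is a hypothetical helper.

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Throughput in megabits per second (1 Mbps taken as 1,000,000 bits/s)."""
    bits = bytes_transferred * 8
    return bits / seconds / 1_000_000


# 125,000,000 bytes moved in 10 seconds.
print(throughput_mbps(125_000_000, 10))  # 100.0
```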
A type of computer processing in which the computer responds immediately to user requests. Each request is considered to be a transaction. Automatic teller machines for banks are an example of transaction processing.
The opposite of transaction processing is batch processing, in which a batch of requests is stored and then executed all at one time. Transaction processing requires interaction with a user, whereas batch processing can take place without a user being present.
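The contrast can be sketched with a toy bank account. The `Account` class and its method names are hypothetical and exist only to show the two processing styles side by side.

```python
class Account:
    """Toy account that supports both processing styles."""

    def __init__(self, balance: int):
        self.balance = balance
        self.pending = []   # requests stored for later batch execution

    def transact(self, amount: int) -> int:
        """Transaction processing: apply the request at once and report the result."""
        self.balance += amount
        return self.balance

    def enqueue(self, amount: int) -> None:
        """Batch processing, step 1: store the request without applying it."""
        self.pending.append(amount)

    def run_batch(self) -> int:
        """Batch processing, step 2: execute all stored requests at one time."""
        for amount in self.pending:
            self.balance += amount
        self.pending.clear()
        return self.balance


account = Account(100)
print(account.transact(-30))  # 70, the user sees the result immediately
account.enqueue(50)
account.enqueue(-20)
print(account.balance)        # 70, queued requests have not run yet
print(account.run_batch())    # 100
```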
A fixed amount of storage on a disk or tape. The term volume is often used as a synonym for the storage medium itself, but it is possible for a single disk to contain more than one volume or for a volume to span more than one disk.