A cluster is a set of loosely coupled, independent computer systems that behave as a single system. Client applications interact with a cluster as if it were a single high-performance, highly reliable server. System managers administer a cluster much as they would a single server. Cluster technology is readily adaptable to low-cost, industry-standard computer technology and interconnects.
Clustering can take many forms. A cluster may be nothing more than a set of standard personal computers interconnected by Ethernet. At the other end of the spectrum, the hardware structure may consist of high-performance SMP systems connected via a high-performance communications and I/O bus. In both cases, processing power can be increased in small incremental steps by adding another commodity system. To a client application, the cluster provides the illusion of a single server, or single-system image, even though it may be composed of many systems.
Additional systems can be added to the cluster as needed to handle more complex requests or a growing number of requests from clients. If one system in a cluster fails, its workload can be automatically redistributed among the remaining systems. This transfer is frequently transparent to the client.
Two principal software models are used in clustering today: shared disk and shared nothing. In the shared disk model, software running on any system in the cluster may access any resource (e.g., a disk) connected to any system in the cluster. If two systems need to see the same data, the data must either be read twice from the disk or copied from one system to another. As in an SMP system, the application must synchronize and serialize its access to shared data. Typically a Distributed Lock Manager (DLM) is used to help with this synchronization. A DLM is a service provided to applications that tracks references to resources throughout the cluster. If more than one system attempts to reference a single resource, the DLM recognizes and resolves the potential conflict. DLM coordination, however, may generate additional message traffic and reduce performance, because access to shared resources must be serialized across systems. One approach to reducing these problems is the shared nothing software model.
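The role of a lock manager can be illustrated with a minimal sketch. It is not a real DLM: it runs in a single process, the class name ToyLockManager and the node and resource names are hypothetical, and a conflict is "resolved" simply by refusing the second request. The message counter stands in for the extra interconnect traffic that lock coordination can introduce.

```python
import threading


class ToyLockManager:
    """Hypothetical in-process stand-in for a distributed lock manager."""

    def __init__(self):
        self._guard = threading.Lock()
        self._holders = {}   # resource name -> system currently holding it
        self.messages = 0    # every acquire/release counts as one message

    def acquire(self, system, resource):
        """Grant the lock if it is free; return False on a conflict."""
        with self._guard:
            self.messages += 1
            holder = self._holders.get(resource)
            if holder is None:
                self._holders[resource] = system
                return True
            # Re-acquiring a lock you already hold succeeds; anyone else is refused.
            return holder == system

    def release(self, system, resource):
        """Release the lock, but only if this system actually holds it."""
        with self._guard:
            self.messages += 1
            if self._holders.get(resource) == system:
                del self._holders[resource]


dlm = ToyLockManager()
first = dlm.acquire("node-a", "disk1/block42")    # granted
second = dlm.acquire("node-b", "disk1/block42")   # conflict: node-a holds it
dlm.release("node-a", "disk1/block42")
third = dlm.acquire("node-b", "disk1/block42")    # granted after the release
```

Even in this toy version, one contended access costs four lock messages before node-b can touch the data, which is the kind of overhead the shared nothing model tries to avoid.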
In the shared nothing software model, each system within the cluster owns a subset of the cluster's resources. Only one system may own and access a particular resource at a time, although, on a failure, another dynamically determined system may take ownership of the resource. In addition, requests from clients are automatically routed to the system that owns the resource.
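Ownership and failover in this model can be sketched as a simple table. Everything here is an assumption made for illustration: the class name OwnershipTable, the node and resource names, and the round-robin policy used to pick a new owner. A real cluster determines the new owner dynamically by its own rules.

```python
class OwnershipTable:
    """Hypothetical map of resource -> owning system in a shared nothing cluster."""

    def __init__(self, assignments):
        self.owner = dict(assignments)          # resource -> system
        self.alive = set(self.owner.values())   # systems currently up

    def route(self, resource):
        """Return the one system allowed to serve requests for this resource."""
        return self.owner[resource]

    def fail(self, system):
        """Mark a system as failed and hand its resources to the survivors."""
        self.alive.discard(system)
        if not self.alive:
            raise RuntimeError("no surviving systems")
        survivors = sorted(self.alive)
        orphaned = sorted(r for r, s in self.owner.items() if s == system)
        for i, resource in enumerate(orphaned):
            # Round-robin is an arbitrary policy chosen for this sketch.
            self.owner[resource] = survivors[i % len(survivors)]


table = OwnershipTable({
    "orders": "node-a",
    "invoices": "node-a",
    "customers": "node-b",
    "stock": "node-c",
})
before = table.route("orders")   # owned by node-a
table.fail("node-a")
after = table.route("orders")    # reassigned to a surviving system
```

At every point each resource has exactly one owner, so no lock traffic is needed to decide who may access it.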
If a client request requires access to resources owned by multiple systems, one system is chosen to host the request. The host system analyzes the client request and ships sub-requests to the appropriate systems. Each system executes its sub-request and returns only the required response to the host system. The host system assembles a final response and sends it to the client.
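The host-and-sub-request flow can be sketched as follows. The partition layout, the record contents, and the function names (owner_of, execute_sub_request, host_request) are hypothetical, and the "shipping" is an ordinary function call rather than a message over an interconnect.

```python
from collections import defaultdict

# Hypothetical partitioning: each system owns a disjoint set of records.
PARTITIONS = {
    "node-a": {1: "alice", 2: "bob"},
    "node-b": {3: "carol", 4: "dave"},
}


def owner_of(key):
    """Find the system that owns a record."""
    for system, records in PARTITIONS.items():
        if key in records:
            return system
    raise KeyError(key)


def execute_sub_request(system, keys):
    """Runs on the owning system and returns only the requested records."""
    local = PARTITIONS[system]
    return {k: local[k] for k in keys}


def host_request(keys):
    """Runs on the host: split the request, ship sub-requests, assemble the reply."""
    by_owner = defaultdict(list)
    for k in keys:
        by_owner[owner_of(k)].append(k)
    result = {}
    for system, sub_keys in by_owner.items():
        result.update(execute_sub_request(system, sub_keys))
    return result


response = host_request([1, 3, 4])
```

Only the three requested records travel back to the host. The lookups themselves happen on the systems that own the data.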
A single request on the host system describes a high-level function (such as retrieving multiple data records) that generates a great deal of system activity (such as multiple disk reads). The associated traffic does not appear on the cluster interconnect until the desired data has been found. Because an application such as a database can be distributed over multiple clustered systems, overall system performance is not limited by the hardware of a single computer.
The shared disk and shared nothing models can be supported within the same cluster. Some software can most easily exploit the capabilities of the cluster through the shared disk model. This software includes applications and services that require only modest (and read-intensive) shared access to data, as well as applications or workloads that are very difficult to partition. Applications that require maximum scalability should use the cluster's shared nothing support.