CLUSTERS


Determining the number and placement of replicas in a cluster
There are two major reasons to create a replica for a database in a cluster -- to provide constant availability of the data and to distribute the workload between multiple servers. Before you create replicas in a cluster, consider how frequently users access a database and their need for data redundancy. If a database is extremely busy or its availability is extremely important, you may want to create multiple replicas and locate them on your most reliable servers. For databases that are not very busy and whose constant availability is not important, you may not want to create any replicas at all. A server log file, for example, does not need to have a replica on another server.

In general, the more replicas of a database, the more accessible the data. Creating too many replicas, however, can add unnecessarily to the overhead of maintaining a system and affect performance. As you plan your cluster strategy, try to create a balance between your users' requirements for data availability and the physical ability of each server in your cluster to manage additional workload. More than three replicas of a database may not provide you with significant incremental availability. If users can adequately access a database from one or two servers, do not increase the number of replicas in the cluster.
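
The diminishing return from extra replicas can be illustrated with a rough calculation. The sketch below is not a Domino formula. It assumes that each server is available independently with the same probability, and the 0.99 figure is a hypothetical value chosen only for illustration. Under those assumptions, the chance that at least one copy of a database is reachable is 1 - (1 - p)^n for n copies.

```python
def combined_availability(per_server_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` servers is reachable.

    Assumes independent server failures with identical availability,
    which is a simplifying assumption, not a property of any real cluster.
    """
    if not 0.0 <= per_server_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - per_server_availability) ** copies


def incremental_gain(per_server_availability: float, copies: int) -> float:
    """Availability added by the last copy compared with having one fewer."""
    current = combined_availability(per_server_availability, copies)
    if copies == 1:
        return current
    return current - combined_availability(per_server_availability, copies - 1)


# Hypothetical per-server availability of 99%.
for n in range(1, 6):
    print(f"{n} copies: availability {combined_availability(0.99, n):.8f}, "
          f"gain from last copy {incremental_gain(0.99, n):.8f}")
```

With these assumed numbers, each additional copy adds roughly one hundredth of the gain of the previous one, while the replication and disk overhead of each copy stays about the same.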

When users require the constant availability of a specific database, consider placing a replica on every server in the cluster, provided you have adequate disk space and resources. This configuration provides the highest possible redundancy of data, which can be especially useful if you are a public service provider.

In addition, try to distribute the busiest databases to different servers so that no server contains too many busy databases. If the servers in the cluster all have a similar amount of processing power, you can have an equal load on each server, including the processing power reserved for failover. If a server has significantly more or less processing power than the other servers, consider changing the number of databases on the server and the number of databases that can fail over to the server. Also, distribute mail files across a cluster, or set up separate servers or separate clusters for mail.
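
One way to reason about this distribution is a simple greedy placement: take the busiest database first and put it on the server that would end up least utilized relative to its processing power. The sketch below only illustrates that idea. The server names, capacity units, and load figures are hypothetical, and it does not model the processing power reserved for failover or any Domino setting.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    name: str
    capacity: float                      # relative processing power (hypothetical units)
    databases: List[str] = field(default_factory=list)
    load: float = 0.0                    # sum of the loads of the placed databases

    @property
    def utilization(self) -> float:
        return self.load / self.capacity


def distribute(database_loads: Dict[str, float], servers: List[Server]) -> None:
    """Place each database, busiest first, on the server that would have
    the lowest utilization after receiving it."""
    busiest_first = sorted(database_loads.items(), key=lambda kv: kv[1], reverse=True)
    for db_name, load in busiest_first:
        target = min(servers, key=lambda s: (s.load + load) / s.capacity)
        target.databases.append(db_name)
        target.load += load


# Hypothetical cluster: one server with twice the processing power of the others.
servers = [Server("ServerA", 2.0), Server("ServerB", 1.0), Server("ServerC", 1.0)]
database_loads = {"sales.nsf": 40, "discuss.nsf": 30, "mail1.nsf": 20,
                  "mail2.nsf": 20, "research.nsf": 10}
distribute(database_loads, servers)
for s in servers:
    print(s.name, s.databases, f"utilization={s.utilization:.1f}")
```

In this made-up example the larger server ends up with twice the load of the others, so all three servers run at the same utilization, and the two busiest databases land on different servers.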

Because busy databases in a cluster can create a lot of replication events, it is a good idea to install these replicas on the fastest disk hardware available in the cluster. If possible, place these replicas where other processes are not in contention -- for example, on a partition other than the one that contains the operating system swap file.

To view which databases and replicas already exist in the cluster, open the Cluster Database Directory (CLDBDIR.NSF). It contains a document for each database and replica in the cluster.

Note: Selective replication formulas work differently in a cluster.

How many replicas to create

The following list describes some factors to consider when determining how many replicas to create.


Analyzing databases to determine the number of replicas

There are many factors to consider when deciding how many replicas to create. Some factors suggest creating more replicas, and some suggest creating fewer replicas. Below is a list of factors and how they might affect your cluster traffic and performance.

Prior to distributing databases in a cluster, it can be helpful to create a table of information about the databases and the cluster hardware. You can use the table to determine how important specific databases are and how adequate your resources are. You can include some or all of the following in the table:


Example table

When you create a table of database information, include the factors that are most important to you. The following table uses a subset of the preceding information to determine the number of replicas needed.

Database title      Size  Maximum concurrent users  Transaction rate  Growth rate  Need for availability  Suggested number of replicas
Product Discussion  4 GB  600                       High              High         High                   2
Sales Tracking      1 GB  200                       Medium            High         Critical               2 or more
Company Research    2 GB  20                        Low               Medium       Medium                 0 or 1
Classified Ads      1 GB  50                        Medium            Medium       Low                    0

This table helps identify which databases require high availability, which databases are busiest, and how much additional disk space you will need in the future. In this example, two databases (Product Discussion and Sales Tracking) are very important and are growing rapidly. Be sure that there are enough replicas of these databases so that they are always available, and that there is adequate disk space for growth on every server that contains a replica of them. One database (Company Research) is of medium importance, is not growing as quickly, and is not very active. Provide no more than one replica of this database, unless it would hurt your business if the database were unavailable for a while. One database (Classified Ads) is not very important and does not require a replica in the cluster.
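
The reasoning applied to the example table can be written down as a small lookup, which may help when the same analysis has to be repeated for many databases. This is a sketch only. The mapping from need for availability to a suggested replica count is read off the four example rows and is not a general rule, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseProfile:
    title: str
    size_gb: int
    max_concurrent_users: int
    transaction_rate: str
    growth_rate: str
    availability: str


# Suggestion per availability level, as it appears in the example table.
# Assumption: availability alone drives the suggestion; a real analysis
# would weigh the other factors as well.
SUGGESTED_REPLICAS = {
    "Critical": "2 or more",
    "High": "2",
    "Medium": "0 or 1",
    "Low": "0",
}

EXAMPLE_DATABASES = [
    DatabaseProfile("Product Discussion", 4, 600, "High", "High", "High"),
    DatabaseProfile("Sales Tracking", 1, 200, "Medium", "High", "Critical"),
    DatabaseProfile("Company Research", 2, 20, "Low", "Medium", "Medium"),
    DatabaseProfile("Classified Ads", 1, 50, "Medium", "Medium", "Low"),
]


def suggest_replicas(db: DatabaseProfile) -> str:
    """Return the suggested number of replicas for a database profile."""
    return SUGGESTED_REPLICAS[db.availability]


def needs_growth_headroom(db: DatabaseProfile) -> bool:
    """Flag important, fast-growing databases whose replica servers
    need spare disk space."""
    return db.growth_rate == "High" and db.availability in ("High", "Critical")


for db in EXAMPLE_DATABASES:
    note = " (plan disk space for growth)" if needs_growth_headroom(db) else ""
    print(f"{db.title}: {suggest_replicas(db)}{note}")
```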

The number of concurrent users helps you determine the need for workload balancing. In this example, two databases (Product Discussion and Sales Tracking) are very busy, and both are very important. Therefore, consider placing these databases on different servers to balance the workload. Also be sure that workload balancing parameters are set on the servers that contain these databases so that users fail over to another server when these databases become busy.