Domino clustering, how does it work?
Clustering Domino servers is quite easy to set up and configure. Domino clustering is one of the best ways to increase uptime and availability for your Domino servers: it improves performance for both the servers and the users who access your databases, and it balances the workload across your servers. This article explains what Domino clustering really is, how it works, and what to be aware of when deciding on, and working with, Domino clusters.
Clustering a Domino server is like clustering any other kind of server. The main purpose is to balance the workload between servers and improve server performance. A Domino cluster is a group of two to six servers, all of which contain replicas of the databases that you want to be readily available to users at all times. Because the same database exists on several servers, the workload can be spread across the cluster. If a user tries to access a database on a cluster server that is not available, Domino opens a replica of the database on a different server, provided a replica is available. Domino keeps these replicas synchronized through continuous replication, so the information in them stays identical.
Each server in a cluster contains cluster components that are installed with the Lotus Domino Enterprise license. These components, together with the Administration Process, maintain the clustering environment: they manage and monitor the tasks that run the cluster, let you track the availability of servers and databases, and let you add servers and databases to the cluster. They also keep replica databases synchronized and communicate with each other to ensure that the cluster is running smoothly.
To take advantage of failover and workload balancing, you have to distribute databases and replicas throughout the cluster yourself; Domino does not do this automatically. The number of replicas you create for a database depends on how busy the database is and how important it is for users to have constant access to it. Some databases might need several replicas while others need only a few. It all depends on the needs of your company.
Failover – how does it work?
Failover is the ability to redirect users to available resources. In Domino, that means redirecting requests from one server to another server within the cluster. A Domino server notices when a database is heavily used or not available, so when a user tries to connect to such a database, the server redirects the user to a replica of that database on another server in the cluster. Each server runs a task called “Cluster Manager”, which frequently sends probes to every other server in the cluster to determine its availability. Based on the responses to these probes, users are redirected to the most suitable available server in the cluster. The “Cluster Manager” also checks which databases and replicas are available on each server, so failover never targets a server that does not hold a replica of the requested database. Although the user ends up connected to a database on a different server, failover is transparent: in most cases the user will not even notice it.
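The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Domino implements it: the `ClusterServer` class, its fields, and the `failover_candidates` function are hypothetical names invented for this example, and the probe result is reduced to a single boolean.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ClusterServer:
    """Hypothetical view of one cluster member, as a Cluster Manager might track it."""
    name: str
    reachable: bool                                      # outcome of the latest probe
    replica_ids: Set[str] = field(default_factory=set)   # replica IDs hosted on this server


def failover_candidates(servers: List[ClusterServer],
                        replica_id: str,
                        failed_server: str) -> List[str]:
    """Return the servers that could take over a request for `replica_id`.

    A server qualifies only if it answered the last probe AND holds a
    replica of the requested database, mirroring the two checks in the text.
    """
    return [
        s.name
        for s in servers
        if s.name != failed_server and s.reachable and replica_id in s.replica_ids
    ]


cluster = [
    ClusterServer("ServerA", reachable=False, replica_ids={"R1"}),
    ClusterServer("ServerB", reachable=True, replica_ids={"R1"}),
    ClusterServer("ServerC", reachable=True, replica_ids={"R2"}),
]
candidates = failover_candidates(cluster, "R1", failed_server="ServerA")
```

In this example only ServerB qualifies: ServerC is reachable but holds no replica of the requested database, so it is never considered as a failover target.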
When will failover occur?
Failover can occur in many situations. Generally, failover occurs when users cannot access the server that contains the database, or cannot access the database itself. There are several reasons why this could happen: the server might be down, there could be network connectivity problems, the server might have reached its maximum number of users, the administrator might have restricted access to the server, the server might be busy because of heavy load, or the database might be marked as “Out of service” in the Cluster Database Directory.
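The conditions listed above amount to a checklist, which can be written down as a small Python function. This is a sketch for illustration only: the `AccessAttempt` class and all of its field names are hypothetical and do not correspond to any Domino API or setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessAttempt:
    """Hypothetical snapshot of the conditions at the moment a user opens a database."""
    server_up: bool = True
    network_ok: bool = True
    users_connected: int = 0
    max_users: Optional[int] = None        # None means no user limit is configured
    server_restricted: bool = False
    server_busy: bool = False
    database_out_of_service: bool = False


def failover_reasons(attempt: AccessAttempt) -> List[str]:
    """Return every condition from the list above that applies to this attempt.

    An empty list means the database can be opened where it is; any
    entry means the request is a candidate for failover.
    """
    reasons = []
    if not attempt.server_up:
        reasons.append("server is down")
    if not attempt.network_ok:
        reasons.append("network connectivity problem")
    if attempt.max_users is not None and attempt.users_connected >= attempt.max_users:
        reasons.append("maximum number of users reached")
    if attempt.server_restricted:
        reasons.append("server is restricted")
    if attempt.server_busy:
        reasons.append("server is busy")
    if attempt.database_out_of_service:
        reasons.append("database is out of service")
    return reasons
```

Returning the full list of reasons rather than a single boolean makes it easier to see which condition caused the redirect when several apply at once.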
When will failover not occur?
Failover does not happen in every situation, even if the database cannot be accessed. If a user has already opened a database and the server then becomes unavailable, no failover takes place. The user can close the database and re-open it, which causes failover to a different replica if one exists in the cluster. Similarly, if a user tries to create a new database based on a template on a certain template server, and that template server becomes unavailable, no failover takes place. There are several other situations as well, but these are among the most common.
Workload balancing – how does it work?
Workload balancing starts when you distribute your databases throughout the cluster. No server needs to be overloaded, because load and access can be balanced between all the servers in the cluster. In addition, there are several NOTES.INI variables that help you balance the workload. You can specify how busy a server is allowed to get by setting an availability threshold. When the server reaches the availability threshold, the Cluster Manager marks the server BUSY. While a server is in the BUSY state, requests to it, such as opening a database, are sent to other servers that contain replicas of the requested databases. Another NOTES.INI variable sets the maximum number of users you want to access a server. When the server reaches this limit, users are redirected to another server in the cluster. This keeps the workload balanced and keeps the server working at optimum performance.
What really happens when a server is in the BUSY state is that the Cluster Manager looks in the Cluster Database Directory for a replica of the requested database. It then checks the availability of the servers that contain a replica and redirects the user to the “best” server. If no other server contains a replica, or if all servers are BUSY, the original database opens even though the server is BUSY.
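The BUSY-state behaviour described above can be sketched as follows. The sketch rests on several assumptions that are not in the text: each server is given a hypothetical numeric `availability` score where higher means more free capacity, a server counts as BUSY once that score is at or below the threshold, and the “best” server is taken to be the one with the highest score. The class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerLoad:
    """Hypothetical load snapshot for one cluster member."""
    name: str
    availability: int          # assumed score: higher means more free capacity
    has_replica: bool = True   # does this server hold a replica of the database?


def is_busy(server: ServerLoad, threshold: int) -> bool:
    """Assumption: a server is BUSY once its score is at or below the threshold."""
    return server.availability <= threshold


def choose_server(requested: ServerLoad, others: List[ServerLoad], threshold: int) -> str:
    """Pick the server that should open the database.

    - If the requested server is not BUSY, it serves the request itself.
    - Otherwise, redirect to the most available non-BUSY server with a replica.
    - If no such server exists, open the original database anyway.
    """
    if not is_busy(requested, threshold):
        return requested.name
    candidates = [s for s in others if s.has_replica and not is_busy(s, threshold)]
    if not candidates:
        return requested.name
    return max(candidates, key=lambda s: s.availability).name
```

The last branch is the important one: being BUSY redirects requests only when there is somewhere better to send them, so a lone replica on a BUSY server is still opened.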
Which replica will be accessed during a failover?
When a user accesses a database that is, for some reason, not available at that moment, the “Cluster Manager” looks in the “Cluster Database Directory” for a replica of that database. To find the replica, the “Cluster Manager” looks for a database that has the same replica ID as the original database. If another server holds several databases with the same replica ID, the “Cluster Manager” assumes that selective replication is being used between those databases. To be sure it fails over to the correct replica, the “Cluster Manager” selects the replica that has the same path as the original database.
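The lookup described above can be illustrated with a short Python sketch. The `DirectoryEntry` type and the `pick_replica` function are hypothetical stand-ins for entries in the Cluster Database Directory, and the behaviour when several replicas exist but none shares the original path is not covered by the text, so the sketch simply reports no match in that case.

```python
from typing import List, NamedTuple, Optional


class DirectoryEntry(NamedTuple):
    """Hypothetical stand-in for one entry in the Cluster Database Directory."""
    server: str
    path: str
    replica_id: str


def pick_replica(entries: List[DirectoryEntry],
                 original: DirectoryEntry,
                 target_server: str) -> Optional[DirectoryEntry]:
    """Choose which database on `target_server` to open instead of `original`."""
    # Step 1: match on replica ID.
    matches = [
        e for e in entries
        if e.server == target_server and e.replica_id == original.replica_id
    ]
    if len(matches) <= 1:
        return matches[0] if matches else None
    # Step 2: several replicas share the ID, so prefer the one with the same path.
    for entry in matches:
        if entry.path == original.path:
            return entry
    # Not described in the text; this sketch just reports that nothing matched.
    return None


original = DirectoryEntry("ServerA", "mail/jdoe.nsf", "R1")
directory = [
    DirectoryEntry("ServerB", "archive/jdoe.nsf", "R1"),
    DirectoryEntry("ServerB", "mail/jdoe.nsf", "R1"),
    DirectoryEntry("ServerB", "apps/crm.nsf", "R2"),
]
chosen = pick_replica(directory, original, "ServerB")
```

Here ServerB holds two databases with replica ID R1, so the path comparison decides, and the replica at the same path as the original is selected.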
How does cluster replication work?
There are huge differences between the typical “schedule-driven” replication used between databases and servers that are not in a cluster, and what we might call “event-driven” replication between servers and databases in a cluster. A special task on the servers in the cluster, called the “Cluster Replicator”, is responsible for replication between the databases. When the “Cluster Replicator” learns of a change to a database, it immediately pushes that change to all other replicas in the cluster. All replication events are stored in memory, and if a destination server is not available, the “Cluster Replicator” continues to store these events in memory until the destination server becomes available.
Another very important point is that the “Cluster Replicator” leaves the processing of replication formulas to the standard replicator. Because these formulas can use a lot of processing power, the “Cluster Replicator” does not process them, in order to minimize the overhead of cluster replication. If you use selective replication, a database might therefore temporarily include documents that do not match the selection formula. Domino deletes these documents when you run standard replication. The “Cluster Replicator” also ignores the advanced panel in the replication settings dialog box, so you cannot disable the replication of specific elements of a database, such as the ACL and design elements. The “Cluster Replicator” always attempts to make all replicas identical, so that users who fail over do not notice that they failed over.
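The event-driven push with in-memory queuing described above can be sketched in Python. This is a toy model under stated assumptions, not the Cluster Replicator itself: the class name, its methods, and the use of plain strings as change events are all invented for the example, and delivery is simulated by appending to a list.

```python
from collections import defaultdict, deque


class ClusterReplicatorSketch:
    """Toy model of event-driven replication with per-destination in-memory queues."""

    def __init__(self, destinations):
        self.available = {d: True for d in destinations}
        self.pending = defaultdict(deque)    # destination -> events waiting in memory
        self.delivered = defaultdict(list)   # destination -> events already pushed

    def on_change(self, event):
        """A database changed: push the event to every other replica right away."""
        for dest in self.available:
            self.pending[dest].append(event)
            self._flush(dest)

    def set_available(self, dest, up):
        """Record a destination going down or coming back; drain its queue if it is up."""
        self.available[dest] = up
        if up:
            self._flush(dest)

    def _flush(self, dest):
        """Deliver queued events in order, but only while the destination is reachable."""
        if not self.available[dest]:
            return
        queue = self.pending[dest]
        while queue:
            self.delivered[dest].append(queue.popleft())


replicator = ClusterReplicatorSketch(["ServerB", "ServerC"])
replicator.set_available("ServerC", False)
replicator.on_change("doc-1 updated")      # reaches ServerB, waits in memory for ServerC
replicator.set_available("ServerC", True)  # queued event is pushed now
```

Because the queue lives only in memory in this model, a restart of the sending process would lose undelivered events, which is one reason scheduled standard replication is still useful alongside cluster replication.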