High availability is paramount nowadays, and there’s no better way to provide it than to build on top of a quorum-based cluster. Such a cluster can easily handle failures of individual nodes and ensure that nodes which have been disconnected from the cluster do not continue to operate. There are several protocols that solve the consensus problem, Paxos and Raft being well-known examples, and you can always implement your own.
With this in mind, we would like to introduce you to CMON HA, a solution we created that allows you to build highly available clusters of cmon daemons and thus achieve ClusterControl high availability. Please keep in mind that this is a beta feature: it works, but we are still adding better debugging and more usability features. Having said that, let’s take a look at how it can be deployed, configured and accessed.
Prerequisites
CMON, the daemon that executes tasks in ClusterControl, uses a MySQL database to store some of its data: configuration settings, metrics, backup schedules and many other things. In a typical setup this is a standalone MySQL instance. Since we want to build a highly available solution, we have to consider a highly available database backend as well. One of the common solutions for that is MySQL Galera Cluster. Because the ClusterControl installation script sets up a standalone database, we have to deploy our Galera Cluster first, before we attempt to install a highly available ClusterControl. And what better way to deploy a Galera Cluster than with ClusterControl itself? We will use a temporary ClusterControl instance to deploy Galera, on top of which we will deploy the highly available version of ClusterControl.
Deploying a MySQL Galera Cluster
We won’t cover the installation of the standalone ClusterControl here. It’s as easy as downloading it for free and following the steps you are provided with. Once it is ready, you can use the deployment wizard to deploy a 3-node Galera Cluster in a couple of minutes.
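For reference, the installation usually boils down to downloading the install-cc script and running it as root. The URL below is the one commonly shown in the Severalnines documentation; verify it against the current download page before running:

wget https://severalnines.com/downloads/cmon/install-cc
chmod +x install-cc
sudo ./install-cc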
Pick the deployment option and you will be presented with a deployment wizard.
Define the SSH connectivity details. You can use either root or a sudo user, with or without a password. Make sure you correctly set the SSH port and the path to the SSH key.
Then you should pick a vendor, a version and a few configuration details, including the server port and root password. Finally, define the nodes you want to deploy your cluster on. Once this is done, ClusterControl will deploy a Galera Cluster on the nodes you picked. From that point on you can remove this ClusterControl instance as well; it won’t be needed anymore.
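If you prefer the command line and the s9s CLI is available on the temporary controller, the same deployment can be done without the wizard. This is only a sketch: the node IPs are placeholders and the vendor and version flags should match what you actually want to deploy:

s9s cluster --create --cluster-type=galera \
    --nodes="10.0.0.121;10.0.0.122;10.0.0.123" \
    --vendor=percona --provider-version=5.7 \
    --db-admin-passwd="<root password>" \
    --os-user=root \
    --cluster-name="Galera for cmon" \
    --wait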
Deploying a Highly Available ClusterControl Installation
We are going to start with one node, configure it to start the cluster and then we will proceed with adding additional nodes.
Enabling Clustered Mode on the First Node
What we want to do is to deploy a normal ClusterControl instance, so we are going to proceed with the typical installation steps: download the installation script and run it. The main difference, compared to the steps we took when we installed the temporary ClusterControl to deploy the Galera Cluster, is that in this case there is an already existing MySQL database. The script will detect it, ask whether we want to use it and, if so, request the password for the superuser. Other than that, the installation is basically the same.
The next step is to reconfigure cmon to listen not only on localhost but also to bind to IPs that can be accessed from the outside. Communication between the nodes in the cluster will happen on that IP, by default on port 9501. We can accomplish this by editing the file /etc/default/cmon and adding the IP to the RPC_BIND_ADDRESSES variable:
RPC_BIND_ADDRESSES="127.0.0.1,10.0.0.101"
Afterwards we have to restart the cmon service:
service cmon restart
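To confirm that cmon is now listening on the additional address, you can check the listening sockets, for example with ss from iproute2. You should see cmon bound to both 127.0.0.1 and the public IP:

ss -ltnp | grep cmon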
The following step is to configure the s9s CLI tools, which we will use to create and monitor the cmon HA cluster. As per the documentation, these are the steps to take:
wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
chmod 755 install-s9s-tools.sh
./install-s9s-tools.sh
Once we have s9s tools installed, we can enable the clustered mode for cmon:
s9s controller --enable-cmon-ha
We can then verify the state of the cluster:
s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.7.4.3565 system admins 10.0.0.101 10.0.0.101 9501 Acting as leader.
Total: 1 controller(s)
As you can see, we have one node up and it is acting as the leader. Obviously, we need at least three nodes to be fault-tolerant: with three controllers, two still form a majority after a single failure, so the cluster keeps its quorum. Therefore the next step will be to set up the remaining nodes.
Enabling Clustered Mode on Remaining Nodes
There are a couple of things we have to keep in mind while setting up the additional nodes. First of all, ClusterControl creates tokens that “link” the cmon daemon with clusters. That information is stored in several locations, including the cmon database, so we have to ensure that every place contains the same token. Otherwise the cmon nodes won’t be able to collect information about the clusters and execute RPC calls. To accomplish that, we should copy the existing configuration files from the first node to the other nodes. In this example we’ll use a node with the IP 10.0.0.103, but you should do this for every node you plan to include in the cluster.
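If you later want to verify that the tokens really match across nodes, one simple check is to compare the token value in the cmon configuration files on each node. This assumes the token is stored under the rpc_key parameter, as it is in recent ClusterControl versions:

grep rpc_key /etc/cmon.cnf /etc/cmon.d/*.cnf 2>/dev/null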
We’ll start by copying the cmon configuration files to the new node:
scp -r /etc/cmon* 10.0.0.103:/etc/
We may need to edit /etc/cmon.cnf and set the proper hostname:
hostname=10.0.0.103
Then we’ll proceed with the regular installation of cmon, just like we did on the first node. There is one main difference, though: the script will detect the existing configuration files and ask if we want to install the controller:
=> An existing Controller installation detected!
=> A re-installation of the Controller will overwrite the /etc/cmon.cnf file
=> Install the Controller? (y/N):
We don’t want to do that for now. As on the first node, we will be asked if we want to use the existing MySQL database. We do. Then we’ll be asked to provide passwords:
=> Enter your MySQL root user's password:
=> Set a password for ClusterControl's MySQL user (cmon) [cmon]
=> Supported special characters: ~!@#$%^&*()_+{}<>?
=> Enter a CMON user password:
=> Enter the CMON user password again:
=> Creating the MySQL cmon user ...
Please make sure you use exactly the same password for the cmon user as you did on the first node.
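A quick way to confirm the credentials are right is to connect to the database backend with them. The snippet below is just a sketch: it assumes the backend host is stored as mysql_hostname in /etc/cmon.cnf, which is the usual parameter name:

mysql -u cmon -p -h "$(awk -F= '/^mysql_hostname/ {print $2}' /etc/cmon.cnf)" -e "SELECT 1"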
As the next step, we want to install the s9s tools on the new nodes:
wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
chmod 755 install-s9s-tools.sh
./install-s9s-tools.sh
We want them configured exactly as on the first node, so we’ll copy the configuration over:
scp -r ~/.s9s/ 10.0.0.103:/root/
scp /etc/s9s.conf 10.0.0.103:/etc/
There’s one more place where ClusterControl stores the token: /var/www/clustercontrol/bootstrap.php. We want to copy that file as well:
scp /var/www/clustercontrol/bootstrap.php 10.0.0.103:/var/www/clustercontrol/
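If you want to double-check that the UI side carries the same token, you can compare the relevant line of bootstrap.php across nodes. This assumes the token is stored under a name containing “token”, which may differ between ClusterControl versions:

grep -i token /var/www/clustercontrol/bootstrap.php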
Finally, we want to install the controller (as we skipped this when we ran the installation script):
apt install clustercontrol-controller
Make sure you do not overwrite the existing configuration files; the default options should be safe and will leave the correct configuration files in place.
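If you prefer to be explicit about it, you can tell dpkg to keep the currently installed configuration files. This relies on the standard dpkg --force-confold option rather than anything ClusterControl-specific:

apt -o Dpkg::Options::="--force-confold" install clustercontrol-controller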
There is one more piece of configuration you may want to copy to the other nodes: /etc/default/cmon.
scp /etc/default/cmon 10.0.0.103:/etc/default
Then edit RPC_BIND_ADDRESSES so it points to the correct IP of the node:
RPC_BIND_ADDRESSES="127.0.0.1,10.0.0.103"
Then we can start the cmon service on the nodes, one by one, and see if they manage to join the cluster. If everything went well, you should see something like this:
s9s controller --list --long
S VERSION OWNER GROUP NAME IP PORT COMMENT
l 1.7.4.3565 system admins 10.0.0.101 10.0.0.101 9501 Acting as leader.
f 1.7.4.3565 system admins 10.0.0.102 10.0.0.102 9501 Accepting heartbeats.
f 1.7.4.3565 system admins 10.0.0.103 10.0.0.103 9501 Accepting heartbeats.
Total: 3 controller(s)
In case of any issues, please check whether all the cmon services are bound to the correct IP addresses. If not, kill them and start them again so they re-read the proper configuration.
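You can inspect the bind addresses of the running daemons with, for example, ps (the brackets in the pattern simply keep grep itself out of the results):

ps aux | grep '[c]mon'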
root 8016 0.4 2.2 658616 17124 ? Ssl 09:16 0:00 /usr/sbin/cmon --rpc-port=9500 --bind-addr='127.0.0.1,10.0.0.103' --events-client='http://127.0.0.1:9510' --cloud-service='http://127.0.0.1:9518'
If you see output from ‘s9s controller --list --long’ like the above, it means that, technically, we have a running cmon HA cluster of three nodes. We could end here, but it’s not over yet. The main problem that remains is UI access: only the leader node can execute jobs. Some of the s9s commands are aware of this, but as of now the UI is not. This means that the UI will work only on the leader node; in our current situation that is the UI accessible via https://10.0.0.101/clustercontrol.
In the second part we will show you one of the ways in which you could solve this problem.