Monitor MySQL Galera Cluster to Prevent Split-Brain

I have another MySQL Galera Cluster running on Percona XtraDB Cluster, which has 2 nodes and 1 arbitrator. In total, I have 3 votes in the quorum. The expected problem with a 2-node cluster is the possibility of split-brain if the arbitrator goes down and the network switch fails at the same time. This would seriously affect database consistency and cluster availability.

To prevent this, we need to closely monitor the number of nodes seen by the cluster. If it is 3 nodes, everything is normal and nothing needs to be done. If it is 2 nodes, we should send a warning to notify that one member is down. If it is 1 node, we shut down the MySQL server on that node to prevent split-brain.
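The decision rule above can be sketched as a small shell function. The function name `decide_action` and the action labels it prints are illustrative only; they are not part of Galera or of the script later in this post.

```shell
# Map the observed cluster size to an action label (labels are illustrative).
decide_action() {
    case "$1" in
        3) echo "ok" ;;        # all three votes present, nothing to do
        2) echo "warn" ;;      # one member is missing, send a warning
        1) echo "shutdown" ;;  # this node is alone, stop MySQL
        *) echo "unknown" ;;   # empty or unexpected value, e.g. mysqld not reachable
    esac
}
```

Keeping an explicit "unknown" branch means an empty value is handled on purpose rather than falling through silently.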

To check the number of nodes seen by the cluster in MySQL Galera, we can use this command:

$ mysql -e "show status like 'wsrep_cluster_size'";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
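For use in a script it is handier to get the bare number instead of the table. A minimal sketch, assuming the `mysql` client can log in without a prompt; the function name `get_cluster_size` is made up for this example:

```shell
# Print only the numeric value of wsrep_cluster_size.
# -N suppresses the column-name header, -s requests plain tab-separated output.
get_cluster_size() {
    mysql -N -s -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | awk '{print $2}'
}
```

Called as `size=$(get_cluster_size)`, this leaves `size` empty when the server cannot be reached, so the caller should treat an empty value as an error rather than as a size.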

We should create a BASH script that monitors this value and takes the respective action.

1. First of all, install sendmail and mailx. We need these to send the alert via email:

$ yum install sendmail mailx -y

2. Make sure sendmail starts on boot, and start the service now:

$ chkconfig sendmail on
$ service sendmail start

3. Create the BASH script for this monitoring, named galera_monitor, in the /root/scripts directory using a text editor:

$ mkdir -p /root/scripts
$ vim /root/scripts/galera_monitor

And paste the following lines:

#!/bin/bash
## Monitor galera cluster size
 
## Where the alert should be sent to (placeholder address, replace with your own)
EMAIL="user@domain.com"
 
cluster_size=$(mysql -e "show status like 'wsrep_cluster_size'" | tail -1 | awk '{print $2}')
hostname=$(hostname)
error=$(tail -100 /var/lib/mysql/mysql-error.log)
 
SUBJECT1="ERROR: [$hostname] Galera Cluster Size"
SUBJECT2="WARNING: [$hostname] Galera Cluster Size"
EMAILMESSAGE="/tmp/emailmessage.txt"
 
echo "Cluster size result: $cluster_size" > $EMAILMESSAGE
echo "Latest error: $error" >> $EMAILMESSAGE
 
## Quoted string comparison, so an empty value (mysqld unreachable) does not break the test
if [ "$cluster_size" = "1" ]; then
    /bin/mail -s "$SUBJECT1" "$EMAIL" < "$EMAILMESSAGE"
    /etc/init.d/mysql stop                    # stop the mysql server to prevent split-brain
elif [ "$cluster_size" = "2" ]; then
    /bin/mail -s "$SUBJECT2" "$EMAIL" < "$EMAILMESSAGE"
fi

4. Add the root login credentials to /root/.my.cnf so the mysql client can log in without prompting for a password:

[client]
user=root
password=MyR00tP4ss

5. Change the permissions of /root/.my.cnf so it is readable only by root:

$ chmod 400 /root/.my.cnf

6. Change the permissions of the script so it is executable:

$ chmod 755 /root/scripts/galera_monitor

7. Add the script to cron:

$ echo "* * * * * /bin/sh /root/scripts/galera_monitor" >> /var/spool/cron/root

8. Reload the cron daemon to apply the changes:

$ service crond reload
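Note that the `echo ... >>` in step 7 appends a new line every time it is run, so repeating the step leaves duplicate cron entries. A small sketch of an idempotent alternative; the helper name `add_cron_line` is hypothetical:

```shell
# Append a cron entry only if that exact line is not already in the file.
# -F: match as a fixed string, -x: match the whole line, -q: no output.
add_cron_line() {
    local cron_file="$1" entry="$2"
    touch "$cron_file"
    grep -qxF -- "$entry" "$cron_file" || printf '%s\n' "$entry" >> "$cron_file"
}
```

It could be called as `add_cron_line /var/spool/cron/root "* * * * * /bin/sh /root/scripts/galera_monitor"`, and running it a second time changes nothing.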

Done. You should receive an email every minute if your cluster size has dropped to 2, and you should then bring the missing member back up to restore 3 nodes. If the cluster size is 1, the script will stop the MySQL server.
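One possible refinement, which is not part of the script in this post: Galera also exposes the `wsrep_cluster_status` status variable, which reports `Primary` when the node belongs to the component that holds quorum. A sketch of such a check, assuming the same passwordless `mysql` client access; the function name `in_primary_component` is made up here:

```shell
# Succeed (exit status 0) only if this node reports that it is in the primary component.
in_primary_component() {
    local status
    status=$(mysql -N -s -e "SHOW STATUS LIKE 'wsrep_cluster_status'" | awk '{print $2}')
    [ "$status" = "Primary" ]
}
```

A monitor could use it as `in_primary_component || echo "node is not in the primary component"`, alongside the cluster size check.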

Note: You should NOT enable the cron job while you are re-initializing your Galera cluster. The first node starts with a cluster size of 1, so the script would keep stopping MySQL. This script is only suitable for monitoring a running production cluster.
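Instead of editing the crontab each time, one option is to let the script skip its work while a flag file exists. This is only a sketch: the flag path and the function name `monitor_enabled` are assumptions for illustration, not something the script above already supports.

```shell
# Hypothetical flag file: create it before re-initializing the cluster
# and remove it once all nodes have rejoined.
MAINTENANCE_FLAG="${MAINTENANCE_FLAG:-/root/scripts/galera_monitor.disabled}"

# Succeed only when the flag file does not exist.
monitor_enabled() {
    [ ! -e "$MAINTENANCE_FLAG" ]
}
```

Adding `monitor_enabled || exit 0` near the top of galera_monitor would then make every cron run a no-op until the flag file is removed.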