cPanel: Exclude Directory during Backup

Backup is the first thing to do and should not be forgotten by a good system administrator. Since cPanel has a built-in backup creator and backup management system, we can take advantage of this tool and use it to suit our needs. In my situation, we have many cPanel accounts and some of them use more than 10GB of disk space, mostly due to user-uploaded website content.

Creating a backup is hard when you have too many inodes or too much disk consumption. It is good if we can exclude some directories, for example user_uploaded, when creating the cPanel backup; the rest can be backed up manually by downloading them to a local server.

In this tutorial I will create a full backup with some directories excluded. Variables as below:

OS: RHEL 4 32bit
cPanel account: premen
Home directory for user: /home/premen

1. Identify the directories that we want to exclude. In this case, I will exclude the directories with high disk usage. The following command might help:

$ cd /home/premen
$ du -h | grep G

This lists every entry whose size is reported in gigabytes, i.e. anything 1GB or larger. Example as below:

6.4G    ./public_html/portal/tmp
8.7G    ./public_html/portal/user_uploaded
15.1G   ./public_html

2. Then we need to create a file called cpbackup-exclude.conf under the respective user's home directory, as described in the cPanel documentation:

$ cd /home/premen
$ touch cpbackup-exclude.conf
$ vi  cpbackup-exclude.conf

Paste the following lines:

public_html/portal/tmp
public_html/portal/user_uploaded

3. Now we create the backup, either via the pkgacct script:

/usr/local/cpanel/scripts/pkgacct premen /home userbackup

or by clicking “Download or Generate a Full Website Backup” inside cPanel.

Once the backup is ready, both directories should be excluded from the full backup, and the archive should be smaller and faster to compress. Cheers!
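To confirm the exclusion worked, you can list the archive contents and search for the excluded paths. A quick check, assuming the default cpmove-premen.tar.gz archive name under /home:

$ tar -tzf /home/cpmove-premen.tar.gz | grep -E 'portal/(tmp|user_uploaded)/'

If the exclusion worked, this should return nothing.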

cPanel: PHPList Subscriber Mailing List Maintenance

In my company, I am also responsible for handling and managing the mailing list server. We are using PHPList, a popular mailing list program, to blast out mail, and we use it to send our latest news, promotions, announcements and notifications to our subscribers. We have a lot of subscribers, including some problematic mail recipients with errors such as the following:

  • mailbox unavailable
  • no such user
  • user rejected
  • domain already expired

In order to do maintenance on the subscriber list, I need to make sure only active subscribers (those with active mailboxes) exist in our mailing list. By referring to exim_mainlog and looking for recipient bounce errors, we can remove unwanted recipients from our subscriber list. Variables as below:

OS: CentOS 4 64bit
PHPlist version: 2.10.5
Mail server log: /var/log/exim_mainlog
PHPList database: plist_db
PHPList Username: plist_userdb
PHPList password: p412#Yf

1. Let's generate an error report from exim_mainlog so we can easily extract the error information. We will use the eximstats command and generate the output to an HTML file, the same as what you see under WHM > Email > View Mail Statistics:

$ /usr/sbin/eximstats -html=/root/mailstats.html -nr -nt -nvr /var/log/exim_mainlog

2. If we go through the generated HTML file, we can see the list of errors captured from exim_mainlog. Example error as below:

<li>1 - [email protected] R=fail_remote_domains: The mail server could not deliver mail to [email protected] The account or domain may not exist, they may be blacklisted, or missing the proper dns entries.

The error above indicates that the remote domain no longer exists and is unreachable. We do not want this email account in our subscriber list anymore, so we need to extract the email address and remove it from the PHPlist database.

3. Let's extract all emails related to this error into a file called domain_error.txt:

$ cat /root/mailstats.html | grep "The account or domain may not exist, they may be blacklisted, or missing the proper dns entries" > domain_error.txt

4. From the domain_error.txt list, we extract the email addresses only, so we can pass these values to the PHPlist database via SQL statements:

perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' domain_error.txt | sort -u > delete_list.txt

5. We should now have the list of email addresses that we want to remove. We will use a BASH script to automate the deletion process. Create a file called phplist_delete under the /root/scripts folder:

mkdir -p /root/scripts
touch /root/scripts/phplist_delete

Copy and paste the following script and change the configuration values to suit your setup. You can retrieve the database information from the PHPlist config.php file (usually under the config directory):

#!/bin/bash
# Delete PHPlist subscribers listed in delete_list.txt
 
# Configuration value
HOST="localhost"
DATABASE="plist_db"
USERNAME="plist_userdb"
PASSWORD='p412#Yf'
DELETE_LIST="/root/delete_list.txt"
 
###
MYSQL="$(which mysql)"
 
if [ ! -f "$DELETE_LIST" ]
then
        echo "List file not found!"
        exit 1
else
        echo "Deleting email addresses in PHPlist database.."
        while read -r emailadd; do
                $MYSQL -h "$HOST" -u "$USERNAME" -p"$PASSWORD" <<< "DELETE FROM $DATABASE.phplist_user_user WHERE email = '$emailadd';"
                echo "$emailadd: deleted!"
        done < "$DELETE_LIST"
        # Clean up attribute and history rows orphaned by the deletions above
        $MYSQL -h "$HOST" -u "$USERNAME" -p"$PASSWORD" <<< "DELETE FROM $DATABASE.phplist_user_user_attribute WHERE userid NOT IN (SELECT id FROM $DATABASE.phplist_user_user);"
        $MYSQL -h "$HOST" -u "$USERNAME" -p"$PASSWORD" <<< "DELETE FROM $DATABASE.phplist_user_user_history WHERE userid NOT IN (SELECT id FROM $DATABASE.phplist_user_user);"
        echo "Process completed!"
fi

6. Change the permission:

chmod 755 /root/scripts/phplist_delete

7. Let's run the script and you are done!

/root/scripts/phplist_delete
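To verify the cleanup, you can count the remaining subscribers using the same credentials the script uses:

$ mysql -h localhost -u plist_userdb -p'p412#Yf' -e "SELECT COUNT(*) FROM plist_db.phplist_user_user;"

The count should have dropped by roughly the number of addresses in delete_list.txt.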

Sort and Count IP in Apache Access Logs

My boss has asked me to generate a simple report listing the IP addresses which accessed our website yesterday with a “ref=1” keyword. What I need most is the Apache access log, which is located under /etc/httpd/logs/domlogs/mydomain.net (the default Apache log location for cPanel servers).

From the log file, I need to extract yesterday's entries, which is 22 November 2011, and output them into another file called 20111122.txt. Note that Apache logs the date in day/month-name/year format (22/Nov/2011, not 22/11/2011). Command as below:

$ cat /etc/httpd/logs/domlogs/mydomain.net | grep "22/Nov/2011" > /etc/httpd/logs/domlogs/20111122.txt

Next, I extract the lines matching the keyword (ref=1) and output them to another file:

$ cat /etc/httpd/logs/domlogs/20111122.txt | grep '\bref=1\b' > /etc/httpd/logs/domlogs/20111122_keyword.txt

Once the keyword lines are extracted, I do the counting and generate the report in the HTTP root folder so I can view it via web browser:

$ cat /etc/httpd/logs/domlogs/20111122_keyword.txt | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -n > /home/mydomain/public_html/ip_report.txt

After that, I view the generated report via web browser at http://mydomain.net/ip_report.txt, and output like the following should appear:

----------------
    818 118.100.150.10
    821 175.142.245.213
    824 124.43.99.74
    829 137.186.89.211
    835 92.101.85.191
    855 41.78.17.172
    855 85.65.48.168
    855 87.69.176.55
    858 201.241.218.180
    880 14.99.3.152
    889 190.197.28.142
    889 41.72.10.60
    902 201.248.128.162
    905 123.176.15.119
    911 115.132.145.225
    911 115.66.95.153
    918 151.27.94.55
    939 189.115.158.192
    947 65.49.71.172
    949 91.185.109.175
    952 186.146.220.235
    980 85.230.95.192
   1019 41.236.216.132
   1030 38.111.147.83
   1040 99.66.113.53
   1062 41.70.178.190
   1071 175.144.133.164
   1107 41.78.17.186
----------------

The left column is the number of matching requests counted, while the right column is the respective IP address. This simple report should be enough to give my boss what he wants!
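For reference, the whole report can also be produced in a single pipeline; a sketch assuming the same log path and date:

$ cat /etc/httpd/logs/domlogs/mydomain.net | grep "22/Nov/2011" | grep '\bref=1\b' | awk '{print $1}' | cut -d: -f1 | sort | uniq -c | sort -n > /home/mydomain/public_html/ip_report.txt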

CentOS 6: Bonding Multiple Network Interface Cards (NICs)

Bonding NICs can help us achieve high network availability even if one of the network cards goes down. Usually, NIC bonding is implemented for high-speed, uninterrupted services like SAN storage. Do not assume that bonding will multiply data throughput; to multiply throughput, we need multipathing instead. You can refer here for more information: http://blog.open-e.com/bonding-versus-mpio-explained/.

First of all, make sure your server has 2 or more NICs to be combined, and make sure all of them are connected to the same switch.

Variables that I used:

OS: CentOS 6 64bit
Storage server main IP: 192.168.0.100
Network interface cards involved: eth0, eth1, eth2, eth3

1. Make sure we have the utilities needed to bond NICs. Run the following command to install them:

yum install ethtool -y

2. Create a new bonding interface called bond0. This will be our bonding master interface, and our real NICs will be used as slave interfaces. Whenever a slave NIC goes down, another slave NIC takes over, so the master interface is not affected:

touch /etc/sysconfig/network-scripts/ifcfg-bond0
vi  /etc/sysconfig/network-scripts/ifcfg-bond0

And add the following lines:

DEVICE=bond0
ONBOOT=yes
IPADDR=192.168.0.100
NETMASK=255.255.255.0
NETWORK=192.168.0.0
USERCTL=no
BOOTPROTO=none

3. Now we need to change some values in every physical NIC's configuration file. In this case, I need to change the following files under the /etc/sysconfig/network-scripts directory (a loop sketch after the four listings shows a shortcut):

ifcfg-eth0:

DEVICE=eth0
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

ifcfg-eth1:

DEVICE=eth1
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

ifcfg-eth2:

DEVICE=eth2
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

ifcfg-eth3:

DEVICE=eth3
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
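Since the four files are identical apart from the DEVICE line, a small shell loop can generate them instead of editing each one by hand; a sketch assuming the same interface names:

for nic in eth0 eth1 eth2 eth3; do
cat > /etc/sysconfig/network-scripts/ifcfg-$nic <<EOF
DEVICE=$nic
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
EOF
done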

4. Now we need to register the bonding module inside CentOS as a device. Create a new file called bonding.conf under the /etc/modprobe.d directory:

touch /etc/modprobe.d/bonding.conf
vi /etc/modprobe.d/bonding.conf

Add the following lines:

alias bond0 bonding
options bond0 mode=5 miimon=100

The list of bonding modes can be referred to here:

http://unixfoo.blogspot.com/2008/02/network-bonding-part-ii-modes-of.html

5. Load the module into the kernel:

modprobe bonding

6. Restart network:

service network restart

Done! You will notice that another interface called bond0 has been added to the interface list (ifconfig). You can monitor the bonding state by running the following command:

watch -n1 'cat /proc/net/bonding/bond0'

Create MySQL Database Backup Every Half an Hour

Our company has launched an online contest for our dedicated clients, and we are collecting some really important information that they submit in order to join the contest. My boss wants me to create a database backup every half an hour to reduce the chance of data loss to the minimum possible.

I have created a simple BASH script to accomplish this. The script creates a MySQL database backup and stores it in a folder named after the date. We keep only the latest 3 days of backups and remove any older backup folders.

1. Create the script in the desired directory. In this case, I will use /home/scripts:

$ mkdir /home/scripts
$ touch /home/scripts/mysqlbackup_30min

2. Using your favourite text editor, paste the following script into /home/scripts/mysqlbackup_30min:

#!/bin/bash
# Script to create a MySQL backup every half an hour
 
# Configuration value
mysql_host=""
mysql_database=""
mysql_username=""
mysql_password=''
backup_path=/backup/mysql
expired=3			#how many days before the backup directory will be removed
 
today=`date +%Y-%m-%d`
 
# Make sure today's folder exists, then take a dump
if [ ! -d "$backup_path/$today" ]
then
        mkdir -p "$backup_path/$today"
fi
/usr/bin/mysqldump -h "$mysql_host" -u "$mysql_username" -p"$mysql_password" "$mysql_database" > "$backup_path/$today/$mysql_database-$(date +%H%M).sql"
 
# Remove backup folders older than $expired days
find "$backup_path" -mindepth 1 -maxdepth 1 -type d -mtime +"$expired" | xargs rm -Rf

3. Change the configuration values to suit your environment. In this case, I will use the information below:

mysql_host="localhost"
mysql_database="contest_db"
mysql_username="contest_userw"
mysql_password='Okf8#$el'

4. Create a cron job to execute this task every half an hour. Open /var/spool/cron/root and add the following line:

*/30 * * * * /bin/sh /home/scripts/mysqlbackup_30min

5. Restart cron daemon:

$ service crond restart

Done. You should see something like below after one day:

$ tree /backup/mysql
.
|-- 2011-11-14
|   |-- contest_db-1930.sql
|   |-- contest_db-2000.sql
|   |-- contest_db-2030.sql
|   |-- contest_db-2100.sql
|   |-- contest_db-2130.sql
|   |-- contest_db-2200.sql
|   |-- contest_db-2230.sql
|   |-- contest_db-2300.sql
|   `-- contest_db-2330.sql
`-- 2011-11-15
    |-- contest_db-0030.sql
    |-- contest_db-0100.sql
    |-- contest_db-0130.sql
    |-- contest_db-0200.sql
    |-- contest_db-0230.sql
    |-- contest_db-0300.sql
    |-- contest_db-0330.sql
    |-- contest_db-0400.sql
    |-- contest_db-0430.sql
    |-- contest_db-0500.sql
    |-- contest_db-0530.sql
    |-- contest_db-0600.sql
    |-- contest_db-0630.sql
    |-- contest_db-0700.sql
    |-- contest_db-0730.sql
    |-- contest_db-0800.sql
    |-- contest_db-0830.sql
    |-- contest_db-0900.sql
    |-- contest_db-0930.sql
    |-- contest_db-1000.sql
    |-- contest_db-1030.sql
    |-- contest_db-1100.sql
    |-- contest_db-1130.sql
    `-- contest_db-1200.sql
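To restore a particular snapshot, feed the dump back into mysql; a sketch using the example configuration values from step 3:

$ mysql -h localhost -u contest_userw -p'Okf8#$el' contest_db < /backup/mysql/2011-11-15/contest_db-1200.sql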

Install Pound as Web Load Balancer

One of our websites hosts content that can be downloaded by clients. Since this content is getting popular, our bandwidth usage and server load are getting higher every day, and we need something to balance this out. To solve this issue, I am using an application called Pound, which is a reverse proxy, load balancer, automatic HTTP fail-over and HTTPS front-end for web servers.

The installation and implementation are very simple with the help of RPMforge. Pound is open-source software; you can find more information about it at http://www.apsis.ch/pound/.

I will use a simple architecture: one Pound load balancer in front of two backend web servers.

Let's assume I already have the website hosted on both servers, copied using the cPanel account copy tool. Variables as below:

OS: CentOS 5.6 64bit
Website URL: http://download.myinstaller.org/

1. Install the RPMforge repository for yum. Go to http://wiki.centos.org/AdditionalResources/Repositories/RPMForge and choose the package for the CentOS version you use. In this case, I will use the CentOS 5 64bit package:

$ cd /usr/local/src
$ wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
$ rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt 
$ rpm -Uhv rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm

2. Install Pound via yum:

$ yum install pound -y

3. Rename the original config file, because we will use our own configuration:

$ mv /etc/pound.cfg /etc/pound.cfg.ori
$ touch /etc/pound.cfg

4. Copy and paste the following configuration into /etc/pound.cfg using a text editor:

User            "nobody"
Group           "nobody"
 
LogLevel        1
Alive           2
 
ListenHTTP
        Address 109.169.80.202
        Port    80
End
 
Service
        HeadRequire "Host: .*download.myinstaller.org.*"
 
        BackEnd
                Address 109.169.80.204
                Port    80
                TimeOut 300
        End
 
        BackEnd
                Address 212.24.47.68
                Port    80
                TimeOut 300
        End
 
        Session
                Type Cookie
                ID   "JSESSIONID"
                TTL  300
        End
End

5. Make sure Pound is enabled as a startup service and started:

$ chkconfig pound on
$ service pound start

6. Check whether Pound is listening on the correct address:

$ netstat -tulpn | grep pound
tcp        0      0 109.169.80.202:80               0.0.0.0:*                   LISTEN      29358/pound
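Before touching DNS, you can also test the balancer directly by sending a request with the matching Host header; a quick check, assuming curl is installed:

$ curl -I -H "Host: download.myinstaller.org" http://109.169.80.202/

You should receive an HTTP response served by one of the two backends.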

7. To check the logs, you can monitor /var/log/messages using the following command:

$ tail -f /var/log/messages

8. The last step is to point the subdomain to the load balancer IP. In this case, I need to log in to my DNS server and change the A record for the subdomain. Example as below:

download.myinstaller.org     A      109.169.80.202

After DNS propagation is completed, you should notice that Pound has started working and is balancing your website!

Linux: Encrypt and Decrypt Files using GPG

Do you have files on the server that hold sensitive information which you only want to open using a password? That is where GPG helps. GPG stands for GNU Privacy Guard, a tool to encrypt and sign your data and communication. It features a versatile key management system as well as access modules for all kinds of public key directories.

In my case, I need to secure one file, which is used to store login information for all users inside LDAP. I use this file as a reference to reset passwords when users forget them. The file is called user_ldap.txt, under the /home/admin directory.

Since this server also has other sudo users, I need to make sure that only those who have the password are able to read the file. So I will encrypt the file with GPG and remove the original. If anyone wants to retrieve the file in the future, they will need to enter the passphrase and the original file will be regenerated.

The file that I want to secure contains the following information:

## User credentials for LDAP
 
Username: mary
Password: M4ksjd(&&s
Department: Accounting
Directory: /home/accounting/mary
 
Username: ismael
Password: Lpo23S%#s
Department: Technical
Directory: /home/technical/ismael
 
Username: sazzy
Password: T5sZ&d#R
Department: Human Resource
Directory: /home/hr/sazzy

1. DO NOT LOG IN AS ROOT. At this stage, I am logged in as the user admin. Let's encrypt the file using GPG:

$ cd /home/admin
$ gpg -c user_ldap.txt

You will see something like below. Just enter a passphrase (twice) to complete the encryption:

$ gpg -c user_ldap.txt
gpg: directory '/home/admin/.gnupg' created
gpg: new configuration file '/home/admin/.gnupg/gpg.conf' created
gpg: WARNING: options in '/home/admin/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring '/home/admin/.gnupg/pubring.gpg' created
can't connect to '/home/admin/.gnupg/S.gpg-agent': No such file or directory
gpg-agent[17211]: directory '/home/admin/.gnupg/private-keys-v1.d' created

2. Remove the original file:

$ rm -Rf user_ldap.txt
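Note that rm only unlinks the file; the data may still be recoverable from the disk. If that matters, overwrite the file before deleting it instead; a sketch assuming GNU coreutils is available:

$ shred -u user_ldap.txt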

3. Retrieve the original file by entering the passphrase for the encrypted file (user_ldap.txt.gpg):

$ gpg user_ldap.txt.gpg
gpg: keyring '/home/admin/.gnupg/secring.gpg' created
gpg: 3DES encrypted data
can't connect to '/home/admin/.gnupg/S.gpg-agent': No such file or directory
gpg: encrypted with 1 passphrase
gpg: WARNING: message was not integrity protected

4. Read the data inside:

$ cat user_ldap.txt

You should be able to see the original content of the file as above. Cheers!

Mount Same Partition in Different Servers (using Cluster)

In this tutorial, I will show you how to mount the same partition on different servers. We will use the Red Hat Cluster Suite, which is available in the CentOS repository, with the GFS2 file system. The architecture consists of an Openfiler storage server and two cluster nodes (node1 and node2).

The file server runs Openfiler, and we will use the iSCSI initiator to mount the disk on both nodes. I assume that you have one partition called 'data' created in Openfiler as an iSCSI target. If you do not have one, you can refer to this post: Create iSCSI Target in OpenFiler.

1. Make sure /etc/hosts on all servers has the following values:

192.168.100.1   openfiler.cluster.local openfiler
192.168.100.11  node1.cluster.local    node1
192.168.100.12  node2.cluster.local    node2

2. Since node1 will be the head node of the cluster, we need to install the following package groups via yum:

$ yum groupinstall -y "High Availability" "High Availability Management" "Resilient Storage"
$ yum install -y iscsi-initiator-utils openssl

3. On node2, we need to install the same things except the "High Availability Management" package group:

$ yum groupinstall -y "High Availability" "Resilient Storage"
$ yum install -y iscsi-initiator-utils openssl

4. We need to allow certain ports so all nodes can communicate correctly. Add the following lines into /etc/sysconfig/iptables before any REJECT line (usually before the last 3 lines) on both nodes:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 11111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 21064 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 16851 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8084 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 5404 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 5405 -j ACCEPT

Restart and save iptables, and set SELinux to permissive mode for the current session:

$ service iptables restart
$ service iptables save
$ setenforce 0

5. Make sure SELinux stays disabled after a reboot. Edit /etc/sysconfig/selinux and change the following value:

SELINUX=disabled

6. Let's start all the services required to run the cluster on node1 and node2. luci should only be started on node1 because it is the head node:

$ chkconfig luci on #only node1
$ service luci start #only node1
$ chkconfig ricci on
$ service ricci start
$ chkconfig iscsi on

7. We need to set a password for ricci so luci can communicate between cluster nodes. In this case, I will use the same password as the root password for ricci:

$ passwd ricci

8. Let's create the cluster using the Red Hat Cluster Suite. Access the luci web management portal at https://192.168.100.11:8084 and log in with the root username and password. After login, go to Manage Clusters > Create and enter the cluster name and node information.

9. Wait a while for the cluster suite to initialize on both nodes. After that, you should see the FileStorage cluster indicator turn green, and you can list all the cluster members.

10. Cluster setup completed. Now we need to initialize iSCSI on node1. To discover iSCSI targets, run the following command:

$ iscsiadm -m discovery -t sendtargets -p openfiler.cluster.local
Starting iscsid:                                           [  OK  ]
192.168.100.1:3260,1 iqn.2006-01.local.cluster.openfiler:data

11. If you see a result like the one above, it means we can see and connect to the iSCSI target. We just need to restart the service once more so we can access the target:

$ service iscsi restart

12. On this server, I found a new disk, /dev/sdb. This is the iSCSI disk discovered previously. We need to create one partition, /dev/sdb1, on it:

$ fdisk /dev/sdb

Sequence during fdisk: n > p > 1 > enter > enter > w

13. Now we need to format the partition with the GFS2 file system. The -j 4 option creates 4 journals; every node that mounts the file system needs its own journal, so this leaves room for two more nodes. Command as below:

$ mkfs.gfs2 -p lock_dlm -t FileStorage:data -j 4 /dev/sdb1
Are you sure you want to proceed? [y/n] y
 
Device:                    /dev/sdb1
Blocksize:                 4096
Device Size                47.66 GB (12492796 blocks)
Filesystem Size:           47.66 GB (12492794 blocks)
Journals:                  4
Resource Groups:           191
Locking Protocol:          "lock_dlm"
Lock Table:                "FileStorage:data"
UUID:                      1A018632-7752-DAC0-DCEC-8C27E60C47E7

14. Let's create the mount point:

$ mkdir -p /storage/data

15. Edit /etc/fstab by adding the following line with the UUID, then mount the GFS2 partition:

UUID=1A018632-7752-DAC0-DCEC-8C27E60C47E7 /storage/data gfs2 noatime,nodiratime  0 0

Mount the file system via the GFS2 init service:

$ chkconfig gfs2 on
$ service gfs2 start
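You can verify that the file system is mounted correctly; a quick check:

$ mount | grep gfs2
$ df -h /storage/data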

16. Now on node2, we just need to discover the iSCSI target as in steps 10 and 11, and then mount the device as in steps 14 and 15.

Done! You can now see that the directory (/storage/data) is synchronized on both servers. Any new file created in this directory will appear on both servers. You can use this simple cluster setup as the root directory of an FTP server, or for NFS or Samba file sharing. If node1 goes down, node2 is available to take over the task. You can also easily add node3 or additional servers to the front-end.

If I have time, I will show you how to do auto-failover using the Red Hat Cluster Suite. Cheers!

Resync MySQL Master/Slave Replication

I encountered the following error from the MySQL replication monitoring alert:

Error 'Duplicate entry '72264-4-10-2011' for key 1' on query. Default database: 'grad_data'. Query: 'INSERT INTO tblusr_log
                                          ( ID,UserType,Stats,Month,Year )
                                          VALUES
                                          ( '72264','4',1,MONTH(NOW()),YEAR(NOW()))'

This situation stopped the data replication process, which caused the database on the slave server to fall behind and stop syncing with the database on the master server. There are several ways to solve this issue:

Skip the error once and for all

1. Stop slave:

mysql> STOP SLAVE;

2. Add the following line into /etc/my.cnf under the [mysqld] section (this requires a MySQL restart to take effect):

slave-skip-errors = 1062

3. Restart MySQL, then start the slave:

$ service mysqld restart

mysql> START SLAVE;

Skip the errors one by one

1. Run the following command in the mysql console to skip the error once:

mysql> STOP SLAVE; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;

2. Check the mysql slave status again:

mysql> SHOW SLAVE STATUS\G

If the error appears again, repeat steps 1 and 2 until 'Slave_SQL_Running' turns to 'Yes'. You can also use the following command to monitor the replication status via console or SSH:

$ watch -n1 'mysql -e "SHOW SLAVE STATUS\G"'

Re-sync back the database and re-start replication

This is the recommended way: it is the safest and most reliable way to make sure all data is replicated correctly. These steps take longer, but they are worth doing. We will dump the database on the master server, restore it on the slave server and restart the replication. Variables as below:

Master server IP: 192.168.11.2
Slave server IP: 192.168.11.3
Database name: grad_data
MySQL database user: grad_user
MySQL database password: [email protected]
MySQL slave user: slaver
MySQL slave password:  slaverp4ss

1. Dump the database on the master server:

$ mysqldump -u grad_user -p"[email protected]" --master-data --single-transaction grad_data > /root/grad_data.master.sql

2. Copy it over to the slave server using scp:

$ scp /root/grad_data.master.sql root@192.168.11.3:/root

3. Recreate the database on the slave server:

mysql> DROP DATABASE grad_data;
mysql> CREATE DATABASE grad_data;

4. Restore the database on the slave server:

$ mysql -u grad_user -p"[email protected]" grad_data < /root/grad_data.master.sql

5. Review the binlog file name and position recorded in the dump file. Open /root/grad_data.master.sql using a text editor and find a line similar to the one below:

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000019', MASTER_LOG_POS=1017207641;

You need to take note of the log file name and log position number.
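Since --master-data writes this statement near the top of the dump, you can also pull it out quickly with grep; a small sketch:

$ head -n 50 /root/grad_data.master.sql | grep "CHANGE MASTER"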

6. Restart the slave replication (make sure MASTER_LOG_FILE and MASTER_LOG_POS in the command below match the values from step 5):

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST='192.168.11.2', MASTER_PORT=3306, MASTER_USER='slaver', MASTER_PASSWORD='slaverp4ss', MASTER_LOG_FILE='mysql-bin.000019', MASTER_LOG_POS=1017207641;
mysql> START SLAVE;

Verify slave status

Check the slave replication status:

mysql> SHOW SLAVE STATUS\G

You should see something like below, which indicates that your MySQL replication is running fine:

*************************** 1. row ***************************
   Slave_IO_Running: Yes
   Slave_SQL_Running: Yes

Linux: Email Alert on MySQL Replication Failure

In my environment, MySQL replication is really important because we split different web servers across different database servers to balance the load between the MySQL servers. It is quite a traditional approach, but these are old database servers that have sustained us until today.

There is a simple way to set up a monitoring script that alerts us via email on any replication error. This script needs to run on the slave server of the MySQL replication.

Variable as below:

Database server: MySQL 5.0.77
Hostname: mysql.mydomain.org
Database host: localhost
Database name: mymarket_data
Database user: root
Database password: [email protected]

1. Create directory and file for the script:

$ mkdir -p /root/scripts
$ touch /root/scripts/check_slave.php

2. Copy and paste the following PHP script into check_slave.php:

<?php
/**
 * Description: This script checks the slave status on the configured host
 *              printing "BAD <description>" or "OK <description>" for failure
 *              or success respectively.
 *              The possible failures could include:
 *              1) connection failure
 *              2) query failure (permissions, network, etc.)
 *              3) fetch failure (???)
 *              4) slave or io thread is not running
 *              5) Unknown master state (seconds_behind_master is null)
 *              6) seconds_behind_master has exceeded the configured threshold
 *
 *              If none of these conditions occur, we assume success and return
 *              an "OK" response, otherwise we include the error we can find
 *              (mysqli_connect_error() or $mysqli->error, or problem
 *               description).  A monitoring system need only check for:
 *              /^BAD/ (alert) or /^OK/ (everybody happy)
 */
 
/* **************************
 * Change related value below
 * **************************
 */
    $host = "";
    $user = "";
    $pass = "";
    $mailto = ""; // Your email address
    $mailsubject = "";
    $mailfrom = "";
 
/* ******************************************
 * No need to change anything below this line
 * ******************************************
 */ $mailheaders = "From: " . $mailfrom;
    error_reporting(E_ALL);
    header("Content-Type: text/plain"); # Not HTML
    $sql = "SHOW SLAVE STATUS";
    $skip_file = 'skip_alerts';
    $link = mysql_connect($host, $user, $pass, null);
 
    if($link)
        $result = mysql_query($sql, $link);
    else {
        printf("BAD: Connection Failed %s", mysql_error());
        mysql_close($link);
        return;
    }
 
    if($result)
        $status = mysql_fetch_assoc($result);
    else {
        printf("BAD: Query failed - %s\n", mysql_error($link));
        mysql_close($link);
        return;
    }
 
    mysql_close($link);
 
    $slave_lag_threshold = 120;
 
    $tests = array(
        'test_slave_io_thread' => array('Slave_IO_Running', "\$var === 'Yes'",
                                        'Slave IO Thread is not running'),
        'test_slave_sql_thread' => array('Slave_SQL_Running', "\$var === 'Yes'",
                                        'Slave SQL Thread is not running'),
        'test_last_err' => array('Last_Errno', "\$var == 0",
                                 "Error encountered during replication - "
                                 .$status['Last_Error']),
        'test_master_status' => array('Seconds_Behind_Master', "isset(\$var)",
                                        'Unknown master status (Seconds_Behind_Master IS NULL)'),
        'test_slave_lag' => array('Seconds_Behind_Master',
                                  "\$var < \$slave_lag_threshold",
                                  "Slave is ${status['Seconds_Behind_Master']}s behind master (threshold=$slave_lag_threshold)")
    );
 
    $epic_fail = false;
    if(is_file($skip_file))
        $epic_fail = false;
    else
    {
        foreach($tests as $test_name => $data) {
            list($field, $expr, $err_msg) = $data;
            $var = $status[$field];
            $val = eval("return $expr;");
            if(!$val) {
                // Build the alert message from the failing test's description
                $mailmessage = "BAD: " . $err_msg . "\n";
                mail($mailto,$mailsubject,$mailmessage,$mailheaders);
                $epic_fail = true;
            }
        }
    }
 
    if(!$epic_fail) {
        print "OK: Checks all completed successfully\n";
    }
?>

3. Change the related values on the following lines:

/* **************************
 * Change related value below
 * **************************
 */
    $host = "localhost";
    $user = "root";
    $pass = "[email protected]";
    $mailto = "[email protected]"; // Your email address
    $mailsubject = "mysql.mydomain.org Replication Alert";
    $mailfrom = "[email protected]";

4. Set up cron to execute the script every 15 minutes:

*/15 * * * * /usr/bin/php -q /root/scripts/check_slave.php

5. Restart cron service:

$ service crond restart

Done. You should receive notification email if the master/slave replication failed!
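You can also run the script manually to confirm it works; when replication is healthy, it prints the OK line:

$ /usr/bin/php -q /root/scripts/check_slave.php
OK: Checks all completed successfully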

Update

The following script was modified by Joel Brock, as mentioned in the comments section below:

<?php
/**
 * Description: This script checks the slave status on the configured hosts,
 *              printing "BAD <description>" or "OK <description>" for failure
 *              or success respectively.
 *              The possible failures could include:
 *              1) connection failure
 *              2) query failure (permissions, network, etc.)
 *              3) fetch failure (???)
 *              4) slave or io thread is not running
 *              5) Unknown master state (seconds_behind_master is null)
 *              6) seconds_behind_master has exceeded the configured threshold
 *
 *              If none of these conditions occur, we assume success and return
 *              an "OK" response, otherwise we include the error we can find
 *              (mysqli_connect_error() or $mysqli->error, or problem
 *               description).  A monitoring system need only check for:
 *              /^BAD/ (alert) or /^OK/ (everybody happy)
 */
 
/* **************************
 * Change related value below
 * **************************
 */
$host = array(
//  "cr-1" => "192.168.0.81",
//  "cr-2" => "192.168.0.82",
//  "cr-3" => "192.168.0.83",
//  "cr-4" => "192.168.0.84",
//  "cr-5" => "192.168.0.85",
//  "jp-1" => "192.168.100.81",
//  "jp-2" => "192.168.100.82",
//  "jp-3" => "192.168.100.83",
//  "jp-4" => "192.168.100.84",
//  "jp-5" => "192.168.100.85",
    "cr-test" => "192.168.0.87"
);
$user = "";
$pass = "";
$mailto = "";
$mailfrom = "";
 
/* ******************************************
 * No need to change anything below this line
 * ******************************************
 */
 
error_reporting(E_ALL);
header("Content-Type: text/plain"); # Not HTML
foreach ($host as $key => $value) {
 
    $mailsubject = "[" . $key . "] SLAVE REPLICATION ALERT";
    $mailheaders = "From: " . $mailfrom;
    $mailmessage = "";
    $sql = "SHOW SLAVE STATUS";
    $skip_file = 'skip_alerts';
    $link = mysql_connect($value, $user, $pass, null);
 
    if($link)
        $result = mysql_query($sql, $link);
    else {
        printf("BAD: Connection Failed %s", mysql_error());
        mysql_close($link);
        return;
    }
 
    if($result)
        $status = mysql_fetch_assoc($result);
    else {
        printf("BAD: Query failed - %s\n", mysql_error($link));
        mysql_close($link);
        return;
    }
 
    mysql_close($link);
 
    $slave_lag_threshold = 120;
 
    $tests = array(
        'test_slave_io_thread' => array('Slave_IO_Running', "\$var === 'Yes'",
            'Slave IO Thread is not running'),
        'test_slave_sql_thread' => array('Slave_SQL_Running', "\$var === 'Yes'",
            'Slave SQL Thread is not running'),
        'test_last_err' => array('Last_Errno', "\$var == 0",
            "Error encountered during replication - " . $status['Last_Error']),
        'test_master_status' => array('Seconds_Behind_Master', "isset(\$var)",
            'Unknown master status (Seconds_Behind_Master IS NULL)'),
        'test_slave_lag' => array('Seconds_Behind_Master',
            "\$var < \$slave_lag_threshold",
            "Slave is {$status['Seconds_Behind_Master']}s behind master (threshold=$slave_lag_threshold)")
    );
 
    $epic_fail = false;
    $val1 = 0;
    if(!is_file($skip_file)) {
        // Run every test; count failures and collect their descriptions
        foreach($tests as $test_name => $data) {
            list($field, $expr, $err_msg) = $data;
            $var = $status[$field];
            $val = eval("return $expr;");
            if(!$val) {
                $val1 = $val1 + 1;
                $mailmessage .= "BAD: " . $key . " replication failed. Reason: " . $err_msg . "\n";
            }
        }
        if ($val1 > 0) {
            mail($mailto, $mailsubject, $mailmessage, $mailheaders);
            print $mailmessage . "\n";
            $epic_fail = true;
        }
    }
 
    if(!$epic_fail) {
        print "OK: Checks all completed successfully on [" . $key . "]\n";
    }
}
?>