Resync MySQL Master/Slave Replication

I encountered the following error from the MySQL replication monitoring alert:

Error 'Duplicate entry '72264-4-10-2011' for key 1' on query. Default database: 'grad_data'. Query: 'INSERT INTO tblusr_log
                                          ( ID,UserType,Stats,Month,Year )
                                          VALUES
                                          ( '72264','4',1,MONTH(NOW()),YEAR(NOW()))'

This stopped the replication process, causing the database on the slave server to fall behind and go out of sync with the master. There are several ways to solve this issue:
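
To confirm the failure directly on the slave before fixing it, you can pull the relevant fields out of SHOW SLAVE STATUS. A quick sketch, assuming the mysql client on the slave can log in without prompting for a password:

$ mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Last_Errno|Last_Error"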

Skip the error once and for all

1. Stop slave:

mysql> STOP SLAVE;

2. Add the following line to /etc/my.cnf under the [mysqld] section, then restart MySQL on the slave (the option only takes effect at server startup):

slave-skip-errors = 1062

3. Start slave:

mysql> START SLAVE;

Skip the error one by one

1. Run the following command in the MySQL console to skip the error once:

mysql> STOP SLAVE; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;

2. Check the MySQL slave status again:

mysql> SHOW SLAVE STATUS\G

If the error appears again, repeat steps 1 and 2 until 'Slave_SQL_Running' turns to 'Yes'. You can also use the following command to monitor the replication status from the console or over SSH (a small loop to automate the skipping is sketched after it):

$ watch -n1 'mysql -e "SHOW SLAVE STATUS\G"'
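
If there is a long backlog of these duplicate-key errors, a small shell loop can repeat the skip for you until the SQL thread comes back up. This is only a rough sketch under the same assumption (the mysql client can log in without prompting); stop it once the slave is healthy:

#!/bin/bash
# Keep skipping single statements until the slave SQL thread is running again.
while true; do
    running=$(mysql -e "SHOW SLAVE STATUS\G" | awk '/Slave_SQL_Running:/ {print $2}')
    [ "$running" = "Yes" ] && break
    mysql -e "STOP SLAVE; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;"
    sleep 1
done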

Re-sync back the database and re-start replication

This is the recommended way. It is the safest and most reliable method to make sure all data is replicated correctly. These steps take longer, but they are worth doing. We will dump the database on the master server, restore it on the slave server, and restart replication. The variables used below are:

Master server IP: 192.168.11.2
Slave server IP: 192.168.11.3
Database name: grad_data
MySQL database user: grad_user
MySQL database password: [email protected]
MySQL slave user: slaver
MySQL slave password:  slaverp4ss

1. Dump the database on the master server:

$ mysqldump -u grad_user -p"[email protected]" --master-data --single-transaction grad_data > /root/grad_data.master.sql

2. Copy it over to the slave server using scp:

$ scp /root/grad_data.master.sql root@192.168.11.3:/root

3. Recreate the database on the slave server:

mysql> DROP DATABASE grad_data;
mysql> CREATE DATABASE grad_data;

4. Restore the database on the slave server:

$ mysql -u grad_user -p"[email protected]" grad_data < /root/grad_data.master.sql

5. Review the binlog file name and position recorded in the SQL dump. Open /root/grad_data.master.sql with a text editor and find a line similar to the following:

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000019', MASTER_LOG_POS=1017207641;

Take note of the log file name and log position number.
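
Instead of opening the whole dump in a text editor (it can be huge), you can also pull that line out of the file header, assuming mysqldump wrote it near the top as it normally does:

$ head -n 50 /root/grad_data.master.sql | grep "CHANGE MASTER TO"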

6. Restart the slave replication (make sure MASTER_LOG_FILE and MASTER_LOG_POS in the command below match the values from step 5):

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST='192.168.11.2', MASTER_PORT=3306, MASTER_USER='slaver', MASTER_PASSWORD='slaverp4ss', MASTER_LOG_FILE='mysql-bin.000019', MASTER_LOG_POS=1017207641;
mysql> START SLAVE;

Verify slave status

Check the slave replication status:

mysql> SHOW SLAVE STATUS\G

You should see something like the following, indicating that your MySQL replication is running fine:

*************************** 1. row ***************************
   Slave_IO_Running: Yes
   Slave_SQL_Running: Yes
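
It is also useful to watch Seconds_Behind_Master until it drops back to 0, which means the slave has caught up with the master, for example:

$ watch -n5 'mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master'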

Linux: /tmp: Read-only file system Error

One of the servers that I manage had the following problem when I wanted to edit the crontab:

$ crontab -e
/tmp/crontab.XXXX1ibTLU: Read-only file system

This shows that the /tmp partition is not writable. It has been remounted read-only because the file system hit an error. To fix this, we need to run a file system check (fsck) on the /tmp partition. Before running fsck, we need to unmount the directory, but the following error occurred:

$ umount /tmp
/tmp: Device or resource busy

It seems the /tmp directory cannot be unmounted because some files under it are still open or in use by other processes. Using lsof, we can list the open files:

$ lsof | grep /tmp
mysqld  2599 mysql    5u   REG    7,0    0 6098 /tmp/ibaqFhew (deleted)
mysqld  2599 mysql    6u   REG    7,0    0 6099 /tmp/ibC7Yfbn (deleted)
mysqld  2599 mysql    7u   REG    7,0    0 6100 /tmp/ibJ8AFbe (deleted)
mysqld  2599 mysql   11u   REG    7,0    0 6101 /tmp/ibrLO9t5 (deleted)

As we can see, mysqld is holding several temporary files open in the /tmp directory. The second column shows the PID of the offending process. We need to stop this process using the kill command:

$ kill -9 2599
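
Alternatively, fuser can kill every process holding files open under the mount point in one shot. Like kill -9 it sends SIGKILL, so only use it when you are sure nothing important is running from /tmp:

$ fuser -km /tmp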

Only then are we able to unmount /tmp:

$ umount /tmp

Make sure no errors appear during the unmounting process. Now we can run fsck with -f (force) and -y (answer yes to all prompts) to automate the file system check:

$ fsck -f -y /tmp
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/usr/tmpDSK: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 6097 has zero dtime.  Fix? yes
 
Inodes that were part of a corrupted orphan linked list found.  Fix? yes
 
Inode 6098 was part of the orphaned inode list.  FIXED.
Inode 6099 was part of the orphaned inode list.  FIXED.
Inode 6100 was part of the orphaned inode list.  FIXED.
Inode 6101 was part of the orphaned inode list.  FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Inode bitmap differences:  -(6097--6101)
Fix? yes
 
Free inodes count wrong for group #3 (2025, counted=2030).
Fix? yes
 
Free inodes count wrong (127695, counted=127700).
Fix? yes
 
/usr/tmpDSK: ***** FILE SYSTEM WAS MODIFIED *****
/usr/tmpDSK: 316/128016 files (3.2% non-contiguous), 66394/512000 blocks

Now that the file system has been checked and fixed, we can remount the partition with the following command:

$ mount -a
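
A quick sanity check to confirm /tmp is mounted read-write and writable again:

$ mount | grep /tmp
$ touch /tmp/write_test && rm /tmp/write_test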

You should be able to use the /tmp partition again at this point, and I can finally make my changes to the crontab!

Linux: VMware Tools Installation Error

Yesterday, I was installing a new CentOS 6 server in VMware Workstation 7 to be a file server for internal use. As usual, it is highly recommended to install VMware Tools on every guest operating system to ensure smooth hardware integration and avoid degraded performance on the virtual server.

During the installation process, I got the following error:

Searching for a valid kernel header path...
The path "" is not valid.
Would you like to change it? [yes]
 
What is the location of the directory of C header files that match your running kernel?

What VMware needs is the kernel-headers and kernel-devel packages matching the version of your currently loaded kernel. You can check the loaded kernel with the following command:

$ uname -r
2.6.32-71.el6.x86_64
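
You can also check which kernel-headers and kernel-devel packages (if any) are currently installed and compare their versions against the output above:

$ rpm -qa | grep -e kernel-headers -e kernel-devel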

SOLUTION 1

Solution #1 is highly recommended because it updates your kernel to the latest stable version provided by the repository, but it requires downtime for a reboot. Steps are as follows:

1. Update the kernel:

$ yum update kernel -y

2. Install the kernel-headers, kernel-devel and other required packages:

$ yum install kernel-headers kernel-devel gcc make -y

3. Reboot the server to make sure it loads the new kernel:

$ init 6

4. The kernel version has now been updated, along with kernel-headers and kernel-devel:

$ uname -r
2.6.32-71.29.1.el6.x86_64
$ rpm -qa | grep -e kernel-headers -e kernel-devel
kernel-headers-2.6.32-71.29.1.el6.x86_64
kernel-devel-2.6.32-71.29.1.el6.x86_64

SOLUTION 2

Solution #2 requires you to install kernel-headers and kernel-devel matching your currently running kernel version. Steps are as follows:

1. Install the same version of kernel-headers and kernel-devel via yum:

$ yum install kernel-headers-$(uname -r) kernel-devel-$(uname -r) -y

NOTE: If you have installed gcc previously, you will face an error because kernel-headers is already installed, but at the latest kernel version rather than the running one. You need to remove it first using the following command:

$ yum remove kernel-headers -y

2. Install the required packages:

$ yum install gcc make -y

3. There is no need to reboot the server. Just make sure the kernel, kernel-headers, and kernel-devel versions are the same:

$ uname -r
2.6.32-71.el6.x86_64
$ rpm -qa | grep -e kernel-headers -e kernel-devel
kernel-headers-2.6.32-71.el6.x86_64
kernel-devel-2.6.32-71.el6.x86_64

Once you have completed one of the solutions above, proceed with the VMware Tools installation by following the wizard. The installer should now be able to detect the kernel header path.
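
If you have not launched the installer yet, the usual flow on a Linux guest looks roughly like this after choosing "Install VMware Tools" from the Workstation menu (file names and mount points may differ on your setup):

$ mount /dev/cdrom /mnt
$ cp /mnt/VMwareTools-*.tar.gz /tmp
$ cd /tmp && tar xzf VMwareTools-*.tar.gz
$ cd vmware-tools-distrib
$ ./vmware-install.pl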

cPanel: Berkeley DB error

I found this error in /var/log/exim_mainlog. When trying to fix the Exim database using /scripts/exim_tidydb, the same error occurred:

$ tail -f /var/log/exim_mainlog
2011-09-08 10:08:13 1R1cV2-0003Yq-J2 Berkeley DB error: page 40: illegal page type or format
2011-09-08 10:08:13 1R1cV2-0003Yq-J2 Berkeley DB error: PANIC: Invalid argument
2011-09-08 10:08:13 1R1cV2-0003Yq-J2 Berkeley DB error: fatal region error detected; run recovery

This can be fixed with the following steps, which re-update Exim and clear out the corrupted Exim databases:

1. Back up /etc/exim.conf and /var/spool/exim/db:

$ cp /etc/exim.conf /etc/exim.conf.bak
$ cp -r /var/spool/exim/db /var/spool/exim/db.bak

2. Stop Exim:

$ service exim stop

3. Remove all files under /var/spool/exim/db so that Exim rebuilds a fresh database:

$ rm -Rfv /var/spool/exim/db/*

4. Update Exim:

$ /scripts/eximup --force

5. Restore the Exim configuration:

$ cp /etc/exim.conf.bak /etc/exim.conf

6. Restart Exim to load our configuration:

$ service exim restart
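
To confirm the fix, check that Exim is running and watch the mail log for a while to make sure the Berkeley DB errors no longer appear:

$ service exim status
$ tail -f /var/log/exim_mainlog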

Plesk – DNS error (Error code 80)

Error code:

-----------------------------------------------------------------
Set default component failed: defpackagemng failed: Execute dnsmng.exe REMOVE * failed: The file exists. (Error code 80) at GetTempFileName in C:\Program Files\Parallels\Plesk\admin\db
Execute dnsmng.exe UPDATE * failed: The file exists. (Error code 80) at GetTempFileName in C:\Program Files\Parallels\Plesk\admin\db
------------------------------------------------------------------

or

-----------------------------------------------------------------
Set default component failed: defpackagemng failed: Execute dnsmng.exe REMOVE * failed: The file exists. (Error code 80) at GetTempFileName in C:\Windows\Temp
Execute dnsmng.exe UPDATE * failed: The file exists. (Error code 80) at GetTempFileName in C:\Windows\Temp
------------------------------------------------------------------

Consequence:
The DNS server is switched off or unavailable to look up DNS zones.

Solution:
Remove the Plesk temporary files by navigating to the correct path and deleting them. In this case, we will use CMD to solve the first error:

cd "C:\Program Files\Parallels\Plesk\admin\db"
del /F /S /Q tmp*.tmp
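
If your error message points at C:\Windows\Temp instead (the second variant above), the same cleanup should apply there; this is an assumption based on the error text, so adjust the path to match your own message:

cd "C:\Windows\Temp"
del /F /S /Q tmp*.tmp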

Try adding the DNS server back by logging into Plesk > Settings > Server Components > DNS Server, selecting BIND, and clicking OK.