MySQL: Export Table to Splitted CSV using Stored Procedure

I have been working with Amazon Redshift lately and I need to export some huge tables (hundred millions of rows) to load them into Redshift cluster. Since Redshift supports parallel bulk loading of data files (CSV, TSV, Json), I took advantage of this feature by splitting the MySQL CSV data files before upload them to S3 for data loading. The only problem was I need to split the records evenly for the selected tables.

So, I have came out with following stored procedure. Let’s say we have two tables in MySQL that we want to export; activity_log and messaging_log.

To export and split, copy following stored procedure lines into MySQL:

DELIMITER //
DROP PROCEDURE IF EXISTS export_csv_split //
 
CREATE PROCEDURE export_csv_split (IN table_name VARCHAR(50), IN rows INT)
BEGIN
DECLARE x INT;
DECLARE y INT;
DECLARE total_row INT;
 
SET x=0;
SET y=1;
 
-- count the total rows of the select statement as @total_row
SET @SQLString1 = CONCAT('SELECT COUNT(*) INTO @total_row FROM ',table_name);
PREPARE test1 FROM @SQLString1;
EXECUTE test1;
 
WHILE x <= @total_row DO
 
-- export the rows from select statement with limit
SET @SQLString = CONCAT('SELECT * FROM ',table_name,' LIMIT ',x,',',rows,' INTO OUTFILE "/tmp/',table_name,'-',y,'.csv" FIELDS TERMINATED BY '','' OPTIONALLY ENCLOSED BY ''"'' LINES TERMINATED BY ''\n''');
PREPARE test2 FROM @SQLString;
EXECUTE test2;
 
SET x=x+rows;
SET y=y+1;
 
END WHILE;
 
END //
DELIMITER ;

Then, call the stored procedure and pass the table name and the number of rows per file, 10M rows per file for table activity log and 5M rows per file for table messaging_log:

mysql> call export_csv_split(activity_log,10000000);
mysql> call export_csv_split(messaging_log,5000000);

The data files will be generated under /tmp directory:

ls -1 /tmp
activity_log-1.csv
activity_log-2.csv
activity_log-3.csv
activity_log-4.csv
activity_log-5.csv
activity_log-6.csv
...

 

For another case, some huge tables need to be exported with conditions; export all records which have been created for the last 10 years:

DELIMITER //
DROP PROCEDURE IF EXISTS export_csv_split_condition //
 
CREATE PROCEDURE export_csv_split_condition (IN table_name VARCHAR(50), IN rows INT)
BEGIN
DECLARE x INT;
DECLARE y INT;
DECLARE total_row INT;
 
SET x=0;
SET y=1;
 
-- condition of the select statement
SET @condition = CONCAT('WHERE DATE(created) < ''2014-01-01'' AND DATE(created) >= ''2004-01-01''');
 
-- count the total rows of the select statement + condition as @total_row
SET @SQLString1 = CONCAT('SELECT COUNT(*) INTO @total_row FROM ',table_name,' ',@condition);
PREPARE test1 FROM @SQLString1;
EXECUTE test1;
 
WHILE x <= @total_row DO
 
-- export the rows from select statement + condition with limit
SET @SQLString = CONCAT('SELECT * FROM ',table_name,' ',@condition,' LIMIT ',x,',',rows,' INTO OUTFILE "/tmp/',table_name,'-',y,'.csv" FIELDS TERMINATED BY '','' OPTIONALLY ENCLOSED BY ''"'' LINES TERMINATED BY ''\n''');
PREPARE test2 FROM @SQLString;
EXECUTE test2;
 
SET x=x+rows;
SET y=y+1;
 
END WHILE;
 
END //
DELIMITER ;

Similar to the first case, just call the stored procedure as below:

mysql> call export_csv_split_condition('tbl_user_activities',1000000);

You may need to change the COUNT(*) to single column lookup instead, e.g: COUNT(id) to speed up the rows counting process in InnoDB/XtraDB storage engine. Hope this simple sharing can help you a lot on exporting huge tables to CSV.

Install Puppet on CentOS, Redhat, Ubuntu, Debian

Lately, I have been spending my time doing research on developing a Puppet module. As you might know, Puppet is an orchestration tool used by many sysadmins to deploy and configure servers without hassle on repeating the same installation commands over and over again.

With Puppet you just simply:

  1. Install Puppet master
  2. Define node configuration in Puppet master
  3. Install Puppet agent
  4. Let the Puppet agent deploy what you have defined

Developing Puppet module requires me to test the module’s manifest on many popular OS distributions out there. I am going to stick with the subject of this post on installing Puppet on following OS distributions:

  • RHEL 6/CentOS 6
  • RHEL 5/CentOS 5
  • Ubuntu 12.04
  • Ubuntu 14.04
  • Debian 6
  • Debian 7

 

Requirement

Ensure the host’s date and time is synced through ntp and /etc/hosts is configured correctly. Following is the example of /etc/hosts definition that I used:

192.168.10.100    puppetmaster.local
192.168.10.101    mysql1.local   # puppet-agent
192.168.10.102    mysql2.local   # puppet-agent
192.168.10.103    mysql3.local   # puppet-agent

** The /etc/hosts must be same on all nodes so hostname can be resolved to an IP. This is required later during the certificate signing stage by puppet master.

RHEL-based:

yum install -y ntpdate
ntpdate -u my.pool.ntp.org

Debian-based:

sudo apt-get install -y ntpdate
ntpdate -u my.pool.ntp.org

Installing Puppetlabs Repository

We’ll need to install official Puppetlabs repository on all nodes regardless of their role in puppet for the installation purposes. Install the repository definition on respective OS distribution:

RHEL 6/CentOS 6:

rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm

RHEL 5/CentOS 5:

rpm -ivh http://yum.puppetlabs.com/puppetlabs-release-el-5.noarch.rpm

Ubuntu 12.04 (Precise):

wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
sudo dpkg -i puppetlabs-release-precise.deb

Ubuntu 14.04 (Trusty):

wget https://apt.puppetlabs.com/puppetlabs-release-trusty.deb
sudo dpkg -i puppetlabs-release-trusty.deb

Debian 6 (Squeeze):

wget https://apt.puppetlabs.com/puppetlabs-release-squeeze.deb
dpkg -i puppetlabs-release-squeeze.deb

Debian 7 (Wheezy):

wget https://apt.puppetlabs.com/puppetlabs-release-wheezy.deb
dpkg -i puppetlabs-release-wheezy.deb

 

Installing Puppet Master

On puppetmaster.local node, run following command to install Puppet master:

Redhat-based:

yum install -y puppet-server openssl

Debian-based:

sudo apt-get update
sudo apt-get install -y puppetmaster openssl

 

Installing Puppet Agent

On all puppet agent nodes (mysql1.local, mysql2.local, mysql3.local), install puppet agent and its dependencies:

Redhat-based:

yum install -y puppet facter openssl

Debian-based:

sudo apt-get update
sudo apt-get install -y puppet facter openssl

 

Signing the Certificate Authority

Puppet communicates through a secured channel with SSL. When puppet agent runs for the first time, it will auto-generate a new SSL and puppet master must sign it before all the communications begin. Run following command on each agent node:

$ puppet agent --server=puppetmaser.local --no-daemonize --verbose
Info: Creating a new SSL key for mysql1.local
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for ccpuppet.local
Info: Certificate Request fingerprint (SHA256): 6F:8B:92:46:B0:3F:04:0A:4F:8D:BD:56:77:24:77:50:1C:E9:F4:EE:C6:00:5E:82:4F:B0:85:B5:26:72:43:E0
Info: Caching certificate for ca

This will generate a certificate to be signed by the puppet master. Now in the puppet master, list the certificate authority (CA):

$ puppet ca list
mysql1.local  (SHA256) 6F:8B:92:46:B0:3F:04:0A:4F:8D:BD:56:77:24:77:50:1C:E9:F4:EE:C6:00:5E:82:4F:B0:85:B5:26:72:43:E0

Sign the CA for this agent:

$ puppet ca sign mysql1.local
Notice: Signed certificate request for mysql1.local
Notice: Removing file Puppet::SSL::CertificateRequest ccpuppet.local at '/var/lib/puppet/ssl/ca/requests/mysql1.local.pem'
"-----BEGIN CERTIFICATE-----\n
...the key..."

Repeat the above steps on the other nodes, mysql2.local and mysql3.local. Now the puppet master should able to communicate with its agents securely.

 

Configure Puppet Agent

The last step is to update /etc/puppet/puppet.conf and add following line under [main] directive:

server=puppetmaster.local

** If you do not configure as above, you will need to add –server=puppetmaster.local on each of the puppet agent command below.

Now you can test from the agent node:

puppet agent --test

 

Deploy MySQL through Puppet

Puppet is now ready. Let’s deploy a mysql server with the simplest way. Go to Puppet Forge and look for a puppet module called puppetlabs-mysql. To install this module, run following command on to the puppet master node:

puppet module install puppetlabs-mysql

On the puppetmaster.local, create a puppet manifest to define how the agent should deploy at /etc/puppet/manifests/site.pp:

# /etc/puppet/manifests/site.pp
 
# Default node - this is compulsory
node "default" {
}
 
# Define the agent nodes
node "mysql1.local", "mysql2.local", "mysql3.local" {
 class { '::mysql::server':
 root_password => 'strongpassword'
 }
 class { '::mysql::client':
 package_ensure => 'present'
 }
}

Above definition will tell Puppet to install a MySQL server and client package on mysql1.local, mysql2.local and mysql3.local. To immediately start the deployment, go to the agent node and run:

puppet agent --test

Or, wait for the Puppet agent service to apply the catalog automatically (depending on the runinterval value, default is 30 minutes).

Once done, you will have three MySQL servers ready to serve! Imagine how much time you would save if you have many servers, applications, softwares and configurations to maintain.

Galera Cluster – Simpler way to view GRA file content

Galera Cluster (MySQL from Codership, Percona XtraDB Cluster, MariaDB Galera Cluster) generates a GRA log files if it fails to apply the writeset on the target node. This files exists in the MySQL data directory. You can get an overview of the file (if exist) by listing your MySQL data directory (in my case, the data directory is at /var/lib/mysql):

$ ls -1 /var/lib/mysql | grep GRA
GRA_10_104865779.log
GRA_13_104865781.log
GRA_5_104865780.log

MySQL Performance Blog has covered this topic in well-explained. I’m going to make this simple. Download the script here and copy it to your /usr/bin directory:

wget http://blog.secaserver.com/files/grareader -P /usr/bin/
chmod 755 /usr/bin/grareader

Just run following command to simply convert the GRA log file to a human-readable output:

grareader [gra_log_file]

Here is the example output:

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @[email protected]@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#140114 3:12:42 server id 3 end_log_pos 120 Start: binlog v 4, server v 5.6.15-log created 140114 3:12:42 at startup
ROLLBACK/*!*/;
BINLOG '
qjrUUg8DAAAAdAAAAHgAAAAAAAQANS42LjE1LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACqOtRSEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAf73
8eY=
'/*!*/;
# at 120
#140114 3:12:43 server id 3 end_log_pos 143 Stop
# at 143
#140507 14:55:42 server id 4 end_log_pos 126 Query thread_id=3173489 exec_time=0 error_code=0
use `test_shop`/*!*/;
SET TIMESTAMP=1399445742/*!*/;
SET @@session.pseudo_thread_id=3173489/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
ALTER TABLE `tblreshipment_header` DROP `ShipmentStatus`
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET [email protected]_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

You can download the script here or copy and paste the code:

#!/bin/bash
# Convert Galera GRA_* files to human readable output
# Usage: grareader 
# Example: grareader /var/lib/mysql/GRA_1_1.log
 
##
## GRA header file path
##
path=/tmp
gra_header_path=$path/GRA-Header
tmp_path=$path/grareader.tmp
 
input=$1
[ ! -e $input ] && echo 'Error: File does not exist' && exit 1
 
get_gra_header()
{
        download_url='http://blog.secaserver.com/files/GRA-Header'
        wget_bin=`command -v wget`
        [ -z "$wget_bin" ] && echo 'Error: Unable to locate wget. Please install it first' && exit 1
        echo "Downloadling GRA-Header file into $path"
        $wget_bin --quiet $download_url -P $path
        [ $? -ne 0 ] && echo 'Error: Download failed' && exit 1
}
 
locate_files()
{
        mysqlbinlog_bin=`command -v mysqlbinlog`
        [ -z "$mysqlbinlog_bin" ] && echo 'Error: Unable to locate mysqlbinlog binary. Please install it first' && exit 1
        [ ! -e $gra_header_path ] && echo 'Error: Unable to locate GRA header file' && get_gra_header
 
}
 
locate_files
 
cat $gra_header_path >> $tmp_path
cat $input >> $tmp_path
 
echo ''
clear
$mysqlbinlog_bin -v -v -v $tmp_path
echo ''
rm -rf $tmp_path

 

Hope this could help make your Galera administrative task simpler!

Percona Server Installation Error – libssl.so.10 and libcrypto.so.10

I stumbled upon one error when installing Percona Server and socat via yum repository with following error:

--> Processing Dependency: libssl.so.10(libssl.so.10)(64bit) for package: socat-1.7.2.3-1.el6.x86_64
--> Processing Dependency: libcrypto.so.10(libcrypto.so.10)(64bit) for package: socat-1.7.2.3-1.el6.x86_64

It turns out that:

“Red Hat upgraded the version of OpenSSL in EL6 from 1.0.0 to 1.0.1 during the 6.4-6.5 cycle, in order to resolve a years-old feature request. This package is no longer binary compatible, and programs that were built against OpenSSL 1.0.0 must be rebuilt from source against 1.0.1.”

What do we need to do then?

Luckily the package is available at IUS repository:

rpm -Uhv http://dl.iuscommunity.org/pub/ius/stable/Redhat/6/x86_64/epel-release-6-5.noarch.rpm
rpm -Uhv http://dl.iuscommunity.org/pub/ius/stable/Redhat/6/x86_64/ius-release-1.0-13.ius.el6.noarch.rpm
yum install yum-plugin-replace
yum replace --enablerepo=ius-archive openssl --replace-with openssl10

(Press y for any question)

Verify the dependent libraries are exist (libcrypto.so.10 and libssl.so.10) :

ls /usr/lib64/ | grep -e ssl.so -e crypto.so
libcrypto.so.10
libcrypto.so.1.0.1e
libssl3.so
libssl.so.10
libssl.so.1.0.1e

Then, try again with the Percona Server installation command. In some cases, you might need to remove the installed epel-release package since it’s a little bit outdated with the current release 6.8.

Hope the solution will help you guys out there!