MySQL: Export Table to Split CSV Files Using a Stored Procedure

I have been working with Amazon Redshift lately and needed to export some huge tables (hundreds of millions of rows) to load them into a Redshift cluster. Since Redshift supports parallel bulk loading of data files (CSV, TSV, JSON), I took advantage of this feature by splitting the MySQL CSV data files before uploading them to S3 for data loading. The only problem was that I needed to split the records evenly for the selected tables.

So, I came up with the following stored procedure. Let’s say we have two tables in MySQL that we want to export: activity_log and messaging_log.

To export and split, run the following stored procedure definition in MySQL:

DELIMITER //
DROP PROCEDURE IF EXISTS export_csv_split //

-- Export a table into numbered CSV files under /tmp, rows_per_file rows
-- per file. (The parameter is named rows_per_file because ROWS is a
-- reserved word in MySQL 8.0.)
CREATE PROCEDURE export_csv_split (IN table_name VARCHAR(50), IN rows_per_file INT)
BEGIN
DECLARE x INT; -- current row offset
DECLARE y INT; -- current file number

SET x=0;
SET y=1;

-- count the total rows of the table as @total_row
SET @SQLString1 = CONCAT('SELECT COUNT(*) INTO @total_row FROM ',table_name);
PREPARE test1 FROM @SQLString1;
EXECUTE test1;
DEALLOCATE PREPARE test1;

WHILE x < @total_row DO

-- export the next chunk of rows into /tmp/<table_name>-<y>.csv
-- (add an ORDER BY on the primary key if the chunks must be strictly
-- non-overlapping; LIMIT without ORDER BY does not guarantee a stable
-- row order)
SET @SQLString = CONCAT('SELECT * FROM ',table_name,' LIMIT ',x,',',rows_per_file,' INTO OUTFILE "/tmp/',table_name,'-',y,'.csv" FIELDS TERMINATED BY '','' OPTIONALLY ENCLOSED BY ''"'' LINES TERMINATED BY ''\n''');
PREPARE test2 FROM @SQLString;
EXECUTE test2;
DEALLOCATE PREPARE test2;

SET x=x+rows_per_file;
SET y=y+1;

END WHILE;

END //
DELIMITER ;
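
A note on the design: LIMIT with a growing offset makes MySQL scan and discard all of the skipped rows, so each successive chunk gets slower on very large tables. If the table has an auto-increment primary key, walking the key range is usually much faster. Here is a minimal sketch of that variant for one chunk, assuming a hypothetical indexed id column (this is not part of the procedure above):

-- hypothetical keyset-style export of one 10M-row chunk,
-- assuming an indexed auto-increment column named id
SET @last_id = 0;

-- find the upper id bound of the next chunk
SELECT MAX(id) INTO @next_id FROM
  (SELECT id FROM activity_log WHERE id > @last_id ORDER BY id LIMIT 10000000) AS chunk;

-- export the chunk by key range instead of LIMIT offset
SELECT * FROM activity_log WHERE id > @last_id AND id <= @next_id
INTO OUTFILE '/tmp/activity_log-1.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

SET @last_id = @next_id; -- repeat until @next_id comes back NULL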

Then, call the stored procedure, passing the table name and the number of rows per file: 10M rows per file for table activity_log and 5M rows per file for table messaging_log:

mysql> call export_csv_split('activity_log',10000000);
mysql> call export_csv_split('messaging_log',5000000);

The data files will be generated under the /tmp directory (note that the MySQL user needs the FILE privilege, and if secure_file_priv is set, INTO OUTFILE can only write inside that directory):

ls -1 /tmp
activity_log-1.csv
activity_log-2.csv
activity_log-3.csv
activity_log-4.csv
activity_log-5.csv
activity_log-6.csv
...
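
Once the split files are uploaded to S3, Redshift can load them all in parallel with a single COPY command pointed at their common key prefix. Here is a minimal sketch, assuming a hypothetical bucket, key prefix, and IAM role, and that the target table already exists in Redshift:

-- run in Redshift: loads every S3 object whose key starts with
-- exports/activity_log-, spread across the cluster slices
-- (mybucket and MyRedshiftRole are placeholder names)
COPY activity_log
FROM 's3://mybucket/exports/activity_log-'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV; -- add GZIP here if you compress the files before uploading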

In another case, some huge tables need to be exported with conditions, for example all records created during the last 10 years:

DELIMITER //
DROP PROCEDURE IF EXISTS export_csv_split_condition //

CREATE PROCEDURE export_csv_split_condition (IN table_name VARCHAR(50), IN rows_per_file INT)
BEGIN
DECLARE x INT; -- current row offset
DECLARE y INT; -- current file number

SET x=0;
SET y=1;

-- condition of the select statement
-- (note: wrapping created in DATE() prevents MySQL from using an index
-- on created; a plain range such as created >= '2004-01-01' AND
-- created < '2014-01-01' can use the index)
SET @condition = 'WHERE DATE(created) < ''2014-01-01'' AND DATE(created) >= ''2004-01-01''';

-- count the total rows of the select statement + condition as @total_row
SET @SQLString1 = CONCAT('SELECT COUNT(*) INTO @total_row FROM ',table_name,' ',@condition);
PREPARE test1 FROM @SQLString1;
EXECUTE test1;
DEALLOCATE PREPARE test1;

WHILE x < @total_row DO

-- export the next chunk of rows matching the condition
SET @SQLString = CONCAT('SELECT * FROM ',table_name,' ',@condition,' LIMIT ',x,',',rows_per_file,' INTO OUTFILE "/tmp/',table_name,'-',y,'.csv" FIELDS TERMINATED BY '','' OPTIONALLY ENCLOSED BY ''"'' LINES TERMINATED BY ''\n''');
PREPARE test2 FROM @SQLString;
EXECUTE test2;
DEALLOCATE PREPARE test2;

SET x=x+rows_per_file;
SET y=y+1;

END WHILE;

END //
DELIMITER ;

Similar to the first case, just call the stored procedure as below:

mysql> call export_csv_split_condition('tbl_user_activities',1000000);

You may want to change the COUNT(*) to a count over a single indexed column instead, e.g. COUNT(id), to speed up the row counting on the InnoDB/XtraDB storage engine.
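
For example, the counting statement inside the procedure would become the following (assuming the table has an indexed id primary key column):

-- count via a single indexed column instead of COUNT(*)
SET @SQLString1 = CONCAT('SELECT COUNT(id) INTO @total_row FROM ',table_name);

Hope this simple sharing helps you when exporting huge tables to CSV.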
