Convert CSV to JSON using BASH

I have been assigned a task to generate random data in JSON format. I already have a big data set ready in CSV (comma-separated values) and would love to convert it to JSON using just BASH. You can copy the following code and save it as an executable script file.

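#!/bin/bash
# CSV to JSON converter using BASH
# Usage: ./csv2json input.csv > output.json
# Limitations: fields are split on every comma, values are copied as-is
# (so strings must already be double-quoted), and the file must end with
# a newline, otherwise wc -l misses the last record.

input=$1

[ -z "$input" ] && echo "No CSV input file specified" >&2 && exit 1
[ ! -e "$input" ] && echo "Unable to locate $input" >&2 && exit 1

# Read the header line and count its fields
read -r first_line < "$input"
a=0
headings=$(echo "$first_line" | awk -F, '{print NF}')
lines=$(wc -l < "$input")
while [ $a -lt $headings ]
do
        # Store each field name, stripping any double quotes around it
        head_array[$a]=$(echo "$first_line" | awk -v x=$(($a + 1)) -F"," '{gsub(/"/, "", $x); print $x}')
        a=$(($a+1))
done

c=0
# A list of records is a JSON array, so wrap the output in [ ]
echo "["
while [ $c -lt $lines ]
do
        read -r each_line
        # Line 0 is the header, so only the following lines become records
        if [ $c -ne 0 ]; then
                d=0
                echo -n "{"
                while [ $d -lt $headings ]
                do
                        each_element=$(echo "$each_line" | awk -v y=$(($d + 1)) -F"," '{print $y}')
                        # JSON keys must be quoted; no comma after the last pair
                        if [ $d -ne $(($headings-1)) ]; then
                                echo -n "\"${head_array[$d]}\":$each_element,"
                        else
                                echo -n "\"${head_array[$d]}\":$each_element"
                        fi
                        d=$(($d+1))
                done
                if [ $c -eq $(($lines-1)) ]; then
                        echo "}"
                else
                        echo "},"
                fi
        fi
        c=$(($c+1))
done < "$input"
echo "]"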

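To perform the conversion, run the script with the CSV file that you want to convert as the first argument, and redirect the output to a file. Make sure the first line of the CSV file contains the field names, and that string values are double-quoted, since the script copies each value into the JSON output as-is. The file should look similar to the example below: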

name,social_security,modified
"Farrah Walters","208-72-8449","1386670785"
"Shay Warner","539-53-2690","1386644172"
"Maxine Norton","231-61-5065","1386658663"
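For larger files, spawning awk once per field gets slow. The same conversion can be sketched as a single awk pass. This is only a sketch under the same assumptions as the script above (no commas inside values, string values already double-quoted), and the function name csv2json_awk is made up for illustration:

```shell
# Hypothetical helper: single-pass CSV to JSON conversion with awk.
# Assumes no field contains a comma and that values are either
# already double-quoted or plain numbers.
csv2json_awk() {
    awk -F, '
        NR == 1 {
            # Remember the header names, without any surrounding quotes
            for (i = 1; i <= NF; i++) { gsub(/"/, "", $i); head[i] = $i }
            n = NF
            printf "["
            next
        }
        {
            # Separate records with a comma, starting from the second one
            printf "%s\n{", (NR > 2 ? "," : "")
            for (i = 1; i <= n; i++)
                printf "\"%s\":%s%s", head[i], $i, (i < n ? "," : "")
            printf "}"
        }
        END {
            # An empty input still yields a valid, empty array
            if (NR == 0) printf "["
            print "\n]"
        }
    ' "$1"
}
```

After sourcing the function, it would be called as csv2json_awk input.csv > output.json. Unlike the line-counting loop, it does not depend on the file ending with a newline.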

Hope this helps others out there! You can download the script here.

CentOS: Install and Configure MongoDB Sharded Cluster

In this post I am going to deploy a MongoDB sharded cluster. MongoDB is an open-source, document-oriented NoSQL database designed for ease of development and scaling. I am going to use 3 servers, and the /etc/hosts definitions on all of them are as below:

192.168.0.41        mongo1 mongo1.cluster.local
192.168.0.42        mongo2 mongo2.cluster.local
192.168.0.43        mongo3 mongo3.cluster.local
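Every later step refers to the servers by name, so it may be worth confirming that each server's hosts file really contains all three entries. The following is a minimal sketch; the function names hosts_has_entry and check_cluster_hosts are made up for illustration, and it only looks at the hosts file, not at DNS:

```shell
# Hypothetical helper: succeed if a hosts-format file maps the given name.
hosts_has_entry() {
    local hosts_file="$1"
    local name="$2"
    # Skip comment lines, then look for the name among the hostnames (fields 2+)
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file"
}

# Hypothetical helper: check the three cluster names used in this post.
check_cluster_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    local missing=0
    local h
    for h in mongo1 mongo2 mongo3; do
        if ! hosts_has_entry "$hosts_file" "$h"; then
            echo "missing entry for $h in $hosts_file" >&2
            missing=1
        fi
    done
    return $missing
}
```

Running check_cluster_hosts without arguments checks /etc/hosts and returns non-zero if any of the three names is missing.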

All servers are running CentOS 6.3 64bit with the firewall and SELinux turned off. All steps must be executed on all servers unless specified otherwise.

Install MongoDB

1. Install EPEL repo:

$ rpm -Uhv http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

2. Install MongoDB and all required components:

$ yum install 'mongodb*' -y --enablerepo=epel

 

Config Servers

1. Create the config database directory. By default, a MongoDB config server will use /data/configdb:

$ mkdir -p /data/configdb

2. The default port for a config server is 27019. Start the config server:

$ mongod --configsvr --fork --logpath /var/log/mongodb.log --logappend

You should see the following output:

forked process: 5464
all output going to: /var/log/mongodb.log
child process started successfully, parent exiting

 

Routing Servers

1. By default, mongos will listen on port 27017. Start mongos as below:

$ mongos --configdb mongo1,mongo2,mongo3 --fork --logpath /var/log/mongodb.log --logappend

You should see the following output:

forked process: 5534
all output going to: /var/log/mongodb.log
child process started successfully, parent exiting

Shard Servers

1. Create the default data directory. By default, MongoDB will use /data/db:

$ mkdir -p /data/db

2. By default, mongod with the --shardsvr option will listen on port 27018. Start mongod as below:

$ mongod --shardsvr --fork --logpath /var/log/mongodb.log --logappend

You should see the following output:

forked process: 5675
all output going to: /var/log/mongodb.log
child process started successfully, parent exiting

 

MongoDB Sharding

1. Verify that the MongoDB services are listening on the correct ports:

$ netstat -tulpn | grep mongo
 
tcp     0    0      0.0.0.0:27017     0.0.0.0:*     LISTEN    5534/mongos
tcp     0    0      0.0.0.0:27018     0.0.0.0:*     LISTEN    5675/mongod
tcp     0    0      0.0.0.0:27019     0.0.0.0:*     LISTEN    5464/mongod
tcp     0    0      0.0.0.0:28017     0.0.0.0:*     LISTEN    5534/mongos
tcp     0    0      0.0.0.0:28018     0.0.0.0:*     LISTEN    5675/mongod
tcp     0    0      0.0.0.0:28019     0.0.0.0:*     LISTEN    5464/mongod
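With three servers to check, reading the netstat output by eye gets tedious. As a rough sketch, the check could be scripted by feeding the netstat output to a small filter that prints any expected port without a LISTEN line. The function name missing_ports is made up for illustration, and the field positions assume the netstat -tulpn layout shown above:

```shell
# Hypothetical helper: read `netstat -tulpn` output on stdin and print
# each of the given ports that has no LISTEN line.
missing_ports() {
    local input
    local port
    input=$(cat)
    for port in "$@"; do
        # Field 4 is the local address, field 6 the socket state
        if ! printf '%s\n' "$input" |
            awk -v p=":$port" '$6 == "LISTEN" && $4 ~ (p "$") { ok = 1 } END { exit !ok }'
        then
            echo "$port"
        fi
    done
}
```

It would be used as netstat -tulpn | missing_ports 27017 27018 27019, and prints nothing when all three services are up.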

2. SSH into mongo1 and type mongo to access the mongos console:

$ mongo

3. Switch to the admin database and list the shards:

mongos> use admin
mongos> db.runCommand( { listshards : 1 } );

You should get this reply:

{ "shards" : [ ], "ok" : 1 }

4. Add the shard servers by specifying the hostname and the MongoDB shard service port:

mongos> sh.addShard( "mongo1:27018");
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard( "mongo2:27018");
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.addShard( "mongo3:27018");
{ "shardAdded" : "shard0002", "ok" : 1 }

5. Download this example JSON file and import it into the mydb database:

$ wget http://media.mongodb.org/zips.json
$ mongoimport --db mydb --collection zip --file zips.json
connected to: 127.0.0.1
Mon Mar 25 06:22:35 imported 29470 objects

6. Enable sharding for mydb:

mongos> sh.enableSharding ("mydb");
{ "ok" : 1 }

7. Check sharding status:

mongos> sh.status()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0000", "host" : "mongo1:27018" }
{ "_id" : "shard0001", "host" : "mongo2:27018" }
{ "_id" : "shard0002", "host" : "mongo3:27018" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "mydb", "partitioned" : true, "primary" : "shard0000" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0001" }

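You can see that database mydb is now marked as partitioned ("partitioned" : true). Note that this only enables sharding for the database; a collection such as zip is distributed across the shards only after it has been sharded with sh.shardCollection().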

CentOS: Install MongoDB – The Simple Way

I am in the phase of learning a NoSQL database called MongoDB. I will be using a CentOS 6.3 64bit box installed from the minimal ISO, with several packages such as perl, vim, wget, screen, sudo and cronie added using yum.

We will use the EPEL repo, which includes the MongoDB installation packages, to simplify the deployment.

1. Install EPEL repo for CentOS 6. You can get the link from here, http://dl.fedoraproject.org/pub/epel/6/x86_64/:

$ rpm -Uhv http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

2. Install MongoDB using yum:

$ yum install 'mongodb*' -y

3. Configure mongod to start on boot and start the service:

$ chkconfig mongod on
$ service mongod start

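4. MongoDB will be using ports 27017-27019 and 28017. We will add them to the iptables rules. The rules are inserted with -I rather than appended with -A, so that they are evaluated before the default REJECT rule at the end of the CentOS 6 INPUT chain:

$ iptables -I INPUT -m tcp -p tcp --dport 27017:27019 -j ACCEPT
$ iptables -I INPUT -m tcp -p tcp --dport 28017 -j ACCEPT
$ service iptables save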

5. Check whether MongoDB is listening on the correct port:

$ netstat -tulpn | grep mongod
tcp        0      0 127.0.0.1:27017             0.0.0.0:*                   LISTEN      26575/mongod

6. Log in to the MongoDB console using this command:

$ mongo

7. In the console, you can use the help command to see the list of supported commands, as below:

> help
db.help()         help on db methods
db.mycoll.help()  help on collection methods
sh.help()         sharding helpers
rs.help()         replica set helpers
help admin        administrative help
help connect      connecting to a db help
help keys         key shortcuts
help misc         misc things to know
help mr           mapreduce
 
show dbs                    show database names
show collections            show collections in current database
show users                  show users in current database
show profile                show most recent system.profile entries with time >= 1ms
show logs                   show the accessible logger names
show log [name]             prints out the last segment of log in memory, 'global' is default
use <db_name>               set current database
db.foo.find()               list objects in collection foo
db.foo.find( { a : 1 } )    list objects in foo where a == 1
it                          result of the last line evaluated; use to further iterate
DBQuery.shellBatchSize = x  set default number of items to display on shell
exit                        quit the mongo shell

So now I have everything required for MongoDB installed. Let's learn MongoDB by starting at this page: http://docs.mongodb.org/manual/tutorial/getting-started/#create-a-collection-and-insert-documents