Two possible way to import csv into cassandra (ref.):

  • COPY command: suitable for small size data.
  • sstableloader (Cassandra bulk loader): for LARGE DATA to import, should prepare sstable-format files with CQLSSTableWriter before import.

1. Load small data with CQL:COPY

# Source csv file content, notice:
# 1. the last column can be NULL if the line ends with ",NULL"
# 2. the timestamp columns specified timezone info as +0000 (UTC)
$ head trans_1k_use_copy.csv
18991230000236,2013-12-27 00:00:00 +0000,,,,,5611.67,2013-12-18 00:00:00 +0000,2014-01-03 00:00:00 +0000,NULL
18991230000236,2014-02-20 00:00:00 +0000,,,,,1516.2000000000003,2014-02-19 00:00:00 +0000,2014-03-19 00:00:00 +0000,1

cqlsh> COPY ks1.trans (uid, date, c1, c2, c3, sid, payment, ldate, ndate, diff) 
FROM '/Users/larrysu/repos/data/trans_1k_use_copy.csv'
WITH NULL = 'NULL';

table schema:

CREATE KEYSPACE ks1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE ks1.src_trans (
  uid varchar,
  date timestamp,
  c1 varchar,
  c2 varchar,
  c3 varchar,
  sid varchar,
  payment double,
  ldate timestamp,
  ndate timestamp,
  diff int,
  PRIMARY KEY ((date), uid, c1, c2, c3, sid)
);

2. Load large data with sstabeloader

Here we use scala and SBT to make sstable files with java class:CQLSSTableWriter

Source CSV source file content:

$ head sample.csv 
Q8AM041664,871,2014-03-03
Q8AM052362,111,2014-03-21
L8AM010326,411,2013-06-16
L8AM125779,271,2015-04-05
QGS0179371,411,2013-02-07
Q8AM002160,379.4,2013-06-12
...more than 1 million rows

Using the example program here to prepare the sstable files.

sbt "\
run -k myks -t sampledata -p c1,c2 \
-c c1=varchar,c2=double,c3=date \
-f /Users/larrysu/repos/data/test/sample.csv \
-o /Users/larrysu/repos/data/test/sstable/myks/sampledata"

Make SSTable

Then create the KEYSPACE & TABLE manually:

CREATE KEYSPACE myks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

CREATE TABLE myks.sampledata ( c1 varchar, c2 double, c3 date, 
  PRIMARY KEY (c1,c2)
);
sstableloader -d 127.0.0.1 /Users/larrysu/repos/data/test/sstable/myks/sampledata

Load SSTable