Big Data

NASA gathers approximately 1.73 gigabytes of data in the time it takes you to read this sentence. A Boeing 787 generates, on average, about half a terabyte of data per flight. Because of the vast amounts of data generated by devices and sensors, most data management solutions are simply not cut out for the task. Relational databases such as Oracle are great for handling data from a single application, but they cannot provide the scale or the availability that a database like Cassandra can.

What is Cassandra?

Cassandra is an open source distributed database that was initially developed at Facebook and builds on ideas from Amazon's Dynamo and Google's Bigtable.

Cassandra is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It offers a combination of capabilities that relational databases and many other NoSQL databases struggle to match: continuous availability, linearly scaling performance, operational simplicity and easy data distribution across multiple data centers or cloud availability zones.

Companies that have moved to Cassandra include Instagram, Spotify, eBay, Comcast and many more.

Cassandra Features that can help your projects succeed

Decentralised

Every node in the cluster has the same role, so there is no single point of failure.

Replication

Replication is configurable, and even low-latency multi-datacenter setups are possible.

Scalability

Read and write throughput increases linearly with the number of machines added, with no downtime and no interruption to applications.

Fault tolerant

Data is replicated to multiple nodes in a fault-tolerant way. Failed nodes can be replaced with no downtime.
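To make the replication and fault-tolerance ideas above more concrete, here is a minimal Python sketch of how a partition key could be mapped to several replica nodes on a token ring. It is a toy model, not Cassandra's actual implementation: the class ToyRing, the node names and the MD5-based token function are all assumptions made for illustration, and real clusters use their own partitioners and typically many tokens per node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Toy partitioner (assumption): hash a string to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Hypothetical ring with one token per node, for illustration only."""

    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        # Sort nodes by token so the list can be walked "clockwise".
        self.ring = sorted((token(name), name) for name in nodes)

    def replicas(self, partition_key):
        """Return the nodes holding a copy of the given partition key."""
        tokens = [t for t, _ in self.ring]
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(tokens, token(partition_key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = ToyRing(["node-a", "node-b", "node-c", "node-d", "node-e"],
               replication_factor=3)
owners = ring.replicas("user:42")

# Simulate losing one replica: the remaining owners still hold the data.
failed = owners[0]
survivors = [name for name in owners if name != failed]
print("replicas:", owners)
print("still available after losing", failed, ":", survivors)
```

With a replication factor of 3, losing a single node still leaves two copies of every row, which is why a failed node can be replaced without taking the data offline.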

Tunable consistency

Consistency levels in Cassandra can be configured to manage availability versus data accuracy. You can configure consistency on a cluster, data center, or individual I/O operation basis.
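The trade-off can be illustrated with a little arithmetic. The Python sketch below assumes a replication factor of 3 and uses the consistency levels ONE, QUORUM and ALL; the helper names quorum and read_overlaps_write are made up for this example. A read is guaranteed to touch at least one replica that acknowledged the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Size of a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


RF = 3  # assumed replication factor for this illustration
levels = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

# Print which write/read combinations guarantee overlap.
for write_name, w in levels.items():
    for read_name, r in levels.items():
        overlap = read_overlaps_write(RF, w, r)
        print(f"write={write_name:<6} read={read_name:<6} overlap={overlap}")
```

Lower levels such as ONE favour availability and latency, because fewer replicas have to respond, while QUORUM for both reads and writes favours accuracy at the cost of needing a majority of replicas to be reachable.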

MapReduce support

Cassandra integrates well with Hadoop, and Pig, Hive and Spark are supported too. This can be very useful when you want to provide near-real-time insights into your data.

How do I query the data in Cassandra?

Cassandra provides the Cassandra Query Language (CQL), which closely resembles SQL. Queries use the standard SELECT command, while data manipulation is done with the familiar INSERT, UPDATE, DELETE, and TRUNCATE commands. Data definition commands such as CREATE are used to create new keyspaces and column families (tables).

Example:

-- Create a keyspace that keeps three copies of every row.
CREATE KEYSPACE In4itSpace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
USE In4itSpace;
-- Unquoted identifiers are case-insensitive and stored in lowercase.
CREATE COLUMNFAMILY MyColumns (id text, Last text, First text, PRIMARY KEY(id));
INSERT INTO MyColumns (id, Last, First) VALUES ('1', 'DevOps', 'in4it');

SELECT * FROM MyColumns;

 id | first | last
----+-------+--------
  1 | in4it | DevOps

(1 rows)

Follow me on LinkedIn to read my next article about Cassandra and how you could be using it in your organization, or contact in4it to assist you with your Cassandra project.

About the Author

Jorn is a DevOps enthusiast and has been applying the principles of DevOps in every single team he has worked with.