
How To Use the Apache Cassandra One-Click Application Image

Posted March 17, 2015. Tags: NoSQL, Applications, One-Click Install Apps, Ubuntu

Status: Archived

The Cassandra One-Click Image was retired.

See Instead: Other Cassandra-related articles, including guides on how to run a single- or multi-node Cassandra cluster on Ubuntu 14.04.

NOTE: Due to Cassandra's memory requirements, the Cassandra One-Click Application image can only be used on droplets with 1GB or more of RAM.

Introduction

Apache Cassandra is an open source, distributed NoSQL database system that can handle massive data sets across many nodes. This tutorial shows how to use the DigitalOcean Cassandra One-Click Application image to create a single- or multi-node cluster, and how to automate scaling your Cassandra cluster using user-data.

Creating a Cassandra Droplet

To create your first Cassandra droplet, navigate to the Create Droplet page in the control panel, select a size, name and region for your droplet, and then choose the Cassandra on 14.04 image from the Applications tab before clicking Create Droplet.

Once your droplet has been created you will have a single-node Cassandra cluster ready to use locally on your droplet.

Configuring Cassandra

This local, single-node cluster has some limitations. When it is first launched, the Cassandra service listens only on localhost, which means it cannot be reached from other machines, including other Cassandra nodes. Additionally, password authentication is not enabled by default, so the service will not prompt for a username and password. The first thing we will do is adjust these configuration settings.

First, stop the Cassandra service.

service cassandra stop

Then we will clear any data that the Cassandra service generated when it first launched so we can start with a clean setup.

rm -rf /var/lib/cassandra/*;

Now we're ready to start modifying the Cassandra configuration file. Open the file /etc/cassandra/cassandra.yaml using the editor of your choice.

First we will give our cluster a name. Find the line

cluster_name: 'Test Cluster'

in cassandra.yaml and change Test Cluster to a name of your choice. Note: The name you select here must be included in the configuration for each node in your cluster.

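Next we will allow Cassandra to listen on the public network so that other nodes can reach it. The listen_address setting is the address Cassandra uses to communicate with other nodes; client connections such as cqlsh go to rpc_address, which this guide leaves at localhost. Locate the line: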

listen_address: localhost

and change localhost to your droplet's IP address.

listen_address: 12.34.56.78

We shouldn't have our database listening for requests on the public interface without some security in place, so next we will enable password authentication. Locate the line:

authenticator: AllowAllAuthenticator

and change it to:

authenticator: PasswordAuthenticator

Finally, we need to specify a seed IP address. Seed nodes are the nodes a new node contacts to learn about the rest of the cluster. Since this is the only node in our cluster, we will use our droplet's public IP address again here. Find the line:

- seeds: "127.0.0.1"

and change it to your droplet's IP address.

- seeds: "12.34.56.78"

Now that we've completed our changes to the Cassandra configuration you can save your changes and exit your editor.
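Before restarting the service, you may want to confirm that all four edits were saved. The following is a minimal sketch; show_cassandra_settings is a hypothetical helper name, and it assumes the default config path /etc/cassandra/cassandra.yaml and that the three top-level keys start at the beginning of their lines, as they do in the stock file.

```shell
#!/bin/bash
# Hypothetical helper: print the lines of cassandra.yaml that this guide edits.
# Takes an optional path and defaults to the package's config location.
show_cassandra_settings() {
  local yaml="${1:-/etc/cassandra/cassandra.yaml}"
  # Top-level keys start at column 0, so commented-out lines ("# ...") are skipped;
  # the seeds entry is an indented list item under seed_provider.
  grep -E '^(cluster_name|listen_address|authenticator):|- seeds:' "$yaml"
}
```

Running show_cassandra_settings on the droplet should print four lines, each showing the value you just entered.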

We can now start the cassandra service back up with the following command:

service cassandra start

After allowing a couple of minutes for the service to complete its start-up routine, we can connect to Cassandra using cqlsh, the CQL shell. Since we have enabled password authentication but have not yet created a user account, we will log in as the default user cassandra with the password cassandra.

cqlsh -u cassandra -p cassandra

You should see something like the following displayed:

Connected to myCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.3 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cassandra@cqlsh>

Obviously this username and password are not very secure, so we will create a new user account to administer our cluster and remove superuser status from the default cassandra user. First create the new user as a SUPERUSER:

CREATE USER newadminuser WITH PASSWORD 'mypassword' SUPERUSER;

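Next, exit cqlsh and reconnect as the new user with cqlsh -u newadminuser -p mypassword, because Cassandra does not allow a superuser to change its own superuser status. Then change the cassandra user's password to something hard to guess and remove its superuser status: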

ALTER USER cassandra WITH PASSWORD '89asd9f87as9f879sf' NOSUPERUSER;
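Rather than inventing a hard-to-guess password by hand, you could generate one. This is a minimal sketch; demote_default_user_cql is a hypothetical helper, and it assumes head, base64 and /dev/urandom are available, as they are on Ubuntu. Base64 output never contains a single quote, so the value is safe inside a CQL string literal.

```shell
#!/bin/bash
# Hypothetical helper: print an ALTER USER statement that gives the default
# cassandra account a random password and removes its superuser status.
demote_default_user_cql() {
  local pw
  # 18 random bytes encode to exactly 24 base64 characters, with no '=' padding.
  pw=$(head -c 18 /dev/urandom | base64)
  printf "ALTER USER cassandra WITH PASSWORD '%s' NOSUPERUSER;\n" "$pw"
}
```

For example, demote_default_user_cql > demote.cql followed by cqlsh -u newadminuser -p mypassword -f demote.cql would run the statement as the new superuser. Nobody needs to remember the generated password, since the cassandra account is no longer used for administration.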

Now we have our single-node cluster up and running, and we have created a user account to manage it. Next, let's add some data to our cluster.

We will start by creating a keyspace. If you are familiar with other database platforms, a keyspace in Cassandra serves much the same role as a database in MySQL: each keyspace can contain many tables of data. Options can be passed when creating a new keyspace; for this example we will use a very basic set of options to create a keyspace called Test. The replication_factor option sets how many nodes store a copy of each row:

CREATE KEYSPACE Test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Next we will add a very basic table called users to our new keyspace.

CREATE TABLE Test.users (user_name varchar PRIMARY KEY,password varchar,info varchar);

This creates a table with three varchar (text) columns, with user_name set as the PRIMARY KEY.

Next, let's add a record to this new table.

INSERT INTO Test.users (user_name,password,info) VALUES ('JohnDoe','1234','user information goes here');

Now we can query this information with a CQL query:

SELECT * from Test.users;

And we should see our record returned:

 user_name | info                       | password
-----------+----------------------------+----------
   JohnDoe | user information goes here |     1234

(1 rows)
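The schema and sample row above can also be kept in a file and replayed with cqlsh -f instead of being typed interactively. This is a sketch; write_sample_cql is a hypothetical helper name. It adds IF NOT EXISTS to the CREATE statements so the file can be run more than once, and rerunning the INSERT is harmless because an insert with an existing primary key simply overwrites that row.

```shell
#!/bin/bash
# Hypothetical helper: write this guide's sample keyspace, table and row
# to the given path as a CQL script.
write_sample_cql() {
  local out="$1"
  # Quoted heredoc delimiter: the CQL is written verbatim, with no shell expansion.
  cat > "$out" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS Test
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
CREATE TABLE IF NOT EXISTS Test.users (
  user_name varchar PRIMARY KEY,
  password varchar,
  info varchar
);
INSERT INTO Test.users (user_name, password, info)
  VALUES ('JohnDoe', '1234', 'user information goes here');
EOF
}
```

Usage might look like write_sample_cql sample.cql && cqlsh -u newadminuser -p mypassword -f sample.cql.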

Multi-Node Clusters

Now that we have Cassandra running as a single-node cluster, let's add some more nodes. As with our first node, we will start by creating a new droplet using the Cassandra One-Click image.

Once our new node is created we can connect to it via ssh and perform our initial setup.

First, stop the Cassandra service:

service cassandra stop;

and clear out any data created so far on this new node:

rm -rf /var/lib/cassandra/*;

Now we are ready to begin editing our configuration. Open /etc/cassandra/cassandra.yaml in the editor of your choice.

Locate the cluster_name line and set it to the same value you used for your first node.

cluster_name: 'myCluster'

Next we will ensure this new node is listening on the public network interface by changing the listen_address to our droplet's IP address.

listen_address: 12.34.56.90

Now we will update the seed IP. We will set this to our first node's IP address so our new node can sync with it and join the cluster.

seeds: "12.34.56.78"

Now save and close the configuration file.

Now that we have configured our new droplet to join our cluster we can start the Cassandra service.

service cassandra start

It will take a couple of minutes for this new node to come online and join our cluster. After five minutes or so, we can try using our new node.

cqlsh -u newadminuser -p mypassword

As with our first node we should now see a successful connection reported:

Connected to myCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.3 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
newadminuser@cqlsh>

If you see an error message saying that cqlsh was not able to connect, you may need to allow a bit more time for the new node to come online.
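Instead of retrying by hand, you could poll until the node accepts connections on port 9042, the native protocol port shown in the cqlsh connection message. This is a minimal sketch: retry_until and port_open are hypothetical helper names, and port_open relies on bash's built-in /dev/tcp redirection, so it needs bash rather than sh.

```shell
#!/bin/bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
#   retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  local max="$1" delay="$2" try
  shift 2
  for ((try = 1; try <= max; try++)); do
    "$@" && return 0
    # Sleep between attempts, but not after the final failed one.
    if (( try < max )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical helper: succeed if something accepts TCP connections on HOST PORT.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

For example, retry_until 30 10 port_open 127.0.0.1 9042 && cqlsh -u newadminuser -p mypassword waits up to roughly five minutes before connecting. An open port indicates that the native transport has started, which is the point at which cqlsh should be able to connect.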

Now that we're connected to our new node, let's test it by running a query on the data we added on our first node.

SELECT * from Test.users;

We should see our record returned just as it was with our first node.

 user_name | info                       | password
-----------+----------------------------+----------
   JohnDoe | user information goes here |     1234

(1 rows)

We now have a functional multi-node Cassandra cluster.
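One way to confirm that every node has joined is nodetool status, run on any node, which lists each node with a two-letter code such as UN (Up, Normal) or UJ (Up, Joining). The sketch below counts the UN lines; count_up_nodes is a hypothetical helper name, and it assumes the status code is the first field of each node line.

```shell
#!/bin/bash
# Hypothetical helper: read `nodetool status` output on stdin and print how
# many nodes are reported as Up and Normal ("UN").
count_up_nodes() {
  # Header lines ("Datacenter:", "--  Address ...") never have UN as field 1.
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

With the two droplets from this guide, nodetool status | count_up_nodes should print 2 once the second node has finished joining.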

Using User-Data to Deploy Nodes

It would be quite time-consuming to perform each of these steps for every droplet we want to add to our cluster. Luckily, user-data lets us automate this process: by passing the important variables to our droplet when it is created, we can have it join our cluster immediately. For this example we will check the user-data checkbox on the droplet creation page and paste in the following script, changing the CLUSTER_NAME and SEED_ADDRESS values to match our cluster name and our first node's IP address.

#!/bin/bash
export CLUSTER_NAME='myCluster';
export SEED_ADDRESS='12.34.56.78';
export IP_ADDRESS=$(curl -s http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/address);
service cassandra stop;
rm -rf /var/lib/cassandra/*;
sed -i.bak "s/cluster_name: 'Test Cluster'/cluster_name: '${CLUSTER_NAME}'/g" /etc/cassandra/cassandra.yaml;
sed -i.bak s/authenticator\:\ AllowAllAuthenticator/authenticator\:\ PasswordAuthenticator/g /etc/cassandra/cassandra.yaml;
sed -i.bak s/listen\_address\:\ localhost/listen_address\:\ ${IP_ADDRESS}/g /etc/cassandra/cassandra.yaml;
sed -i.bak s/\-\ seeds\:\ \"127.0.0.1\"/\-\ seeds\:\ \"${SEED_ADDRESS}\"/g /etc/cassandra/cassandra.yaml;
service cassandra start;

Let's break down what this user-data script does. Most of it should be familiar.

First we have the two variables we need to set for our new droplet: the cluster name and the seed IP address (the IP of our first node).

export CLUSTER_NAME='myCluster';
export SEED_ADDRESS='12.34.56.78';

Then we can use droplet meta-data to get the IP address of our newly created droplet and assign it to a variable.

export IP_ADDRESS=$(curl -s http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/address);
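If the metadata request fails, IP_ADDRESS ends up empty and the later sed command would write a blank listen_address. A defensive check could stop the script instead. This is a sketch; is_ipv4 is a hypothetical helper that only checks the dotted-quad form and does not verify that the address belongs to the droplet.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the argument is a dotted-quad IPv4
# address whose four parts are each in the range 0-255.
is_ipv4() {
  local ip="$1" octet
  local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
  local IFS=.
  local -a parts
  [[ $ip =~ $re ]] || return 1
  read -r -a parts <<< "$ip"
  for octet in "${parts[@]}"; do
    # 10# forces base 10 so a part such as "08" is not parsed as octal.
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}
```

In the user-data script, a line such as is_ipv4 "$IP_ADDRESS" || { echo "Could not read the droplet's public IP" >&2; exit 1; } placed right after the IP_ADDRESS assignment would stop before Cassandra's configuration is touched.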

Next we stop the Cassandra service and clear out any existing data.

service cassandra stop;
rm -rf /var/lib/cassandra/*;

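We then use the sed command to find and replace the configuration values we need to change. The -i.bak option edits cassandra.yaml in place and saves the previous version as cassandra.yaml.bak; because each sed call overwrites that backup, it ends up holding the file as it was just before the last substitution.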

sed -i.bak "s/cluster_name: 'Test Cluster'/cluster_name: '${CLUSTER_NAME}'/g" /etc/cassandra/cassandra.yaml;
sed -i.bak s/authenticator\:\ AllowAllAuthenticator/authenticator\:\ PasswordAuthenticator/g /etc/cassandra/cassandra.yaml;
sed -i.bak s/listen\_address\:\ localhost/listen_address\:\ ${IP_ADDRESS}/g /etc/cassandra/cassandra.yaml;
sed -i.bak s/\-\ seeds\:\ \"127.0.0.1\"/\-\ seeds\:\ \"${SEED_ADDRESS}\"/g /etc/cassandra/cassandra.yaml;

Finally we start the Cassandra service with our new configuration.

service cassandra start;

As with the manual setup of an additional node, it may take several minutes before cqlsh can connect on the new droplet, but once it is up and running, this new node should let us query our Test keyspace and users table just as the manually configured node did.
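Because user-data runs unattended, a mistake in one of these substitutions only shows up after the droplet boots. One way to catch it earlier is to wrap the same four substitutions in a function and try them on a scratch copy of cassandra.yaml. This is a sketch: apply_cassandra_settings is a hypothetical name, it assumes GNU sed (as on Ubuntu) for the suffix-less -i option, and it assumes the cluster name and addresses contain no / or & characters, which sed would interpret. It anchors the top-level keys with ^ so commented-out lines are never rewritten.

```shell
#!/bin/bash
# Hypothetical helper: apply this guide's four cassandra.yaml changes to any file.
#   apply_cassandra_settings YAML_PATH CLUSTER_NAME IP_ADDRESS SEED_ADDRESS
apply_cassandra_settings() {
  local yaml="$1" cluster="$2" ip="$3" seed="$4"
  # A single GNU sed call with in-place editing; values must not contain / or &.
  sed -i \
    -e "s/^cluster_name: 'Test Cluster'/cluster_name: '${cluster}'/" \
    -e "s/^authenticator: AllowAllAuthenticator/authenticator: PasswordAuthenticator/" \
    -e "s/^listen_address: localhost/listen_address: ${ip}/" \
    -e "s/- seeds: \"127.0.0.1\"/- seeds: \"${seed}\"/" \
    "$yaml"
}
```

To dry-run it, copy the stock file with cp /etc/cassandra/cassandra.yaml /tmp/cassandra-test.yaml, run apply_cassandra_settings /tmp/cassandra-test.yaml myCluster 12.34.56.90 12.34.56.78, and compare the two files with diff.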

Next Steps

We can take this tutorial one step further and automate the entire process. We have created a Ruby script, do-ccc, based on the steps in this tutorial, which uses the DigitalOcean API along with user-data and droplet metadata to deploy a complete Cassandra cluster automatically. The script prompts you for a cluster name, the region where you want to deploy your droplets, the number of nodes to create and the size of each node, and then creates your cluster for you.

This guide provides steps to create a very basic Cassandra cluster. There are many ways that your configuration can be adjusted and optimized and it is strongly recommended to review the Apache Cassandra Documentation and other sources for more information on how to tune your cluster to best meet your needs.


Creative Commons License