How To Configure a Multi-Node Cluster with Cassandra on an Ubuntu VPS

Posted September 11, 2013

Introduction

This tutorial shows you how to configure a multi-node cluster with Cassandra on a VPS. Cassandra is a highly scalable open source database system that performs well when set up with multiple nodes, even across different data centers.

Installing Cassandra on Each Node

Before configuring the nodes, you need to have Cassandra installed on every one of them. We have an easy tutorial on how to do that with VPS. After you've installed Cassandra on every node, make sure it isn't running. To check for a running Cassandra process, type in:

sudo ps auwx | grep cassandra

If a process other than the "grep" one appears, copy its process ID (PID) and kill it:

sudo kill -9 PID
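
The lookup step above can also be scripted. The following is a minimal sketch, not part of the original procedure: pids_matching is a hypothetical helper name, and it assumes the PID is the second column of the `ps auwx` output, as it is on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: read `ps auwx` output on stdin and print the PID
# (second column) of every line that mentions the given pattern, skipping
# the grep and awk helper processes themselves.
pids_matching() {
    awk -v pat="$1" '$0 ~ pat && $0 !~ /grep|awk/ { print $2 }'
}

# List the PIDs that would be stopped (prints nothing if Cassandra is not running).
ps auwx | pids_matching cassandra
```

Each printed PID can then be passed to `sudo kill`. Trying a plain `kill` (SIGTERM) first and falling back to `kill -9` only if the process survives gives Cassandra a chance to shut down cleanly.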

You'll also need to clear any data the node has already stored. Do so by running:

sudo rm -rf /var/lib/cassandra/*

Configuring Cassandra

To configure Cassandra for multiple nodes, you need to know beforehand how many nodes you're going to use, and calculate a token number for each. We've developed a tool to do this, and you can get it here. Enter the number of nodes you're dealing with and you'll get a token for each node. For example, if you have three nodes, you'd get these numbers:

Node 0: 0
Node 1: 3074457345618258602
Node 2: 6148914691236517205
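
The three example values are consistent with spacing the tokens evenly across the range from 0 to 2^63, so that node i gets floor(i * 2^63 / N). The sketch below reproduces them under that assumption. cassandra_tokens is a hypothetical helper, not the tool mentioned above, and the formula is inferred from the sample values rather than taken from that tool, so check it against the partitioner set in your cassandra.yaml before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print "Node i: token" for n evenly spaced tokens,
# with token_i = floor(i * 2^63 / n).
# Assumption: this spacing is inferred from the three-node example values.
# Needs a shell with 64-bit integer arithmetic (bash and dash on 64-bit Linux).
cassandra_tokens() {
    n=$1
    max=9223372036854775807   # 2^63 - 1, the largest value the shell can hold
    # 2^63 itself would overflow, so derive its quotient and remainder
    # modulo n from those of 2^63 - 1.
    q=$(( max / n ))
    r=$(( max % n + 1 ))
    if [ "$r" -eq "$n" ]; then
        q=$(( q + 1 ))
        r=0
    fi
    i=0
    while [ "$i" -lt "$n" ]; do
        # i * 2^63 / n  =  i*q + floor(i*r / n)
        echo "Node $i: $(( i * q + i * r / n ))"
        i=$(( i + 1 ))
    done
}

cassandra_tokens 3
```

Running it prints the same three values listed above. Pass a different node count to get tokens for a larger cluster.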

Now you'll need to edit the configuration file on each node. To do so, open it in the nano text editor by running:

nano ~/cassandra/conf/cassandra.yaml

Some of the settings you'll need to edit are the same for all nodes (cluster_name, seed_provider, rpc_address and endpoint_snitch), while others differ for each one (initial_token and listen_address). Choose one node to be your seed node, then look in the configuration file for the lines that refer to each of these attributes and modify them to your needs:

cluster_name: 'Name'
initial_token: Token
seed_provider:
    - seeds:  "Seed IP"
listen_address: Droplet's IP
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
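
Replace "Name" with your cluster name, "Token" with the number you generated earlier for that node, "Seed IP" with your seed node's IP address, and "Droplet's IP" with the IP address of the droplet you are editing. In the stock cassandra.yaml the seeds line sits nested under the seed_provider's parameters entry, so change only the quoted address. Do this for each node. Here is an example filled in for a 3-node setup:

Node 0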
cluster_name: 'MyDigitalOceanCluster'
initial_token: 0
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 198.211.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Node 1
cluster_name: 'MyDigitalOceanCluster'
initial_token: 3074457345618258602
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 192.241.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Node 2
cluster_name: 'MyDigitalOceanCluster'
initial_token: 6148914691236517205
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 37.139.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
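
Since only initial_token and listen_address change from node to node, the per-node values can be generated rather than typed by hand. This is a minimal sketch that assumes the same simplified key layout shown above. node_settings is a hypothetical helper, and its output is a checklist of values to set in cassandra.yaml, not a complete configuration file.

```shell
#!/bin/sh
# Hypothetical helper: print the settings to apply on one node.
# Arguments: cluster name, initial token, seed node IP, this node's IP.
node_settings() {
    cat <<EOF
cluster_name: '$1'
initial_token: $2
seed_provider:
    - seeds:  "$3"
listen_address: $4
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
EOF
}

# Node 1 from the example above (IP placeholders kept as in the text).
node_settings MyDigitalOceanCluster 3074457345618258602 198.211.xxx.0 192.241.xxx.0
```

Keeping the shared values in one place makes it harder for one node to end up with a different cluster_name or seed address than the rest.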

To start Cassandra, run the following command on the seed node first:

sudo sh ~/cassandra/bin/cassandra

Once the seed node has finished starting, run the same command on each of the other nodes. If you don't see any errors, your multi-node Cassandra setup should be successfully deployed.
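
One way to confirm that every node has joined is to ask any node for its view of the cluster with nodetool, which ships in the same bin directory as the cassandra script. The sketch below counts the nodes reported as up and in the normal state. It assumes a Cassandra version whose `nodetool status` output marks such nodes with a line starting with UN. count_up_nodes is a hypothetical helper, and on older versions `nodetool ring` is the closest equivalent.

```shell
#!/bin/sh
# Hypothetical helper: read `nodetool status` output on stdin and print how
# many lines start with "UN" (Up / Normal). Prints 0 when there are none.
count_up_nodes() {
    grep -c '^UN' || true
}

# Example (run on any node once all of them have started):
#   ~/cassandra/bin/nodetool status | count_up_nodes
```

For the three-node example, the count should reach 3 once all nodes have finished starting.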
