Apache Cassandra is a highly scalable open source database system, achieving great performance on multi-node setups.
Previously, we went over how to run a single-node Cassandra cluster. In this tutorial, you’ll learn how to install and use Cassandra to run a multi-node cluster on Ubuntu 14.04.
Because you’re about to build a multi-node Cassandra cluster, you must determine how many servers you’d like to have in your cluster and configure each of them. It is recommended, but not required, that they have the same or similar specifications.
To complete this tutorial, you’ll need the following:
At least two Ubuntu 14.04 servers configured using this initial setup guide.
Each server must be secured with a firewall using this IPTables guide.
Each server must also have Cassandra installed by following this Cassandra installation guide.
Servers in a Cassandra cluster are known as nodes. What you have on each server right now is a single-node Cassandra cluster. In this step, we’ll set up the nodes to function as a multi-node Cassandra cluster.
All the commands in this and subsequent steps must be repeated on each node in the cluster, so be sure to have as many terminals open as you have nodes in the cluster.
The first command you’ll run on each node will stop the Cassandra daemon.
- sudo service cassandra stop
When that’s completed, delete the default dataset.
- sudo rm -rf /var/lib/cassandra/data/system/*
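Because rm -rf is unforgiving when a path is mistyped, the same cleanup can be wrapped in a small guard. This is only a sketch: the function name reset_system_data is hypothetical (not part of Cassandra), and it assumes the default data directory used above and that the daemon has already been stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear the default system dataset under a data directory.
# Assumes Cassandra is already stopped on this node.
reset_system_data() {
  local data_dir="${1:-/var/lib/cassandra/data}"
  if [ ! -d "${data_dir}/system" ]; then
    echo "No system directory under ${data_dir}; nothing removed." >&2
    return 1
  fi
  # ${data_dir:?} aborts rather than expanding to "/system/*" if the variable is empty.
  rm -rf "${data_dir:?}/system/"*
}

# Example (from a root shell, since sudo cannot call shell functions):
#   reset_system_data /var/lib/cassandra/data
```

The function only removes the contents of the system directory and leaves every other directory under the data path untouched.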
Cassandra’s configuration file is located in the /etc/cassandra directory. That configuration file, cassandra.yaml, contains many directives and is very well commented. In this step, we’ll modify that file to set up the cluster.
Only the following directives need to be modified to set up a multi-node Cassandra cluster:
cluster_name: This is the name of your cluster. It must be the same on every node.
- seeds: This is a comma-delimited list of the IP addresses of the nodes in the cluster.
listen_address: This is the IP address that other nodes in the cluster will use to connect to this one. It defaults to localhost and needs to be changed to the IP address of the node.
rpc_address: This is the IP address for remote procedure calls. It defaults to localhost. If the server’s hostname is properly configured, leave this as is. Otherwise, change it to the server’s IP address or the loopback address (127.0.0.1).
endpoint_snitch: This is the name of the snitch, which tells Cassandra what its network looks like. It defaults to SimpleSnitch, which is used for networks in one datacenter. In our case, we’ll change it to GossipingPropertyFileSnitch, which is preferred for production setups.
auto_bootstrap: This directive is not in the configuration file, so it has to be added and set to false. It controls whether a node streams its share of the data from existing nodes when it joins the cluster. Setting it is optional if you’re adding nodes to an existing cluster, but required when you’re initializing a fresh cluster, that is, one with no data.
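Before changing anything, it can help to see what these directives are currently set to. The sketch below assumes the default file location and that the active (uncommented) directives start at the beginning of a line; the function name show_cluster_directives is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active cluster-related directives with line numbers.
# Commented-out lines (starting with '#') are intentionally not matched.
show_cluster_directives() {
  local conf="${1:-/etc/cassandra/cassandra.yaml}"
  grep -nE '^(cluster_name|listen_address|rpc_address|endpoint_snitch|auto_bootstrap):|^[[:space:]]*- seeds:' "$conf"
}

# Example:
#   show_cluster_directives /etc/cassandra/cassandra.yaml
```

On a freshly installed node, auto_bootstrap should not appear in the output, since it is not in the file yet.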
Open the configuration file for editing using nano or your favorite text editor.
- sudo nano /etc/cassandra/cassandra.yaml
Search the file for the following directives and modify them as below to match your cluster. Replace your_server_ip with the IP address of the server you’re currently working on. The - seeds: list should be the same on every server, and will contain each server’s IP address separated by commas.
. . .
cluster_name: 'CassandraDOCluster'
. . .
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "your_server_ip,your_server_ip_2,...your_server_ip_n"
. . .
listen_address: your_server_ip
. . .
rpc_address: your_server_ip
. . .
endpoint_snitch: GossipingPropertyFileSnitch
. . .
At the bottom of the file, add in the auto_bootstrap directive by pasting in this line:
auto_bootstrap: false
When you’re finished modifying the file, save and close it. Repeat this step for all the servers you want to include in the cluster.
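If you have many nodes, making the same edits by hand gets tedious. As a rough sketch, the edits above could be scripted with sed. The function name configure_node and its argument order are hypothetical, and the patterns assume the stock layout in which each directive appears uncommented at the start of a line. The cluster name and IP addresses must not contain / or & characters, which sed would interpret.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Cassandra: apply the tutorial's edits to a cassandra.yaml.
# Usage: configure_node CONF CLUSTER_NAME NODE_IP SEED_IP [SEED_IP...]
configure_node() {
  local conf="$1" name="$2" ip="$3"
  shift 3
  local seeds tmp
  seeds="$(IFS=,; echo "$*")"   # join the remaining arguments with commas
  tmp="$(mktemp)" || return 1
  sed \
    -e "s/^cluster_name:.*/cluster_name: '${name}'/" \
    -e "s/^\([[:space:]]*- seeds:\).*/\1 \"${seeds}\"/" \
    -e "s/^listen_address:.*/listen_address: ${ip}/" \
    -e "s/^rpc_address:.*/rpc_address: ${ip}/" \
    -e "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" \
    "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Append auto_bootstrap only if it is not already present.
  grep -q '^auto_bootstrap:' "$tmp" || echo 'auto_bootstrap: false' >> "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# Example (from a root shell, after backing up the original file):
#   configure_node /etc/cassandra/cassandra.yaml CassandraDOCluster 192.168.1.4 192.168.1.4 192.168.1.6
```

Running the function twice leaves the file unchanged the second time, so it is safe to re-run after correcting a typo in the arguments.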
At this point, the cluster has been configured, but the nodes are not communicating. In this step, we’ll configure the firewall to allow Cassandra traffic.
First, start the Cassandra daemon again on each node.
- sudo service cassandra start
If you check the status of the cluster, you’ll find that only the local node is listed, because it’s not yet able to communicate with the other nodes.
- sudo nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.4 147.48 KB 256 ? f50799ee-8589-4eb8-a0c8-241cd254e424 rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
To allow communication, we’ll need to open the following network ports for each node:
7000, which is the TCP port for commands and data.
9042, which is the TCP port for the native transport server. cqlsh, the Cassandra command line utility, will connect to the cluster through this port.
To modify the firewall rules, open the rules file for IPv4.
- sudo nano /etc/iptables/rules.v4
Copy and paste the following line within the INPUT chain, which will allow traffic on the aforementioned ports. If you’re using the rules.v4 file from the firewall tutorial, you can insert the following line just before the # Reject anything that's fallen through to this point comment.
The IP address specified by -s should be the IP address of another node in the cluster. If you have two nodes with IP addresses 111.111.111.111 and 222.222.222.222, the rule on the 111.111.111.111 machine should use the IP address 222.222.222.222. If you have more than two nodes, add one such line for each of the other nodes.
-A INPUT -p tcp -s your_other_server_ip -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
After adding the rule, save and close the file, then restart IPTables.
- sudo service iptables-persistent restart
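With more than two nodes, each server needs one such rule per peer. The sketch below only prints the rule lines so you can paste them into rules.v4 yourself; it does not touch the firewall. The function name cassandra_peer_rules is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one INPUT rule per peer IP, in the same form as the rule above.
cassandra_peer_rules() {
  local peer
  for peer in "$@"; do
    # "--" stops printf from treating the leading "-A" as an option.
    printf -- '-A INPUT -p tcp -s %s -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT\n' "$peer"
  done
}

# Example, run on the 111.111.111.111 machine of a three-node cluster
# (333.333.333.333 is a placeholder for the third node):
#   cassandra_peer_rules 222.222.222.222 333.333.333.333
```

Pass only the addresses of the other nodes, not the address of the machine you are working on.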
We’ve now completed all the steps needed to make the nodes into a multi-node cluster. You can verify that they’re all communicating by checking their status.
- sudo nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.4 147.48 KB 256 ? f50799ee-8589-4eb8-a0c8-241cd254e424 rack1
UN 192.168.1.6 139.04 KB 256 ? 54b16af1-ad0a-4288-b34e-cacab39caeec rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
If you can see all the nodes you configured, you’ve just successfully set up a multi-node Cassandra cluster.
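On larger clusters it is easier to count the nodes than to read the table. The following sketch counts lines whose first column is UN (Up and Normal) in the nodetool status output shown above; the function name count_up_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count nodes reported as UN (Up/Normal) in `nodetool status` output.
# Reads the status table from standard input.
count_up_nodes() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Example:
#   sudo nodetool status | count_up_nodes
```

If the number printed is lower than the number of servers you configured, recheck the seeds list and the firewall rules on the missing nodes.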
You can also check if you can connect to the cluster using cqlsh, the Cassandra command line client. Note that you can specify the IP address of any node in the cluster for this command.
- cqlsh your_server_ip 9042
You will see it connect:
Connected to My DO Cluster at 192.168.1.6:9042.
[cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>
Then you can exit the CQL terminal.
- exit
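To check every node without opening an interactive session each time, cqlsh can run a single statement with its -e option. This is a minimal sketch, assuming cqlsh is on the PATH and port 9042 is reachable from where you run it; check_cql_nodes is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a trivial CQL query against each node and report the result.
# Returns non-zero if any node could not be queried.
check_cql_nodes() {
  local ip failed=0
  for ip in "$@"; do
    if cqlsh -e 'SELECT cluster_name FROM system.local;' "$ip" 9042 >/dev/null 2>&1; then
      echo "OK   ${ip}"
    else
      echo "FAIL ${ip}"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_cql_nodes 192.168.1.4 192.168.1.6
```

A FAIL line here while nodetool status shows the node as UN usually points at the firewall rule for port 9042 or at the rpc_address setting.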
Congratulations! You now have a multi-node Cassandra cluster running on Ubuntu 14.04. More information about Cassandra is available at the project’s website. If you need to troubleshoot the cluster, the first place to look for clues is the log files, which are located in the /var/log/cassandra directory.
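When looking through the logs, warnings and errors are usually the quickest clues. The sketch below assumes the main log is system.log in that directory and that each entry starts with its level (for example ERROR or WARN), as in the default log format; recent_log_problems is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent WARN and ERROR entries from a Cassandra log.
# Usage: recent_log_problems [LOG_FILE] [MAX_LINES]
recent_log_problems() {
  local log="${1:-/var/log/cassandra/system.log}" count="${2:-20}"
  grep -E '^(ERROR|WARN) ' "$log" | tail -n "$count"
}

# Example:
#   recent_log_problems /var/log/cassandra/system.log 50
```

Multi-line stack traces are not included, since only the first line of each entry carries the level, so open the log itself once you know roughly where to look.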
Hi, when I try to run sudo nodetool status, I get this line: nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused'.
What should I do?
Hi guys, I want to configure the firewall on Windows. Do you have a resource for me?
Hi, I have some questions about configuring a cluster:
If I have an 8-node cluster, on what basis should I decide to keep 4 nodes in rack1 and the others in rack2 in the same data center?
If I have a cluster of some nodes with only one data center, and after a few years I need to add one more data center to the same cluster, will I need to reconfigure the whole cluster? Is there any way to add it then?
About the auto_bootstrap setting: do I need to set it to true only if I am adding a fresh node to the cluster?
Hello Team,
How can I make Cassandra accessible remotely (installed on AWS EC2, Ubuntu 16.04 LTS)? I’m using Cassandra 3.10, CQL 5.10.
I tried installing Cassandra on my local machine and it works without errors.
On the remote machine, i.e. AWS EC2 with Ubuntu 16.04 LTS, it works locally, but I want to make Cassandra accessible to my applications by changing a few parameters in the .yaml and .sh files.
Please check whether the parameters and procedure below, which I followed or changed, are right or wrong.
listen_address : 54.32.XX.XX (Public IP address)
seed : 54.32.XX.XX (Public IP address)
rpc_start : true
rpc_address : 54.32.XX.XX (Public IP address)
endpoint_snitch : Ec2Snitch
Djava.rmi… : <54.32.XX.XX>(Public IP address)
Save and exit.
Restart the Cassandra service using the terminal commands, i.e. sudo service cassandra stop and sudo service cassandra start.
Run the sudo nodetool status command.
nodetool status results in the message below.
Please let me know how to resolve this and where I’m going wrong.
Thanks in advance
Regards Anith
Hi, I’m trying to run a Cassandra cluster with Node.js. I have a table with just 2k rows, and with a load test I found that Node.js starts crashing at 110-120 concurrent users. Adding connection pooling raised that to 200 users, but I want to support at least 10k concurrent users. I must be missing something. If you could help me out, I would really appreciate it.