Tutorial

How To Install and Configure Elasticsearch on CentOS 7

Updated on October 20, 2020
How To Install and Configure Elasticsearch on CentOS 7
Not using CentOS 7?Choose a different version or distribution.
CentOS 7

The author selected the COVID-19 Relief Fund to receive a donation as part of the Write for DOnations program.

Introduction

Elasticsearch is a platform for the distributed search and analysis of data in real time. Its popularity is due to its ease of use, powerful features, and scalability.

Elasticsearch supports RESTful operations. This means that you can use HTTP methods (GET, POST, PUT, DELETE, etc.) in combination with an HTTP URI (/collection/entry) to manipulate your data. The intuitive RESTful approach is both developer and user friendly, which is one of the reasons for Elasticsearch’s popularity.

Elasticsearch is free and open-source software with a solid company behind it — Elastic. This combination makes it suitable for many use cases, from personal testing to corporate integration.

This article will introduce you to Elasticsearch and show you how to install, configure, and start using it.

Prerequisites

To follow this tutorial you will need the following:

Step 1 — Installing Java on CentOS 7

Elasticsearch is written in the Java programming language. Your first task, then, is to install a Java Runtime Environment (JRE) on your server. You will use the native CentOS OpenJDK package for the JRE. This JRE is free, well-supported, and automatically managed through the CentOS Yum installation manager.

Install the latest version of OpenJDK 8:

  1. sudo yum install java-1.8.0-openjdk.x86_64

Now verify your installation:

  1. java -version

The command will create an output like this:

Output
openjdk version "1.8.0_262" OpenJDK Runtime Environment (build 1.8.0_262-b10) OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)

When you advance in using Elasticsearch and you start looking for better Java performance and compatibility, you may opt to install Oracle’s proprietary Java (Oracle JDK 8). For more information, reference our article on How To Install Java on CentOS and Fedora.

Step 2 — Downloading and Installing Elasticsearch on CentOS 7

You can download Elasticsearch directly from elastic.co in zip, tar.gz, deb, or rpm packages. For CentOS, it’s best to use the native rpm package, which will install everything you need to run Elasticsearch.

At the time of this writing, the latest Elasticsearch version is 7.9.2.

From a working directory of your choosing, download the program:

  1. wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-x86_64.rpm

Then install it using the rpm command:

  1. sudo rpm -ivh elasticsearch-7.9.2-x86_64.rpm

Elasticsearch will install in /usr/share/elasticsearch/, with its configuration files placed in /etc/elasticsearch and its init script added in /etc/init.d/elasticsearch.

To make sure Elasticsearch starts and stops automatically with the server, add its init script to the default runlevels:

  1. sudo systemctl enable elasticsearch.service

With Elasticsearch installed, you will now configure a few important settings.

Step 3 — Configuring Elasticsearch on CentOS 7

Now that you have installed Elasticsearch and its Java dependency, it is time to configure Elasticsearch.

The Elasticsearch configuration files are in the /etc/elasticsearch directory. The ones we’ll review and edit are:

  • elasticsearch.yml — Configures the Elasticsearch server settings. This is where most options are stored, which is why we are mostly interested in this file.

  • jvm.options — Provides configuration for the JVM such as memory settings.

The first variables to customize on any Elasticsearch server are node.name and cluster.name in elasticsearch.yml. Let’s do that now.

As their names suggest, node.name specifies the name of the server (node) and the cluster to which the latter is associated. If you don’t customize these variables, a node.name will be assigned automatically in respect to the server hostname. The cluster.name will be automatically set to the name of the default cluster.

The cluster.name value is used by the auto-discovery feature of Elasticsearch to automatically discover and associate Elasticsearch nodes to a cluster. Thus, if you don’t change the default value, you might have unwanted nodes, found on the same network, in your cluster.

Let’s start editing the main elasticsearch.yml configuration file.

Open it using nano or your preferred text editor:

  1. sudo nano /etc/elasticsearch/elasticsearch.yml

Remove the # character at the beginning of the lines for node.name and cluster.name to uncomment them, and then change their values. Your first configuration changes in the /etc/elasticsearch/elasticsearch.yml file will look like this:

/etc/elasticsearch/elasticsearch.yml
...
node.name: "My First Node"
cluster.name: mycluster1
...

The networking settings are also found in elasticsearch.yml. By default, Elasticsearch will listen on localhost on port 9200 so that only clients from the same server can connect. You should leave these settings unchanged from a security point of view, because the open source and free edition of Elasticsearch doesn’t offer authentication features.

Another important setting is the node.roles property. You can set this to master-eligible (simply master in the configuration), data or ingest.

The master-eligible role is responsible for the cluster’s health and stability. In large deployments with a lot of cluster nodes, it’s recommended to have more than one dedicated node with a master role only. Typically, a dedicated master node will neither store data nor create indices. Thus, there will be no chance of being overloaded, by which the cluster health could be endangered.

The data role defines the nodes that will store the data. Even if a data node is overloaded, the cluster health shouldn’t be affected seriously, provided there are other nodes to take the additional load.

Lastly, the ingest role allows a node to accept and process data streams. In larger setups, there should be dedicated ingest nodes in order to avoid possible overload on the master and data nodes.

Note: one node may have one or more roles allowing scalability, redundancy and high-availability of the Elasticsearch setup. By default, all of these roles are assigned to the node. This is suitable for a single-node Elasticsearch, as in the example scenario described in this article. Therefore, you don’t have to change the role. Still, if you want to change the role, such as dedicating a node as a master, you can do it by changing /etc/elasticsearch/elasticsearch.yml like this:

/etc/elasticsearch/elasticsearch.yml
...
node.roles: [ master ]
...

Another setting to consider changing is path.data. This determines the path where data is stored, and the default path is /var/lib/elasticsearch. In a production environment it’s recommended that you use a dedicated partition and mount point for storing Elasticsearch data. In the best case, this dedicated partition will be a separate storage media that will provide better performance and data isolation. You can specify a different path.data path by uncommenting the path.data line and changing its value like this:

/etc/elasticsearch/elasticsearch.yml
...
path.data: /media/different_media
...

Now that you have made all your changes, save and close elasticsearch.yml.

You must also edit your configurations in jvm.options.

Recall that Elasticsearch is run by a JVM, i.e. essentially it’s a Java application. So just as any Java application it has JVM settings that can be configured in the file /etc/elasticsearch/jvm.options. Two of the most important settings, especially in regards to performance, are Xms and Xmx, which define the minimum (Xms) and maximum (Xmx) memory allocation.

By default, both are set to 1GB, but that is almost never optimal. Not only that, but if your server only has 1GB of RAM, you won’t be able to start Elasticsearch with the default settings. This is because the operating system takes at least 100MB so it will not be possible to dedicate 1GB to Elasticsearch.

Unfortunately, there is no universal formula for calculating the memory settings. Naturally, the more memory you allocate, the better your performance, but make sure that there is enough memory left for the rest of the processes on the server. For example, if your machine has 1GB of RAM, you could set both Xms and Xmx to 512MB, thus allowing another 512MB for the rest of the processes. Note that usually both Xms and Xmx are set to the same value in order to avoid the performance penalty of the JVM garbage collection.

If your server only has 1GB of RAM, you must edit this setting.

Open jvm.options:

  1. sudo nano /etc/elasticsearch/jvm.options

Now change the Xms and Xmx values to 512MB:

/etc/elasticsearch/jvm.options
...
-Xms512m
-Xmx512m
...

Save and exit the file.

Now start Elasticsearch for the first time:

  1. sudo service elasticsearch start

Allow at least 10 seconds for Elasticsearch to start before you attempt to use it. Otherwise, you may get a connection error.

Note: You should know that not all Elasticsearch settings are set and kept in configuration files. Instead, some settings are set via its API, like index.number_of_shards and index.number_of_replicas. The first determines into how many pieces (shards) the index will split. The second defines the number of replicas that will be distributed across the cluster. Having more shards improves the indexing performance, while having more replicas makes searching faster.

Assuming that you are still exploring and testing Elasticsearch on a single node, you can play with these settings and alter them by executing the following curl command:

  1. curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
  2. "index.number_of_replicas" : "0",
  3. "index.number_of_shards" : "1"
  4. }'

With Elasticsearch installed and configured, you will now secure and test the server.

Step 4 — (Optional) Securing Elasticsearch on CentOS 7

Elasticsearch has no built-in security and anyone who can access the HTTP API can control it. This section is not a comprehensive guide to securing Elasticsearch. Take whatever measures are necessary to prevent unauthorized access to it and the server/virtual machine on which it is running.

By default, Elasticsearch is configured to listen only on the localhost network interface, i.e. remote connections are not possible. You should leave this setting unchanged unless you have taken one or both of the following measures:

  • You have limited the access to TCP port 9200 only to trusted hosts with iptables.
  • You have created a vpn between your trusted hosts and you are going to expose Elasticsearch on one of the vpn’s virtual interfaces.

Only once you have done the above should you consider allowing Elasticseach to listen on other network interfaces besides localhost. Such a change might be considered when you need to connect to Elasticsearch from another host, for example.

To change the network exposure, open the file elasticsearch.yml:

  1. sudo nano /etc/elasticsearch/elasticsearch.yml

In this file find the line that contains network.host, uncomment it by removing the # character at the beginning of the line, and then change the value to the IP address of the secured network interface. The line will look something like this:

/etc/elasticsearch/elasticsearch.yml
...
network.host: 10.0.0.1
...

Warning: Because Elasticsearch doesn’t have any built-in security, it is very important that you do not set this to any IP address that is accessible to any servers that you do not control or trust. Do not bind Elasticsearch to a public or shared private network IP address.

Also, for additional security you can disable scripts that are used to evaluate custom expressions. By crafting a custom malicious expression, an attacker might be able to compromise your environment.

To disable custom expressions, add the following line at the end of the /etc/elasticsearch/elasticsearch.yml file:

...
[label /etc/elasticsearch/elasticsearch.yml]
script.allowed_types: none
...

For the above changes to take effect, you will have to restart Elasticsearch.

Restart Elasticsearch now:

  1. sudo service elasticsearch restart

In this step you took some measures to secure your Elasticsearch server. Now you are ready to test the application.

Step 5 — Testing Elasticsearch on CentOS 7

By now, Elasticsearch should be running on port 9200. You can test this using curl, the command-line tool for client-side URL transfers.

To test the service, make a GET request like this:

  1. curl -X GET 'http://localhost:9200'

You will see the following response:

Output
{ "name" : "My First Node", "cluster_name" : "mycluster1", "cluster_uuid" : "R23U2F87Q_CdkEI2zGhLGw", "version" : { "number" : "7.9.2", "build_flavor" : "default", "build_type" : "rpm", "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e", "build_date" : "2020-09-23T00:45:33.626720Z", "build_snapshot" : false, "lucene_version" : "8.6.2", "minimum_wire_compatibility_version" : "6.8.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" }

If you see a similar response, Elasticsearch is working properly. If not, recheck the installation instructions and allow some time for Elasticsearch to fully start.

Your Elasticsearch server is now operational. In the next step you will add and retrieve some data from the application.

Step 6 — Using Elasticsearch on CentOS 7

In this step you will add some data to Elasticsearch and then make some manual queries.

Use curl to add your first entry:

  1. curl -H 'Content-Type: application/json' -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'

You will see the following output:

Output
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":3,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":4

Using curl, you sent an HTTP POST request to the Elasticseach server. The URI of the request was /tutorial/helloworld/1. Let’s take a closer look at those parameters:

  • tutorial is the index of the data in Elasticsearch.
  • helloworld is the type.
  • 1 is the id of our entry under the above index and type.

Note that it’s also required to set the content type of all POST requests to JSON with the argument -H 'Content-Type: application/json'. If you do not do this Elasticsearch will reject your request.

Now retrieve your first entry using an HTTP GET request:

  1. curl -X GET 'http://localhost:9200/tutorial/helloworld/1'

The result will look like this:

Output
{"_index":"tutorial","_type":"helloworld","_id":"1","_version":3,"_seq_no":2,"_primary_term":4,"found":true,"_source":{ "message": "Hello World!" }}

To modify an existing entry you can use an HTTP PUT request like this:

  1. curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/tutorial/helloworld/1?pretty' -d '
  2. {
  3. "message": "Hello People!"
  4. }'

Elasticsearch will acknowledge successful modification like this:

Output
{ "_index" : "tutorial", "_type" : "helloworld", "_id" : "1", "_version" : 2, "result" : "updated", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 1, "_primary_term" : 1 }

In the above example you modified the message of the first entry to "Hello People!". With that, the version number increased to 2.

To make the output of your GET operations more human-readable, you can also “prettify” your results by adding the pretty argument:

  1. curl -X GET 'http://localhost:9200/tutorial/helloworld/1?pretty'

Now the response will output in a more readable format:

Output
{ "_index" : "tutorial", "_type" : "helloworld", "_id" : "1", "_version" : 2, "_seq_no" : 1, "_primary_term" : 1, "found" : true, "_source" : { "message" : "Hello People!" } }

This is how you can add and query data in Elasticsearch. To learn about the other operations you can check the Elasticsearch API documentation.

Conclusion

In this tutorial you installed, configured, and began using Elasticsearch on CentOS 7. Once you are comfortable with manual queries, your next task will be to start using the service from your applications.

Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.

Learn more about our products

About the authors
Default avatar
Toli

author



Still looking for an answer?

Ask a questionSearch for more help

Was this helpful?
 
5 Comments


This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

Later I found the elasticsearch this guide used is outdate, but I’ve installed. spent time to uninstall, reinstall and met problem, here is my steps which works for me if you also had installed the outdate version elasticsearch and want to reinstall the new version:

yum remove elasticsearch
rm -rf /var/lib/elasticsearch/elasticsearch/nodes
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.1.rpm
rpm -ivh elasticsearch-5.4.1.rpm
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
curl -XGET 'localhost:9200'

Thanks for the explanation. How do I add a node to the existing cluster?

why i m getting bash command not found error

There seems to be two tutorials, with differing steps, on installing ElasticSearch. This tutorial, and this one: https://www.digitalocean.com/community/tutorials/how-to-set-up-a-production-elasticsearch-cluster-on-centos-7#configure-elasticsearch-cluster

This tutorial sets up a single node cluster, whereas the other tutorial sets up a multi-node cluster. However, if I try and cluster using a node set up with this tutorial, I encounter difficulties. This one seems to be missing a step on “Creat[ing] a new yum repository file for Elasticsearch.”

Can you reconcile the two tutorials, please?

Unless I’m missing something tutorial has you setup a VPN then it has you limit the access to the local machine only, so you cannot use it over the VPN… How do you use it over the VPN?

Try DigitalOcean for free

Click below to sign up and get $200 of credit to try our products over 60 days!

Sign up

Join the Tech Talk
Success! Thank you! Please check your email for further details.

Please complete your information!

Featured on Community

Get our biweekly newsletter

Sign up for Infrastructure as a Newsletter.

Hollie's Hub for Good

Working on improving health and education, reducing inequality, and spurring economic growth? We'd like to help.

Become a contributor

Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.

Welcome to the developer cloud

DigitalOcean makes it simple to launch in the cloud and scale up as you grow — whether you're running one virtual machine or ten thousand.

Learn more
Animation showing a Droplet being created in the DigitalOcean Cloud console