How To Interact with Data in ElasticSearch Using CRUD Operations

Published on December 27, 2013

Status: Deprecated

This article covers a version of Ubuntu that is no longer supported. If you are currently operating a server running Ubuntu 12.04, we highly recommend upgrading or migrating to a supported version of Ubuntu.

Reason: Ubuntu 12.04 reached end of life (EOL) on April 28, 2017 and no longer receives security patches or updates. This guide is no longer maintained.

See Instead: This guide might still be useful as a reference, but may not work on other Ubuntu releases. If available, we strongly recommend using a guide written for the version of Ubuntu you are using. You can use the search functionality at the top of the page to find a more recent version.

Introduction

Flexible searching and indexing for web applications and sites is almost always useful and sometimes absolutely essential. While there are many complex solutions that manage data and allow you to retrieve and interact with it through HTTP methods, ElasticSearch has gained popularity due to its easy configuration and incredible malleability.

In this article, we will be discussing how to interact with ElasticSearch in order to utilize it for your specific needs. We will be demonstrating this on an Ubuntu 12.04 VPS.

To follow along, you must install ElasticSearch using this tutorial. We will assume that you installed ElasticSearch using the latest .deb file provided by the project’s site.

How ElasticSearch Works

Before delving into the nuts and bolts of ElasticSearch usage, it is important to know how the software implements its solutions.

ElasticSearch provides a RESTful API that you can interact with in a variety of ways. This basically is a type of interface that describes the way a client and server interact. If a server is said to be “RESTful”, it provides a way to interact through common HTTP methods (GET, POST, PUT, DELETE), and does not maintain state information. Each request is independent and resources are returned in common text formats like JSON.

ElasticSearch exposes an API that allows you to interact with data using the HTTP verbs, and passes parameters and information through the use of URI components. This means that it stores data based on how the URI is formatted.

In a typical ElasticSearch interaction, you choose an HTTP method to indicate what kind of action to perform on the data addressed by the URI. To retrieve information, you may use a GET command. To create or update records, you can use a PUT or POST command.

Implementing CRUD Methods

CRUD stands for create, read, update, and delete. These are all operations that are needed to effectively administer persistent data storage. Luckily, these also have logical equivalents in HTTP methods, which makes it easy to interact using standard methods. The CRUD methods are implemented by the HTTP methods POST, GET, PUT, and DELETE respectively.

ElasticSearch provides API access that can perform all of these functions. We will discuss ElasticSearch in terms of how to do these types of operations.

For ease of explanation, we will use curl to demonstrate, since you can explicitly state the HTTP method and you can easily interact with ElasticSearch from your terminal session.

By default, ElasticSearch can be accessed on port 9200 of the server.

Create Content

Object creation in ElasticSearch is usually referred to as indexing. This is simply the process of adding data to the store and deciding on categories. We can create objects in ElasticSearch using the HTTP methods PUT or POST.

In its simplest form, you can specify the index to post the data to, the type of object being stored, and the id of the object you are storing. In general, this will look like this:

curl -X PUT http://server_name.com:9200/index/type/object_id -d '{ <document data> }'

In this scenario, the index can be thought of as a database. It is the top-level organizational unit that can separate some information from others. You may wish to use this to separate application or site data, for instance.

The type is an arbitrary type categorization. It can be anything, but can be used to group certain information together. For instance, you could have a data type of “users” or “documents”.

The id can be specified if you are creating it using a PUT command, or left off to be auto-generated if you choose to use a POST command.

After the -d flag, you pass in the object that you would like to store. This is a JSON-like object with a flexible format. The categories are created on the fly, so you can specify anything you’d like.
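To make the anatomy of the URI concrete, here is a minimal shell sketch that assembles the document URI from its parts and picks the HTTP method based on whether an id is supplied. The function names es_doc_url and es_index_method and the ES_HOST variable are hypothetical helpers invented for this illustration; they are not part of ElasticSearch or curl. The script only prints the curl command it would run, so it can be tried without a server:

```shell
#!/bin/sh
# ES_HOST is an assumed variable; it defaults to a local node on port 9200.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Hypothetical helper: print the URI for a document.
# $1 = index, $2 = type, $3 = id (optional)
es_doc_url() {
    if [ -n "${3:-}" ]; then
        printf '%s/%s/%s/%s\n' "$ES_HOST" "$1" "$2" "$3"
    else
        printf '%s/%s/%s\n' "$ES_HOST" "$1" "$2"
    fi
}

# Hypothetical helper: PUT when we choose the id, POST when it is auto-generated.
# $1 = id (optional)
es_index_method() {
    if [ -n "${1:-}" ]; then
        echo PUT
    else
        echo POST
    fi
}

# Print the curl command that would index a document, without running it.
echo "curl -X$(es_index_method 1) \"$(es_doc_url playground equipment 1)\" -d '{ ... }'"
```

Leaving off the id argument makes both helpers switch to the auto-generated form, which posts to the index and type only.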

For instance, if we are building an index for our playground equipment, we can index a slide like this:

curl -XPUT "http://localhost:9200/playground/equipment/1" -d ' { "type": "slide", "quantity": 2 }'

You should receive a response back indicating that the operation was successful:

{"ok":true,"_index":"playground","_type":"equipment","_id":"1","_version":1}

As you can see, this basically tells you that it was indexed with the information you provided. You may also notice that your information is versioned. This can be useful for querying later on.

Because the PUT command can be used to update information as well as create it, you may want to state explicitly that you intend to create a new record rather than update one that already exists. To do this, just add /_create to the end of your URI:

curl -XPUT "http://localhost:9200/playground/equipment/1/_create" -d '{ "type": "slide", "quantity": 2 }'

This will cause the API call to return an error if the document already exists:

{"error":"DocumentAlreadyExistsException[[playground][2] [equipment][1]: document already exists]","status":409}
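In a script, it is often easier to branch on the HTTP status code than to inspect the error text. The following is a rough sketch of that idea. The function names describe_create_status and create_only are hypothetical, and treating every 2xx code as success is an assumption of this example; only the 409 status comes from the error shown above. The curl wrapper is defined but not called, so the script runs without a server:

```shell
#!/bin/sh
# Hypothetical helper: turn an HTTP status code into a short message.
# 409 is the conflict status shown in the error above; treating every
# 2xx code as success is an assumption of this sketch.
describe_create_status() {
    case "$1" in
        2??) echo "created" ;;
        409) echo "already exists" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}

# Hypothetical wrapper (defined here, not called): attempt a create-only
# index request and report the outcome.
# -s silences progress output, -o /dev/null discards the response body,
# and -w '%{http_code}' prints only the HTTP status code.
# $1 = document URI without /_create, $2 = JSON body
create_only() {
    status=$(curl -s -o /dev/null -w '%{http_code}' -XPUT "$1/_create" -d "$2")
    describe_create_status "$status"
}

describe_create_status 409
```

With a running server, create_only "http://localhost:9200/playground/equipment/1" '{ "type": "slide", "quantity": 2 }' would issue the same request as the command above and print one of the three messages.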

Read Content

The entire point of adding data to an index is to gain the ability to retrieve that data at a later point in time. We access the API using the HTTP GET method to read content.

The most basic way to retrieve data is by specifying the exact object by the index, type, and id:

curl -XGET "http://localhost:9200/playground/equipment/1"

This will return the information about the document that we received when we were creating the object. Appended onto this output is a key called _source that contains the document itself as its value:

{"_index":"playground","_type":"equipment","_id":"1","_version":2,"exists":true, "_source" :  { "type": "slide", "quantity": 2 }}

For our purposes, we probably want the output more human-friendly, so we can append ?pretty onto the end of the request:

curl -XGET "http://localhost:9200/playground/equipment/1?pretty"

{
    "_index" : "playground",
    "_type" : "equipment",
    "_id" : "1",
    "_version" : 2,
    "exists" : true, "_source" : { "type": "slide", "quantity": 2 }
}

Often, we will only want the document itself to be returned. We can do this by appending /_source to the URI:

curl -XGET "http://localhost:9200/playground/equipment/1/_source?pretty"

{ "type": "slide", "quantity": 2 }

If we would like to return only specific fields, we can do so by adding ?fields=<field1>,<field2> to the end of the URI:

curl -XGET "http://localhost:9200/playground/equipment/1?fields=type"

This will add a key called fields that contains a JSON object with the filtered _source:

{"_index":"playground","_type":"equipment","_id":"1","_version":2,"exists":true,"fields":{"type":"slide"}}
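When the list of fields is decided at run time, the comma-separated parameter can be assembled in the shell. This is a small sketch only; fields_query is a hypothetical helper name, and the script prints the resulting URI rather than sending a request:

```shell
#!/bin/sh
# Hypothetical helper: build the ?fields= query string from field names.
# $@ = one or more field names
fields_query() {
    list=""
    for field in "$@"; do
        # Join names with commas: field1,field2,...
        list="${list:+$list,}$field"
    done
    printf '?fields=%s\n' "$list"
}

doc="http://localhost:9200/playground/equipment/1"
echo "$doc$(fields_query type quantity)"
```

The printed URI can then be passed to curl -XGET in the same way as the examples above.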

Update Content

We can easily update content by using the HTTP POST command. This will allow us to modify data using in-line scripting and parameters. The other option is to just use a PUT command and basically “re-create” the object with a replacement (updated) object.

We can update a data object using the POST command and appending /_update to the URI. We pass it an object with the keys script and params. The script can reference the fields stored in the document by using the prefix ctx._source.

For instance, to update the quantity of slides on our playground, we could type something like this:

curl -XPOST "http://localhost:9200/playground/equipment/1/_update" -d '{ "script": "ctx._source.quantity += step", "params": { "step": 1 } }'

If we read the data after we have done this operation, we can see that the “quantity” has been incremented:

curl -XGET "http://localhost:9200/playground/equipment/1/_source"

{"type":"slide","quantity":3}

We can add arbitrary new fields by simply updating the object and assigning a value to a new key:

curl -XPOST "http://localhost:9200/playground/equipment/1/_update" -d '{ "script": "ctx._source.name_of_new_key = \"value of new field\"" }'

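Note that the double quotation marks inside the script are escaped with backslashes. The script is itself a JSON string, so unescaped inner quotes would end it early and make the request body invalid.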
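One way to keep the quoting manageable is to hold the request body in a shell variable filled from a heredoc. This removes the shell's own layer of quoting, although the quotes inside the JSON string still need their backslashes. The sketch below assumes a POSIX shell; update_doc is a hypothetical wrapper that is defined but not called, so the script only prints the body it would send:

```shell
#!/bin/sh
# Hold the request body in a variable. Quoting the heredoc delimiter ('EOF')
# stops the shell from expanding or altering anything inside it.
body=$(cat <<'EOF'
{ "script": "ctx._source.name_of_new_key = \"value of new field\"" }
EOF
)

# Hypothetical wrapper (defined here, not called): send the body to _update.
# $1 = document URI
update_doc() {
    curl -XPOST "$1/_update" -d "$body"
}

# Show exactly what would be sent.
printf '%s\n' "$body"
```

With a running server, update_doc "http://localhost:9200/playground/equipment/1" would send the same body as the command above.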

We can also remove fields by calling the .remove method of the source and passing the field name:

curl -XPOST "http://localhost:9200/playground/equipment/1/_update" -d '{ "script": "ctx._source.remove(\"name_of_new_key\")" }'

If passing a script is too complex for the operations you are trying to do, you can also simply reference the field that you want to update and pass in the new value by using the “doc” key instead of the “script” key used in the previous examples:

curl -XPOST "http://localhost:9200/playground/equipment/1/_update" -d '{ "doc" : { "type": "swing" } }'

Now, we can find the document again, and its “type” has been replaced with “swing”:

curl -XGET "http://localhost:9200/playground/equipment/1/_source"

{"type":"swing","quantity":3}

Delete Content

The delete operation in ElasticSearch is rather straightforward. It simply deletes a document with the matching ID. We can reach this through HTTP by using the DELETE command.

For instance, to delete a document in our playground index with the ID of 36, we can use this command:

curl -XDELETE "http://localhost:9200/playground/equipment/36"

{"ok":true,"found":true,"_index":"playground","_type":"equipment","_id":"36","_version":2}

This indicates that the operation was successful. If there was no object that matched the command, you would instead get a response that looked like this:

{"ok":true,"found":false,"_index":"playground","_type":"equipment","_id":"36","_version":1}

The “found” key is the one that tells if the matching object was found and processed.
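A script can check this key to decide whether anything was actually removed. The following sketch uses plain substring matching on the compact response format shown above, which is an assumption of this example; it is not real JSON parsing and would miss pretty-printed output. The function name delete_found is hypothetical, and the response is the sample from this section rather than a live reply:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a delete response reports "found":true.
# This is plain substring matching on the compact response format shown above,
# not real JSON parsing, so it would miss pretty-printed output.
# $1 = response body
delete_found() {
    case "$1" in
        *'"found":true'*) return 0 ;;
        *)                return 1 ;;
    esac
}

response='{"ok":true,"found":false,"_index":"playground","_type":"equipment","_id":"36","_version":1}'
if delete_found "$response"; then
    echo "document deleted"
else
    echo "no matching document"
fi
```

In real use, the response variable could be filled from the output of the curl -XDELETE command instead of a literal string.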

Search for Content

We can search for our objects by passing the /_search URI component. This can be used after the server itself, after the index, or after the type, depending on the realm that you would like to search.

For instance, to search everything on the server for a quantity of “4”, we can use a search string like this:

curl -XGET "http://localhost:9200/_search?q=quantity:4"

{"took":5,"timed_out":false,"_shards":{"total":25,"successful":25,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"playground","_type":"equipment","_id":"1","_score":1.0, "_source" : {"type":"slide","quantity":4}}]}}

This is another area where passing the “pretty” parameter can be helpful:

curl -XGET "http://localhost:9200/_search?q=quantity:4&pretty"

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "playground",
      "_type" : "equipment",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"type":"slide","quantity":4}
    } ]
  }
}

If you wished to search only in playground equipment and exclude other areas, you could change the search domain by including the index and type in the URI:

curl -XGET "http://localhost:9200/playground/equipment/_search?q=quantity:4&pretty"

If you want to specify the type but search across every index, you can substitute _all for the index name:

curl -XGET "http://localhost:9200/_all/equipment/_search?q=quantity:4&pretty"
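The three search scopes differ only in the path that precedes /_search, which the following sketch makes explicit. The es_search_url function and the ES_HOST variable are hypothetical names used for illustration. The script prints the URIs without contacting a server:

```shell
#!/bin/sh
# ES_HOST is an assumed variable; it defaults to a local node on port 9200.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Hypothetical helper: build a _search URI for the chosen scope.
# $1 = index (optional), $2 = type (optional)
# A type without an index falls back to _all, as described above.
es_search_url() {
    index="${1:-}"
    doc_type="${2:-}"
    if [ -n "$doc_type" ] && [ -z "$index" ]; then
        index="_all"
    fi
    url="$ES_HOST"
    if [ -n "$index" ]; then url="$url/$index"; fi
    if [ -n "$doc_type" ]; then url="$url/$doc_type"; fi
    printf '%s/_search\n' "$url"
}

es_search_url                        # whole server
es_search_url playground             # one index
es_search_url playground equipment   # one type in one index
es_search_url "" equipment           # one type across all indexes
```

Appending a query such as ?q=quantity:4 to any of the printed URIs gives a search request like the ones shown above.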

You can also search by using the ElasticSearch query domain-specific language (DSL), which is passed in the request body as a JSON document:

curl -XGET "http://localhost:9200/playground/equipment/_search" -d '{ "query": { "term": { "type": "slide" } } }'

There are many more options for searching and filtering the results, but this provides a good basis for how to retrieve data that you have stored.

Conclusion

ElasticSearch is a flexible solution that can index text objects easily and dynamically. If you can programmatically index content, then this solution is almost boundless in terms of its ability to generate search results that your applications or users can utilize.

While this article introduces you to some basic concepts of how to use ElasticSearch, the way you interact with the search through your programs or sites is entirely up to you. You can utilize this as a tool to quickly get some basic search functionality, or you can create elaborate object indexing that will allow you to access your data in an extremely fine-grained way. ElasticSearch is simply an engine that you can hook up to serve your application needs.

By Justin Ellingwood

