Trifork Blog

Introducing the elasticshell

March 6th, 2013

A few days ago I released the first beta version of the elasticshell, a shell for elasticsearch. The idea I had was to create a command line tool that allows you to easily interact with elasticsearch.

Isn't elasticsearch easy enough already?
I really do think elasticsearch is already great and really easy to use. However, there is quite a lot of API surface available, and quite a lot of json involved too. Also, interacting with REST APIs requires a tool other than the browser in order to use the proper http methods and so on. There are different solutions available: some of them are generic, like curl or browser plugins, while others are elasticsearch plugins like head or sense, which you can use to send json requests and see the result, still in json format. What was missing is a command line tool, something that plays the role of the mongo shell in the elasticsearch world. That's ambitious, isn't it?

In the meantime, the es2unix tool has been released by Drew, a member of the elasticsearch team. The interesting approach taken there is to hide all the json and show only text in a nice tabular format, providing an executable command that makes it possible to pipe its output to other unix commands like grep, sort and awk. That's a great idea, and an even greater result I must say.

A json friendly environment
I decided to take another approach: provide an environment that makes it easier to play around with all that json. That's why I started writing a javascript shell, where json is native and it's relatively easy to provide auto-suggestions directly within json objects. I also wanted to use the elasticsearch Java API, which is complete, performant, and powerful, even allowing you to fire up a new node if needed.
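To illustrate what "json is native" buys you, here is a generic javascript sketch (plain javascript, not elasticshell-specific code): documents and requests can be built and modified as ordinary objects, with no manual quoting or string concatenation.

```javascript
// Build a document as a plain object: no string escaping needed.
var blogpost = {
  title: "Introducing the elasticshell",
  author: "Luca Cavanna",
  tags: ["elasticsearch", "shell"]
};

// Objects can be inspected and tweaked programmatically before being sent.
blogpost.tags.push("javascript");
blogpost.title = blogpost.title + " (beta)";

// Serializing to the json that would go over the wire is one call away.
console.log(JSON.stringify(blogpost, null, 2));
```

This is exactly the kind of manipulation the shell makes available interactively.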

Let's have a look at it
The result is a bit of a monster! I used the Rhino engine to be able to execute Javascript code on top of the JVM. This means the elasticshell is written in Java, but supports both Javascript and Java commands as input. The project is hosted on github and there are two binary releases available: one that works with elasticsearch 0.20.x and one that works with 0.19.x. I also plan to soon release a new version that works with the brand new 0.90, released a few days ago.
Once you've downloaded the zip file, you can extract it and run the executable under the bin folder: elasticshell on unix or elasticshell.bat on windows. If you have an elasticsearch instance running locally on the default transport port (9300), the elasticshell will automatically connect to it at startup. As a result, there will effectively be a transport client available in the shell under the name es.

$ cd elasticshell-0.20.5-BETA
$ ./bin/elasticshell
Welcome to the elasticshell
----------------------------------
Transport client connected to [inet[localhost/127.0.0.1:9300]] registered with name es
>

Wait a second, what is a transport client?
A transport client is the most common elasticsearch client created using its Java API. It doesn't send REST requests to elasticsearch; instead, it uses the internal transport (rather than the http transport) to communicate with one or more nodes in a round-robin fashion.

If your elasticsearch instance is running on a different host or port, you won't see the "Transport client connected..." message at startup, and you'll need to manually connect to a running cluster. You can easily create a new transport client using the transportClient command, which is in fact a javascript function, like all java commands within the shell. The es variable name is just a convention; you can use whatever name you prefer, as in the following example.

> var client = transportClient('hostname:9301');

We can join the cluster too
What's even more interesting about using the Java API is that you can also create a node client. Using a node client you effectively join the cluster, creating a lightweight node that won't hold any data or become master. Your client node holds the cluster state and knows how the data is distributed over the cluster, so it sends requests directly where they are supposed to go and executes the reduce phase of searches locally. You can create a node client using the nodeClient command, providing just the name of the cluster that you want to join. This time the zen discovery will be used to detect the other nodes and join the cluster.

> var nodeClient = nodeClient('elasticsearch');
Creating new node client.........
Node client connected to cluster [elasticsearch]

We are connected, now what? Let the elasticshell suggest what you can do!
Once we have a client connected, either a transport client or a node client, we can actually start to interact with elasticsearch. All the elasticsearch APIs are exposed to the shell, but you don't have to remember anything by heart: you can use the available auto-suggestions to look up the methods exposed by any object. Let's look at what we can do with the es client that we previously created, just typing es. and pressing the tab key to ask for auto-suggestions:

> es.
availableIndices()       availableNodes()         builder
bulk()                   bulkBuilder()            close()
clusterApi()             count()                  countBuilder()
delete()                 deleteBuilder()          deleteByQuery()
deleteByQueryBuilder()   equals()                 explain()
explainBuilder()         get()                    getBuilder()
getClass()               index()                  indexBuilder()
indicesApi()             moreLikeThis()           moreLikeThisBuilder()
multiGet()               multiGetBuilder()        multiSearch()
multiSearchBuilder()     percolate()              percolateBuilder()
search()                 searchBuilder()          toString()
update()                 updateBuilder()          validate()
validateBuilder()

The first two methods, availableIndices and availableNodes, are just shortcuts to have a look at the indices and nodes available in the cluster. They use the cluster state API to retrieve that information and display only a small part of its response.

> es.availableNodes();
{
  "1uZwi-eBRKapORbseukqUQ": {
    "name": "Caregiver"
  },
  "PLrh7S7NT_O1z589O65oWw": {
    "name": "elasticshell"
  },
  "EiC0meI9TbenGthWg2DR7Q": {
    "name": "Eon"
  }
}

As you can see from the above output, my cluster is composed of three nodes: Caregiver, elasticshell and Eon. Wait a second, elasticshell? Yes, because I'm using a node client; this wouldn't happen with the default transport client. The result shows both the names of the nodes and the internal ids assigned to them.

All the other functions provided through the es client are directly related to the elasticsearch core API. The indices API is exposed through the indicesApi() function:

> es.indicesApi().
aliasesGet()              aliasesGetBuilder()       aliasesUpdateBuilder()
analyzeBuilder()          clearCache()              clearCacheBuilder()
closeIndex()              closeIndexBuilder()       createIndex()
createIndexBuilder()      deleteIndex()             deleteIndexBuilder()
equals()                  flushBuilder()            getClass()
indicesExists()           indicesExistsBuilder()    mappingDelete()
mappingDeleteBuilder()    mappingGet()              mappingGetBuilder()
mappingPut()              mappingPutBuilder()       openIndex()
openIndexBuilder()        optimize()                optimizeBuilder()
refresh()                 refreshBuilder()          segments()
segmentsBuilder()         settingsGet()             settingsGetBuilder()
settingsUpdate()          settingsUpdateBuilder()   stats()
statsBuilder()            status()                  statusBuilder()
templateDelete()          templateDeleteBuilder()   templateGet()
templateGetBuilder()      templatePut()             templatePutBuilder()
toString()                typesExists()             typesExistsBuilder()
warmerDelete()            warmerDeleteBuilder()     warmerGet()
warmerGetBuilder()        warmerPut()               warmerPutBuilder()

And the cluster API is exposed through the clusterApi() function:

> es.clusterApi().
clusterHealth()                  clusterHealthBuilder()
clusterReroute()                 clusterRerouteBuilder()
clusterSettingsGet()             clusterSettingsGetBuilder()
clusterSettingsUpdate()          clusterSettingsUpdateBuilder()
clusterState()                   clusterStateBuilder()
equals()                         getClass()
nodesHotThreads()                nodesHotThreadsBuilder()
nodesInfo()                      nodesInfoBuilder()
nodesRestart()                   nodesRestartBuilder()
nodesShutdown()                  nodesShutdownBuilder()
nodesStats()                     nodesStatsBuilder()
toString()

To build or not to build?
As you can see from the auto-suggestions, the client provides two methods for each API, one with exactly the name of the API itself (e.g. index for the index API), and another one which ends with Builder (e.g. indexBuilder for the index API).

The first one allows you to quickly use the API by providing the basic parameters needed. For example, the index method requires only the few parameters needed to index a document: index, type, id, and the document itself. Let's then see an example of how we can index this blogpost.

> var blogpost = {
...   "title": "Introducing the elasticshell",
...   "author": "Luca Cavanna",
...   "content": "A few days ago I released the first beta version of the elasticshell, a shell for elasticsearch."
... }
> es.index('blog','trifork','1',blogpost);
{
  "ok": true,
  "_index": "blog",
  "_type": "trifork",
  "_id": "1",
  "_version": 1
}

Done! The important thing to notice here is that the json document is simply provided as a json object, which is native within the shell. The input doesn't need to be on a single line either; you can write commands and json over multiple lines, and the shell is able to detect whether it needs to wait for more input to complete the command or can execute it right away. Finally, you can assign the result of any command to a variable: that's a json object too, and you can play around with it.
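For instance (a small sketch reusing the blogpost object and the index call shown above), the response of a command can be captured in a variable and its fields accessed like those of any other json object:

> var result = es.index('blog','trifork','1',blogpost);
> result._id
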

We need way more flexibility than this!
This first way was fast, but what if we need to provide more parameters, like routing? We need a more flexible way to use the API, which is provided by the methods that end with "Builder". They allow you to build the request and execute it straight away. Let's see an example reusing the blogpost that we previously indexed. By the way, did I mention that auto-suggestions are available within json objects too? Here it is:

> blogpost.
author    content   title

Let's go ahead and prepare another blogpost:

> blogpost.title="What's so cool about elasticsearch?";
What's so cool about elasticsearch?
> blogpost.content="I thought it might be worthwhile sharing my own answer in this blog."
I thought it might be worthwhile sharing my own answer in this blog.
> blogpost
{
  "title": "What's so cool about elasticsearch?",
  "author": "Luca Cavanna",
  "content": "I thought it might be worthwhile sharing my own answer in this blog."
}

Up until now we've only modified the previously created blogpost and checked the final result; let's now index it. Again, you can use the auto-suggestions to see which methods are provided by the request builders. The builders use a fluent interface, which means every method that sets a value returns the builder itself.

> es.indexBuilder().
consistencyLevel()   create()             equals()
execute()            getClass()           id()
index()              opType()             parent()
percolate()          refresh()            replicationType()
routing()            source()             timeout()
timestamp()          toString()           ttl()
type()               version()            versionType()
> es.indexBuilder().index('blog').type('trifork').id('2').source(blogpost).execute();
{
  "ok": true,
  "_index": "blog",
  "_type": "trifork",
  "_id": "2",
  "_version": 1
}
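The fluent pattern the builders rely on is easy to sketch in plain javascript (an illustrative toy with hypothetical names, not the actual elasticsearch Java API builders):

```javascript
// A minimal fluent builder: each setter returns the builder itself,
// so calls can be chained before a final execute().
function IndexRequestBuilder() {
  this.params = {};
}
IndexRequestBuilder.prototype.index = function (name) {
  this.params.index = name;
  return this; // returning `this` is what enables chaining
};
IndexRequestBuilder.prototype.type = function (name) {
  this.params.type = name;
  return this;
};
IndexRequestBuilder.prototype.id = function (value) {
  this.params.id = value;
  return this;
};
IndexRequestBuilder.prototype.execute = function () {
  return this.params; // in the real shell this would send the request
};

var result = new IndexRequestBuilder().index('blog').type('trifork').id('2').execute();
console.log(JSON.stringify(result));
```

Every setter hands back the same builder, which is why the one-liner above reads like a sentence.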

Cool! I'd personally recommend using the builders most of the time. They are more user friendly, since you don't even need to remember the parameters or the order they're needed in. Also, any request is validated before execution, and if there are missing parameters you'll get back an error like in the example below. The request validation is a mechanism provided by elasticsearch that the elasticshell makes use of.

> es.indexBuilder().index('blog').type('trifork').id('3').execute()
line 1: Wrapped org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: source is missing;

Now that we've indexed a couple of documents and created the blog index, let's take one step back and see how we can find out which indices are available in the cluster. The first way is the already mentioned availableIndices function:

> es.availableIndices()
{
  "blog": {
    "types": [
      "trifork"
    ]
  }
}

But there's an even easier way. Let's see what our es client object exposes now, starting with the letter b:

> es.b
blog            builder         bulk()          bulkBuilder()

What's "blog" now? It wasn't there before... it must be the index we've created. Yes! Existing indices and types are made available within the shell and expose index/type specific operations. Let's find out more, using the auto-suggestions again:

> es.blog.
aliasesGet()              aliasesGetBuilder()       analyzeBuilder()
builder                   clearCache()              clearCacheBuilder()
closeIndex()              closeIndexBuilder()       clusterHealth()
clusterHealthBuilder()    clusterState()            clusterStateBuilder()
count()                   countBuilder()            delete()
deleteBuilder()           deleteByQuery()           deleteByQueryBuilder()
deleteIndex()             deleteIndexBuilder()      equals()
explain()                 explainBuilder()          flush()
flushBuilder()            get()                     getBuilder()
getClass()                index()                   indexBuilder()
indexDetails()            mappingDelete()           mappingDeleteBuilder()
mappingGet()              mappingGetBuilder()       mappingPut()
mappingPutBuilder()       moreLikeThis()            moreLikeThisBuilder()
multiGet()                multiSearch()             openIndex()
openIndexBuilder()        optimize()                optimizeBuilder()
percolate()               percolateBuilder()        refresh()
refreshBuilder()          search()                  searchBuilder()
segments()                segmentsBuilder()         settingsGet()
settingsGetBuilder()      settingsUpdate()          settingsUpdateBuilder()
stats()                   statsBuilder()            status()
statusBuilder()           toString()                trifork
typesExists()             typesExistsBuilder()      update()
updateBuilder()           validate()                validateBuilder()
warmerDelete()            warmerDeleteBuilder()     warmerGet()
warmerGetBuilder()        warmerPut()

Wow, those are all the operations that we can execute on a specific index. They belong to either the core API or the indices API. And among them we can also find "trifork": that's the type we've used to index our documents. Let's find out more:

> es.blog.trifork.
builder                  count()                  countBuilder()
delete()                 deleteBuilder()          deleteByQuery()
deleteByQueryBuilder()   equals()                 explain()
explainBuilder()         get()                    getBuilder()
getClass()               index()                  indexBuilder()
mappingDelete()          mappingDeleteBuilder()   mappingGet()
mappingGetBuilder()      mappingPut()             mappingPutBuilder()
moreLikeThis()           moreLikeThisBuilder()    multiGet()
multiSearch()            percolate()              percolateBuilder()
search()                 searchBuilder()          toString()
update()                 updateBuilder()          validate()
validateBuilder()        warmerPut()

Here we go: now we can see all the operations that can be executed on a specific type. Let's use the get API to retrieve one of the blogposts that we indexed:

> es.blog.trifork.get('1');
{
  "_index": "blog",
  "_type": "trifork",
  "_id": "1",
  "_version": 1,
  "exists": true,
  "_source": {
    "title": "Introducing the elasticshell",
    "author": "Luca Cavanna",
    "content": "A few days ago I released the first beta version of the elasticshell, a shell for elasticsearch."
  }
}

That's all for now, but have no fear: to be continued...
With this introduction I showed you some basic things that you can do with the elasticshell, and how powerful and flexible it is. Hopefully I've convinced you to download it and give it a try!
In the next article we'll see how you can execute a search with it, and so on. More articles and documentation will come too, including some advanced examples using either custom Javascript functions or Java code. The released version is still a BETA; should you find any problem, I'd kindly ask you to open an issue on github.
But I'm also curious about your opinions: what do you think about it? What features would you like to see implemented next? Just drop a comment if you have any ideas; I'm looking forward to your feedback.
