AWS Elasticsearch Service Woes

At the beginning of the year we moved over to Amazon Web Services' Elasticsearch Service from our own implementation of Elasticsearch hosted in EC2. The idea behind this was that we could utilise the service and abstract away the operational work, like maintaining the instances.

We kept coming across OOM issues due to JVMMemoryPressure spiking, and in turn the ES service kept crapping out. Aside from some optimisation work, we'd more than likely have to add more boxes/resources to the cluster, which then means more things to manage. This is when we thought, “Hey, AWS have a service for this, right? Let's give that a crack?!”.

As great as having it as a service is, it certainly comes with some fairly irritating pitfalls that force you to approach the situation from a different angle.


Shard Management#

By default AWS assigns 5 primary shards and 1 replica of each primary per index, meaning each index was creating 10 active shards. So if you create multiple daily indices, let's say 5 indices per day, that's going to be 10 * 5 = 50. That's right folks, 50 bloody shards, per day.

Even with some sort of cleanup via a tool like Curator, the cluster is going to generate a lot of shards.
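For reference, a cleanup along those lines might look something like the command below. This assumes the Curator 3.x CLI and a placeholder host, and flag names differ between Curator versions, so treat it as a sketch rather than gospel:

curator --host <elasticsearch> delete indices --older-than 30 --time-unit days --timestring '%Y.%m.%d'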

Having spoken to some talented colleagues of mine, I was recommended as a rule of thumb to keep shards to around 10GB and under. To do this you'd normally set the number of shards globally in elasticsearch.yml, however you can't do that with the AWS ES service. Nor can you do it via the API, as that has been restricted too; a GET /_cluster/settings on the cluster will return:

{
    "Message": "Your request: '/_cluster/settings' is not allowed for verb: GET"
}

AWS restricts all global configuration for the cluster, as they expect all config to be done via their configure cluster section, which makes sense in all honesty, considering you should be treating it like a service and relinquishing that granular control. Much to my annoyance.

To remedy this I implemented index templates.

Index Templates#

Index templates allow you to define templates that will automatically be applied to newly created indices. The templates include both settings and mappings, and a simple pattern that controls whether the template will be applied to a newly created index.

Using index templates you are able to specify how many shards are assigned per index, e.g. 1 primary shard and 1 replica shard, so in turn a single index would generate a total of 2 active shards.

Here are some examples of index templates:

curl -XPUT https://<elasticsearch>/_template/template_1 -d '
{
    "template" : "*",
    "order" : 0,
    "settings" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 1
    }
}'

curl -XPUT https://<elasticsearch>/_template/template_2 -d '
{
    "template" : "logs-*",
    "order" : 1,
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    }
}'

The template field is an index name pattern; the template will apply to any newly created index whose name matches it.

When applying multiple templates you can specify the ordering as seen above.

The order of the merging can be controlled using the order parameter, with lower orders being applied first and higher orders overriding them. So for an index named, say, logs-2016.05.01, both templates above match, and it ends up with 1 primary shard and 1 replica because template_2 (order 1) overrides template_1 (order 0).

You are also able to view all active templates if you need to see what’s in play:

curl -XGET 'https://<elasticsearch>/_template'
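You can also pull back a single template by name if you just want to check one, e.g. the template_1 defined above:

curl -XGET 'https://<elasticsearch>/_template/template_1'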

Indices#

On top of the index templates, we then also looked at how we index our logs via Logstash. For example, we had some daily indices that were only 10MB in size. Now, I don't know about you, but having 2 active shards for just 10MB of data seems a bit silly.

We then started grouping the smaller indices into weekly and monthly ones so in turn fewer shards are created. This greatly helped reduce the number of shards, seeing as 7 daily indices would generate 14 active shards per week, while 1 weekly index will only create 2 active shards per week. Good times.
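On the Logstash side the change is really just the index naming pattern in the output block. Here's a rough sketch with the stock elasticsearch output and a made-up index name (the same index option idea applies whichever output plugin you use):

output {
  elasticsearch {
    hosts => ["https://<elasticsearch>:443"]
    # Monthly index instead of a daily one, so this small log type
    # only generates 2 active shards per month.
    index => "small-logs-%{+YYYY.MM}"
  }
}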

Reindexing#

Unfortunately it's not possible to retrospectively reduce the number of shards assigned to an index once the index templates take effect; to resolve this you will need to reindex your indices.

Rather annoyingly, at the time of writing the AWS ES version is 1.5.2, which means I couldn't use the Reindex API, or use Logstash, as there is no amazon_es input plugin, only an output plugin.

I started playing around with elasticsearch-py and decided to build a tool that will help me reindex the offending indices.

I've rather unimaginatively named it es-tool, and it's super basic, but it's done the job for me so far. At the moment it only reindexes all documents in an index and allows me to delete indices too. Here's an example of how you would use it to reindex:

./es-tool.py --elasticsearch http://elasticsearch --reindex name-of-index --new_index_name name-of-new-index
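For the curious, the core of a client-side reindex is pretty simple with elasticsearch-py. Here's a minimal sketch using its scan and bulk helpers; the host and index names are placeholders and this isn't the actual es-tool source:

#!/usr/bin/env python
# Minimal client-side reindex sketch using elasticsearch-py's scan/bulk
# helpers. Host and index names are placeholders.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

es = Elasticsearch(['https://<elasticsearch>'])

def reindex(source_index, target_index):
    # Scroll through every document in the source index.
    docs = scan(es, index=source_index, query={"query": {"match_all": {}}})

    # Rewrite each hit as a bulk index action against the new index, which
    # will pick up its shard settings from the matching index template.
    actions = (
        {
            "_op_type": "index",
            "_index": target_index,
            "_type": doc["_type"],
            "_id": doc["_id"],
            "_source": doc["_source"],
        }
        for doc in docs
    )
    return bulk(es, actions)

if __name__ == "__main__":
    reindex("name-of-index", "name-of-new-index")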

API#

The Elasticsearch API and I have become great friends over the past few weeks while I've been rejigging the ES cluster. There's a handful of varying APIs, e.g. Document, Search, Indices, cat and Cluster, that give you the ability to modify and view all kinds of details of the cluster.

The REST APIs are exposed using JSON over HTTP, so as you can see above, the PUT methods require the index template to be in JSON format, while all GET methods return their output as JSON.
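The cat APIs in particular are handy for a quick, human-readable overview, e.g. listing every index along with its primary and replica shard counts, document count and size:

curl -XGET 'https://<elasticsearch>/_cat/indices?v'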


Resizing#

As with every ES cluster you need to plan how much resource you will need and scale up/down accordingly. The positive of using the AWS ES service is that scaling your cluster is just a couple of clicks' worth of work and it does the rest for you. You can find the prices for the various AWS ES instances here, as well as for EBS or provisioned IOPS volumes if you decide not to use the instance volumes.

One thing to note is that when you do resize the cluster, it over-provisions instances to allow the moving of shards between the new and the old nodes. The cluster state may change from green to yellow or red during this time, and this is when you have to keep an eye on the status of the shards. If the cluster is unable to allocate shards it may get itself in a tizz and throw a shit fit. I've ended up having to contact AWS support to recover the cluster for this exact reason before. Lowering the number of shards helps considerably with shard allocation.

Another thing to be aware of is to make sure the cluster config you have set has fully kicked in before you make another change, otherwise you run the risk of the cluster becoming unresponsive and unavailable. For example, make sure the number of nodes is as expected and the shards are all assigned.
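The cluster health API gives you most of that at a glance, including the status, the number of nodes and any unassigned shards (assuming your domain's access policy allows it):

curl -XGET 'https://<elasticsearch>/_cluster/health?pretty'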

You can also check the status of the individual shards via the API:

curl -XGET 'https://<elasticsearch>/_cat/shards'

You can also see all the node information too:

curl -XGET 'http://<elasticsearch>/_nodes'

Conclusion#

In hindsight I think it may have been worth sticking with and fleshing out the old implementation of Elasticsearch, instead of having to fudge various things with the AWS ES service. On the other hand, it has relieved some of the operational overhead, and in terms of scaling I am literally a couple of clicks away.

If you have large amounts of data to pump into Elasticsearch and you require granular control, AWS ES is not the solution for you. However, if you need a quick and simple Elasticsearch and Kibana solution, then look no further.