If you’re running a multi-node Elasticsearch cluster, check out Automattic’s fork of the Elasticsearch StatsD Plugin for pushing cluster and node metrics to StatsD.
At Automattic we’ve been using a set of Munin scripts to collect and aggregate Elasticsearch metrics via its native node and stats REST APIs. This method works relatively well, giving us enough longitudinal information about the cluster and nodes to diagnose issues or test optimizations. That said, cluster monitoring with Munin at a 5-minute resolution leaves a lot to be desired.
First and foremost, our Elasticsearch cluster is spread across 3 data centers, each with its own Munin instance. This makes collecting and aggregating even simple metrics like cluster-wide load quite difficult. In addition, because Munin collects metrics by polling, we are limited to a somewhat coarse resolution of 5 minutes. While this is good enough for looking at time series data over the course of a day or week, it’s quite time-consuming to wait 5 or 10 minutes for graphs to update when deploying changes or testing performance optimizations.
We’ve already had some good experience instrumenting our PHP stack with StatsD and building dashboards with Grafana, so reusing that infrastructure for Elasticsearch metrics seems like a good fit. Our fearless leader Barry suggested we try out the Elasticsearch StatsD Plugin from Swoop Inc. Upon closer inspection, however, we found that it does not deal well with clustered Elasticsearch environments and has yet to be updated to work with ES 1.x. So over the past week we’ve forked and rewritten much of it to suit our needs.
The Automattic Elasticsearch StatsD Plugin is designed to run on all nodes of a cluster and push metrics to StatsD on a configurable interval. Once installed and configured, each node will send system metrics (e.g. CPU / JVM / network / etc.) about itself. Data nodes can also be configured to send metrics about the portion of the index they store. Finally, the elected master of the cluster is responsible for sending aggregate cluster metrics about the index (documents / indexing operations / cache sizes / etc.). By default the plugin will send the total cluster aggregate as well as per-index metrics; however, granularity can be configured to report down to the individual shard level if so desired.
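To give a feel for what pushing metrics to StatsD involves, here is a minimal Python sketch of the general idea. It is not the plugin's code (the plugin is not written in Python), and the metric prefix, the sample stats and the helper names are all made up for illustration. The only things it relies on are the standard StatsD gauge line format, name:value|g, and StatsD's conventional UDP port 8125.

```python
import socket


def flatten(stats, prefix):
    """Yield (dotted_name, value) pairs for every numeric leaf in a nested dict."""
    for key, value in stats.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from flatten(value, name)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            yield name, value


def to_gauges(stats, prefix):
    """Format each numeric leaf as a StatsD gauge line: <name>:<value>|g"""
    return [f"{name}:{value}|g" for name, value in flatten(stats, prefix)]


def send(lines, host="127.0.0.1", port=8125):
    """Send each line as its own UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


# Hypothetical stats for one node; real node stats have many more fields.
sample = {"jvm": {"heap_used_percent": 42}, "docs": {"count": 1000}}
lines = to_gauges(sample, "elasticsearch.node.node1")
```

Calling send(lines) on a timer would then push the gauges to a local StatsD daemon. Because UDP is fire and forget, a node never has to wait on the metrics server, which is part of what makes a short reporting interval practical.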
Check it out on GitHub: Elasticsearch StatsD Plugin.