[Note: We’re holding a 2-day, hands-on Elasticsearch / ELK Stack training workshop in New York from October 19-20, 2015. Click here for details!]
Have you read the Top 10 Elasticsearch Metrics to Watch?
How about our free eBook – Elasticsearch Monitoring Essentials?
If you have, we’re impressed. If not, it’s great bedtime reading. ;)
Besides writing bedtime reading material, we also wrote some code last month and added a few new and useful Elasticsearch metrics to SPM. Specifically, we’ve added:
- Index Warmer metrics
- Thread Pool metrics
- Circuit Breaker metrics
So why are these important? Read on!
Warmers do what their name implies. They warm up. But what? Indices. Why? Because warming up an index means searches against it will be faster. Thus, one can warm up indices before exposing searches against them. If you come to Elasticsearch from Solr, this is equivalent to searcher warmup queries in Solr.
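To make this concrete, here is a minimal sketch of what registering a warmer could look like. It assumes the Elasticsearch 1.x warmers API, where a warmer is a named search stored under `/{index}/_warmer/{name}`; check the documentation for your version before relying on it. The host, the index name `logs`, and the warmer name `warm_recent` are hypothetical, and the function only builds the request without sending it.

```python
import json

def build_warmer_request(base_url, index, warmer_name, query):
    """Build (method, url, body) for registering a warmer; nothing is sent."""
    url = f"{base_url.rstrip('/')}/{index}/_warmer/{warmer_name}"
    # The warmer body is an ordinary search request body.
    body = json.dumps({"query": query}, sort_keys=True)
    return "PUT", url, body

# Hypothetical host, index, and warmer name, for illustration only.
method, url, body = build_warmer_request(
    "http://localhost:9200", "logs", "warm_recent", {"match_all": {}}
)
print(method, url)
print(body)
```

The returned tuple can be handed to whatever HTTP client you already use.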
Elasticsearch nodes use a number of dedicated thread pools to handle different types of requests. For example, indexing requests are handled by a thread pool that is separate from the thread pool that handles search requests. This helps with better memory management, request prioritization, isolation, etc. There are over a dozen thread pools, and each of them exposes almost a dozen metrics.
Each pool also has a queue, which makes it possible to hold onto some requests instead of simply dropping them when a node is very busy. However, if your Elasticsearch cluster handles a lot of concurrent or slow requests, it may have to start rejecting requests once those thread pool queues are full. When that starts happening, you will want to know about it ASAP. Thus, you should pay close attention to thread pool metrics, and you may want to set Alerts and SPM’s Anomaly Detection Alerts on the metrics that show the number of rejections or the queue size, so you can adjust queue size settings or other parameters before requests get rejected.
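If you want to eyeball these numbers yourself, the sketch below scans a node stats response for pools with rejections or long queues. It assumes the JSON shape returned by `GET /_nodes/stats/thread_pool` (a `nodes` map whose entries carry a `thread_pool` section with `queue` and `rejected` counters). The sample data and the `queue_threshold` parameter are made up for illustration.

```python
def find_busy_pools(node_stats, queue_threshold=100):
    """Return sorted (node, pool, rejected, queue) tuples worth a closer look."""
    findings = []
    for node_id, node in node_stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        for pool, stats in node.get("thread_pool", {}).items():
            rejected = stats.get("rejected", 0)
            queue = stats.get("queue", 0)
            # Flag any rejection, or a queue longer than the chosen threshold.
            if rejected > 0 or queue > queue_threshold:
                findings.append((name, pool, rejected, queue))
    return sorted(findings)

# Made-up sample shaped like a node stats response.
sample_stats = {
    "nodes": {
        "n1": {
            "name": "node-1",
            "thread_pool": {
                "search": {"threads": 13, "queue": 250, "active": 13, "rejected": 42},
                "index": {"threads": 8, "queue": 0, "active": 1, "rejected": 0},
                "bulk": {"threads": 8, "queue": 5, "active": 2, "rejected": 0},
            },
        }
    }
}

busy = find_busy_pools(sample_stats)
for node, pool, rejected, queue in busy:
    print(f"{node} {pool}: rejected={rejected} queue={queue}")
```

Note that `rejected` is a cumulative counter, so in practice you would compare successive samples rather than alert on the absolute value.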
Alternatively, or perhaps additionally, you may want to feed your logs to Logsene. Elasticsearch can log request rejections, so if you ship your Elasticsearch logs to Logsene, you’ll have both Elasticsearch metrics and its logs available for troubleshooting. Moreover, in Logsene you can create alert queries that notify you about anomalies in your logs, for instance when Elasticsearch starts logging errors like the example shown here:
o.es.c.u.c.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@5a805c60
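If you process such logs yourself, a rejection line like the one above can be picked apart with a regular expression. This is only a sketch: the pattern is derived from this single example line, so the exact message format is an assumption and may differ between Elasticsearch versions.

```python
import re

# Pattern inferred from the single example line; treat the format as an assumption.
REJECTION = re.compile(
    r"EsRejectedExecutionException: rejected execution "
    r"\(queue capacity (?P<capacity>\d+)\) on (?P<task>\S+)"
)

def parse_rejection(line):
    """Return queue capacity and rejected task, or None if the line does not match."""
    match = REJECTION.search(line)
    if match is None:
        return None
    return {
        "queue_capacity": int(match.group("capacity")),
        "task": match.group("task"),
    }

line = (
    "o.es.c.u.c.EsRejectedExecutionException: rejected execution "
    "(queue capacity 1000) on "
    "org.elasticsearch.search.action.SearchServiceTransportAction$23@5a805c60"
)
parsed = parse_rejection(line)
print(parsed)
```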
Circuit Breakers are Elasticsearch’s attempt to control memory usage and prevent the dreaded OutOfMemoryError. There are currently two Circuit Breakers – one for Field Data, the other for Requests. In short, you can set limits for each of them and prevent excessive memory usage to avoid your cluster blowing up with OOME.
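To see how close each breaker is to its limit, you can compare the estimated size against the configured limit. The sketch below assumes the `breakers` section of a node stats response, with `limit_size_in_bytes`, `estimated_size_in_bytes`, and `tripped` fields per breaker. The sample numbers are invented, and field names may vary by version.

```python
def breaker_usage(node_stats):
    """Map (node_id, breaker) to (fraction of limit used, times tripped)."""
    usage = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            estimated = breaker.get("estimated_size_in_bytes", 0)
            # Guard against a missing or zero limit to avoid dividing by zero.
            ratio = estimated / limit if limit > 0 else 0.0
            usage[(node_id, name)] = (round(ratio, 3), breaker.get("tripped", 0))
    return usage

# Invented sample shaped like the breakers section of node stats.
sample_breakers = {
    "nodes": {
        "n1": {
            "breakers": {
                "fielddata": {
                    "limit_size_in_bytes": 1000,
                    "estimated_size_in_bytes": 750,
                    "tripped": 2,
                },
                "request": {
                    "limit_size_in_bytes": 400,
                    "estimated_size_in_bytes": 100,
                    "tripped": 0,
                },
            }
        }
    }
}

usage = breaker_usage(sample_breakers)
for (node_id, name), (ratio, tripped) in sorted(usage.items()):
    print(f"{node_id} {name}: {ratio:.0%} of limit, tripped {tripped} times")
```

A breaker that sits near its limit or has a growing `tripped` count is a hint to revisit the limit or the queries that load so much data.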
Want metrics like these for your Elasticsearch cluster?
Feel free to register here and enjoy all the SPM for Elasticsearch goodness. There’s no commitment and no credit card required. And, if you are a young startup, a small or non-profit organization, or an educational institution, ask us for a discount (see special pricing)!
Feedback & Questions
We are happy to answer questions or receive feedback – please drop us a line or get us @sematext.