ElasticSearch Poll
February 7, 2012
We use ElasticSearch more and more here at Sematext (we have a number of ElasticSearch projects under way right now, some of them quite massive in terms of data and/or query volume). In our work we typically run a single ElasticSearch instance (one JVM) on each server in the cluster.
How about you? Do you run multiple ElasticSearch instances/JVMs per box in production?
Thank you for your feedback!

What’s the reasoning for running multiple instances? Is this an effective way of avoiding GC-related issues on machines with a lot of memory? How big is the overhead from a processing perspective? Do you use dedicated disks / EBS drives for each instance to keep the IO performance predictable? How about network bandwidth: can multiple ES instances saturate it, or is this not even a problem worth considering? Is the ideal number of instances related to the number of CPU cores? Is this really a good idea from an HA point of view? (You can end up with all the replicas on a single machine.) Does ES support replica placement policies (machine, rack awareness)?
Man Andrei, you sure you asked enough questions there?
Thanks for answering them. I hope that other readers will find the answers useful. Also, I find it quite amazing that an important feature was added just a few hours ago.
Andrei: Indeed. However, another way to look at that important feature that got implemented just now is that it got implemented within 24 hours of us opening the issue. That’s hot.
Yep! Shay Banon is doing a fabulous job at driving the project forward!
The single reason for running more than one instance on a single machine is a machine with a lot of memory (north of 100 GB), where you may want to run several instances with smaller heaps (28-30 GB each, so that compressed oops are still in effect).
So, that feature that was added is not really that important (though I was happy to add it, since it was simple enough). I have never heard of people needing to run more than 1 instance on a machine.
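
For readers who want to put rough numbers on the heap-size reasoning above, here is a minimal sketch in Python. It is not anything from ElasticSearch itself: the function name `plan_instances` and the `os_cache_fraction` parameter are made up for illustration, and reserving half of the RAM for the OS file system cache is only an assumed rule of thumb. The 30 GB cap comes from the 28-30 GB range mentioned in the comment.

```python
def plan_instances(total_ram_gb, max_heap_gb=30.0, os_cache_fraction=0.5):
    """Estimate how many JVMs fit on one machine and the heap each would get.

    max_heap_gb: per-JVM heap cap, kept at or below ~30 GB so that
        compressed oops stay in effect (the range quoted in the comment).
    os_cache_fraction: share of RAM left to the OS file system cache
        (an assumption for this sketch, not a measured value).
    """
    if total_ram_gb <= 0 or max_heap_gb <= 0:
        raise ValueError("RAM and heap sizes must be positive")
    if not 0 <= os_cache_fraction < 1:
        raise ValueError("os_cache_fraction must be in [0, 1)")

    # RAM that can be handed out as JVM heap once the OS cache is set aside.
    heap_budget_gb = total_ram_gb * (1.0 - os_cache_fraction)

    # Whole heaps of max_heap_gb that fit; always plan at least one instance.
    instances = max(1, int(heap_budget_gb // max_heap_gb))

    # On small machines the single instance gets whatever budget there is.
    heap_per_instance_gb = min(max_heap_gb, heap_budget_gb / instances)
    return instances, heap_per_instance_gb


if __name__ == "__main__":
    for ram_gb in (16, 64, 128, 256):
        count, heap_gb = plan_instances(ram_gb)
        print(f"{ram_gb} GB RAM: {count} instance(s), {heap_gb:.0f} GB heap each")
```

Under these assumptions a second instance only starts to make sense at roughly 120 GB of RAM, which is in line with the "north of 100 GB" figure in the comment. Below that, one JVM per box is the simpler choice.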