It’s been 9 months since we conducted a poll on SolrCloud usage. A lot of things can change in 9 months. SolrCloud itself went through a ton of development and bug fixing since our last poll. It’s time to see how many of us are using SolrCloud now, at the end of 2013.
Please tweet this poll and help us spread the word, so we can get good, statistically significant results.
We’ve collected 50 votes in our ZooKeeper Usage Poll over the last few days. Here are the results so far:
66% of people use ZooKeeper directly
Another 16% use ZooKeeper indirectly
18% do not use ZooKeeper at all
This puts total ZooKeeper usage at over 80%. BUT:
Direct ZooKeeper usage at 66% seems a little high, and indirect usage at 16% seems too low. ZooKeeper is used by so many popular distributed systems (Hadoop, HBase, SolrCloud, Kafka, Storm, and a number of others) that one would expect indirect usage to be much higher than direct usage.
In the last decade the world of distributed computing has exploded, and Apache ZooKeeper is often at the center of it, which is why we just added ZooKeeper monitoring to SPM. Let’s see what percentage of us use ZooKeeper.
Please tweet so we can collect a large number of votes and get a statistically representative sample.
Back in April 2013 there was a poll in the Hadoop Users LinkedIn group:
YARN or pre-YARN – which version of Hadoop are you using?
Because we were working on adding Hadoop monitoring to SPM, this was an important question for us – which version of Hadoop should SPM be able to monitor?
Here are the results of that poll:
Hadoop MRv1 vs. Hadoop YARN
As we can see, most Hadoop users are still using the old version of Hadoop and are not using YARN. The percentage in the “YARN” bar at the top is partially hidden, but it’s 13%: only 13% of Hadoop users who responded are using Hadoop YARN. Combined with the 17% of people who said they are moving to YARN, that makes 30% altogether. That is still only about half the number of Hadoop MRv1 users, but if we asked the same question in early 2014 we would likely see a close tie.
So which version of Hadoop are we supporting in SPM? Both! With SPM you can monitor both Hadoop MRv1 and Hadoop YARN. And if you are using pre-YARN Hadoop today and want to switch to Hadoop YARN later, that’s not a problem for SPM.
As you may know, Sematext runs a service we internally call SPM – Scalable Performance Monitoring, a currently-still-free SaaS for monitoring performance of Solr, HBase, and soon a few other technologies we often help our clients with. One of the things we monitor for Solr and other search technologies is the size of the index. We monitor it by periodically checking its size, number of documents in it, number of deleted documents, number of index segments, files, etc.
Recently, we had an internal discussion about how to best report the index size when the index changes over time and decided we’d ask people who run Solr (or ElasticSearch or Sensei or…) – you – what you would like to see in this report.
For example, imagine that in some 5-minute time period (say 10:00 AM to 10:05 AM) we check the index 5 times (in reality we do it much more frequently) and each time we find the index has a different number of documents in it: 10, 15, 20, 25, and finally 30 documents. Now imagine this data as a graph showing the number of indexed documents over time, but with the smallest time period shown being a 5-minute interval.
At this point the question we have for you is: How many documents should this graph report for our example 10:00 – 10:05 AM period above? Should it show the minimum – 10? Average – 20? Median – 20? Maximum – 30? Something else? Minimum, average, and maximum – 10, 20, 30?
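To make the options concrete, here is a minimal Python sketch that collapses the five example samples into each candidate value. It is only an illustration of the question, not how SPM actually aggregates data: the function name summarize, the variable names, and the extra "last" option (the most recent sample in the interval) are our own assumptions for this example.

```python
from statistics import mean, median


def summarize(samples):
    """Collapse the document counts sampled within one interval
    into the candidate values a graph could report.

    Hypothetical helper for illustration only; not SPM code.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return {
        "min": min(samples),
        "avg": mean(samples),
        "median": median(samples),
        "max": max(samples),
        # Assumption: "last" is one possible answer to "something else?"
        "last": samples[-1],
    }


# Document counts observed between 10:00 and 10:05 AM in the example.
doc_counts = [10, 15, 20, 25, 30]
summary = summarize(doc_counts)
print(summary)
```

Because the example index grows steadily, the average and the median coincide at 20. With bursty indexing or large deletes the candidates can differ a lot, which is exactly why the choice matters.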
Any feedback and suggestions you give us regarding this will be greatly appreciated – thanks!
We use ElasticSearch more and more here at Sematext (we have a number of ElasticSearch projects right at this moment, some of them quite massive in terms of data and/or query volume). In our work we typically have only 1 ElasticSearch instance/only 1 JVM running ElasticSearch on each server in the cluster.
How about you? Do you run multiple ElasticSearch instances/JVMs per box in production?