Poll: Using SolrCloud or Not?

It’s been 9 months since we conducted a poll on SolrCloud usage.  A lot of things can change in 9 months.  SolrCloud itself went through a ton of development and bug fixing since our last poll.  It’s time to see how many of us are using SolrCloud now, at the end of 2013.

Please tweet this poll and help us spread the word, so we can get good, statistically significant results.

ZooKeeper Poll Results

We’ve collected 50 votes in our ZooKeeper Usage Poll over the last few days.  Here are the results so far:

  • 66% of people use ZooKeeper directly
  • Another 16% use ZooKeeper indirectly
  • 18% do not use ZooKeeper at all

This puts total ZooKeeper usage at over 80%.  BUT:

Direct ZooKeeper usage at 66% seems a little high, and indirect usage at 16% doesn’t feel quite right.  ZooKeeper is used by Hadoop, HBase, SolrCloud, Kafka, Storm, and so many other popular distributed systems that one would expect indirect usage to be much higher than direct usage.

What’s your take on these numbers?

Poll: Are You Using ZooKeeper?

In the last decade the world of distributed computing has exploded and Apache ZooKeeper is often at the center of it, which is why we just added ZooKeeper monitoring in SPM.  Let’s see what percentage of us use ZooKeeper.

Please tweet so we can collect a large number of votes and get a statistically representative sample.


Poll Results: Hadoop YARN vs. pre-YARN

Back in April 2013 there was a poll in the Hadoop Users LinkedIn group:

YARN or pre-YARN – which version of Hadoop are you using?

Because we were working on adding Hadoop monitoring to SPM, this was an important question for us – which version of Hadoop should SPM be able to monitor?

Here are the results of that poll:

Hadoop MRv1 vs. Hadoop YARN

As we can see, most Hadoop users are still on the old version of Hadoop and are not using YARN.  The percentage in the “YARN” bar at the top is partially hidden, but it’s 13%: only 13% of the Hadoop users who responded are using Hadoop YARN.  Combined with the 17% who said they are moving to YARN, that makes 30% altogether.  That is still only about half the number of Hadoop MRv1 users, but if we asked the same question in early 2014 we would likely see a close tie.

So which version of Hadoop are we supporting in SPM?  Both!  With SPM you can monitor both Hadoop MRv1 and Hadoop YARN.  And if you are using pre-YARN Hadoop today and want to switch to Hadoop YARN later, that’s not a problem for SPM.

Poll: Using SolrCloud or Not?

We know that as of February 2013, of those Solr users who follow the Sematext Blog, about 75% use some version of Solr 4.x.  But today we are trying to get to another interesting stat:

What portion of Solr 4.x users use SolrCloud?

Let’s find out!  Please tweet this to help us get more votes and better stats.

Please vote only if you are using Solr 4.x.  Please do NOT vote if you are using a 1.x or 3.x version of Solr.

Poll: Which Solr version are you using?

With Solr 4.1 recently released, let’s see which version(s) of Solr people are using.  Please tweet it to help us get more votes and better stats.

Poll: What do you use for Solr performance monitoring?

The results of this poll will be included in the “Large Scale ElasticSearch, Solr & HBase Performance Monitoring” presentation at Berlin Buzzwords next week.  Please vote and share this post to help us make this poll statistically significant!

Poll: What do you use for ElasticSearch performance monitoring?

The results of this poll will be included in the “Large Scale ElasticSearch, Solr & HBase Performance Monitoring” presentation at Berlin Buzzwords next week.  Please vote and share this post to help us make this poll statistically significant!

Poll: Solr Index Size Monitoring

As you may know, Sematext runs a service we internally call SPM – Scalable Performance Monitoring, a currently-still-free SaaS for monitoring performance of Solr, HBase, and soon a few other technologies we often help our clients with.  One of the things we monitor for Solr and other search technologies is the size of the index.  We monitor it by periodically checking its size, number of documents in it, number of deleted documents, number of index segments, files, etc.

Recently, we had an internal discussion about how to best report the index size when the index changes over time and decided we’d ask people who run Solr (or ElasticSearch or Sensei or…) – you – what you would like to see in this report.

For example, imagine that in some 5-minute time period (say 10:00 AM to 10:05 AM) we check the index 5 times (in reality we do it much more frequently) and each time we find the index has a different number of documents in it: 10, 15, 20, 25, and finally 30 documents.  Now imagine this data as a graph showing the number of indexed documents over time, but with the smallest time period shown being a 5-minute interval.

At this point the question we have for you is: How many documents should this graph report for our example 10:00 – 10:05 AM period above?  Should it show the minimum – 10?  Average – 20?  Mean – 20?  Maximum – 30?  Something else?  Minimum, average, and maximum – 10, 20, 30?
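To make the options concrete, here is a minimal Python sketch of how the five example samples could be collapsed into a single 5-minute bucket.  It is only an illustration, not how SPM actually computes its graphs: the `summarize` function and the `last` option (the most recent sample in the bucket) are our own assumptions, added for comparison.

```python
from statistics import mean

# Example samples from the text: document counts seen between 10:00 and 10:05 AM.
samples = [10, 15, 20, 25, 30]


def summarize(counts):
    """Return candidate values a graph could report for one time bucket.

    Hypothetical helper for illustration only; not part of SPM.
    """
    return {
        "min": min(counts),    # smallest count observed in the bucket
        "avg": mean(counts),   # arithmetic mean of all samples
        "max": max(counts),    # largest count observed in the bucket
        "last": counts[-1],    # most recent sample (an extra option, our assumption)
    }


summary = summarize(samples)
print(summary)
```

For an index that only grows, as in this example, the maximum and the last sample coincide.  They would differ if documents were deleted during the interval, which is one reason the choice of aggregate matters.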

Any feedback and suggestions you give us regarding this will be greatly appreciated – thanks!

ElasticSearch Poll

We use ElasticSearch more and more here at Sematext (we have a number of ElasticSearch projects right at this moment, some of them quite massive in terms of data and/or query volume).  In our work we typically have only 1 ElasticSearch instance/only 1 JVM running ElasticSearch on each server in the cluster.

How about you?  Do you run multiple ElasticSearch instances/JVMs per box in production?

Thank you for your feedback!

