Poll Results: HBase Version Distribution

The results of the HBase version distribution poll are in. Thanks to everyone who took the time to vote!

The distribution pie chart is below, but we could summarize it as follows:

  • A big chunk of HBase clusters, about 30%, are still “stuck” on HBase 0.94.x
  • Over 37% of HBase clusters are on 0.98.x, which until very recently was the latest stable version
  • Only about 7% of clusters are on 0.96.x, and we can assume these clusters will soon migrate to either 0.98.x or 1.0.x
  • Somewhat surprisingly, almost 20% of HBase clusters are already on HBase 1.0.0, even though 1.0.0 was released only a few weeks ago

It’s great to see so many clusters moving to 1.0.0 so quickly! As for why so many clusters are still on 0.94.x, which is several years old, see this comment on the HBase mailing list. Here at Sematext we make heavy use of HBase and were on 0.94.x for a long time, too. A few months ago we moved to 0.98.x and have been enjoying all its benefits. Furthermore, we recently updated SPM for HBase to monitor a pile of new HBase metrics, and some of the new metric charts provide interesting insights into our HBase clusters. For example, we can now see the dramatic impact of major compactions on data locality (and thus on HBase performance!); see for yourself at https://apps.sematext.com/spm-reports/s/VhOltU14Cy. We can also see the number and size of HLog files over time at https://apps.sematext.com/spm-reports/s/7LU1qvs7ur.

Apache HBase Version Distribution

You may also want to check out the results of our other polls about big data technologies.

HBase Poll: Version You Run?

Because we are updating SPM for HBase to make sure it collects all the key HBase metrics added in 0.98, we thought it would be good to see which HBase versions are being used in the wild. We’re on 0.98 after being on 0.94 for a long time. How about you?

Please tweet this poll and help us spread the word, so we can get good, statistically significant results. We’ll publish the results here and via @sematext (follow us!) in a week.

Poll Results: Kafka Version Distribution

The results of the Apache Kafka version distribution poll are in. Thanks to everyone who took the time to vote!

The distribution pie chart is below, but we could summarize it as follows:

  • Only about 5% of Kafka 0.7.x users didn’t indicate they will upgrade to 0.8.2.x in the next 2 months
  • Only about 14% of Kafka 0.8.1.x users didn’t indicate they will upgrade to 0.8.2.x in the next 2 months
  • Over 42% of Kafka users are already using 0.8.2.x!
  • Over 80% of Kafka users say they will be using 0.8.2.x within the next 2 months!

It’s great to see Kafka users being so quick to migrate to the latest version of Kafka! We’re extra happy to see such quick 0.8.2 adoption because we put a lot of effort into improving Kafka metrics, and a few weeks ago, right after Kafka 0.8.2 was released, we made all 100+ Kafka metrics available via SPM's Kafka 0.8.2 monitoring.

Apache Kafka Version Distribution

You may also want to check out the results of our recent Kafka Producer/Consumer language poll.

Kafka Poll: Version You Use?

UPDATE: Poll Results!

With the release of Kafka 0.8.2 and 0.8.2.1, and with the updated SPM for Kafka now monitoring over 100 Kafka metrics, we thought it would be good to see which Kafka versions are being used in the wild. Kafka 0.7.x was a strong and stable release used by many. The 0.8.1.x release has been out since March 2014. Kafka 0.8.2.x has been out for just a little while, but are there people who are either already using it (we are!) or about to upgrade to it? Please tweet this poll and help us spread the word, so we can get good, statistically significant results. We’ll publish the results here and via @sematext (follow us!) in a week.

Hiring: Full-stack Java Developers

Sematext is looking for strong full-stack developers (remote work is cool!) who:

  • Find creative and elegant solutions, build tools, avoid repetition and boilerplate code
  • Take ownership and push forward; want to help build the team and the organization
  • Like working with data-intensive applications, continuous data streams (e.g. metrics, logs, events), visualization, and data analytics
  • Want to have fun, enjoy building new things and improving existing ones

Some info about our tech:

  • Java in the backend, with a bit of Akka
  • Java and NodeJS for various SPM agents
  • A series of Machine Learning algorithms for Anomaly Detection
  • HBase and Elasticsearch for storing massive volumes of data (many billions of “rows”… stopped counting long ago)
  • Jetty and Kafka, which handle hundreds of thousands of events/metrics/logs/messages per second
  • MySQL, ZooKeeper (obviously)
  • Apache Flume, rsyslog, Logstash, and Kibana
  • Lots of JavaScript in the Bootstrap-based UI layer: jQuery and various other usual suspects
  • Flot for charting (and looking to replace that with something more modern, yes)
  • Solr, but just for search-lucene.com and search-hadoop.com, which are not related to this opening
  • Everything runs on AWS – we own a total of 2 physical servers

Products / Services you’d be building:

  • SPM – monitoring, alerting, anomaly detection.  A lot has been done, and a lot more is in the queue.
  • Logsene – log collection, indexing, searching, alerting, anomaly detection.  A lot of new features are waiting to be built.
  • Search Analytics – it works, it runs, it has customers, but there is so much more value we can extract from query and click data!
  • NewAppHere – can’t talk about it, but it’s going to be big… and not only in Japan

Good to have skills:

  • Java – for various backend components
  • JavaScript (frameworks/libraries) – if you’re truly full-stack
  • We can teach you or at least help you catch up quickly with everything else mentioned above as well as custom bits not mentioned here

A bit about Sematext:

  • HQ in NYC, with people in North America, Europe, and Asia
  • Developers with strong open-source backgrounds
  • Deep expertise in Solr and Elasticsearch – we are also a leading provider of consulting and support for tons of clients
  • Some of our engineers give talks at conferences around the world and write books
  • We are totally self-funded, financially independent, and profitable
  • Chirping via @sematext

Got more questions?  Send them our way!  Better yet, send your resume!

Poll Results: Kafka Producer/Consumer

About 10 days ago we ran a poll about which languages/APIs people use when writing their Apache Kafka Producers and Consumers. See Kafka Poll: Producer & Consumer Client. We have collected 130 votes so far. The results were actually somewhat surprising! Let’s share the numbers first!

Kafka Producer/Consumer Languages

What do you think?  Is that the breakdown you expected?  Here is what surprised us:

  • Java is the dominant language on the planet today, but fewer than 50% of respondents use it with Kafka! Read: possible explanation for Java & Kafka.
  • Python is clearly popular and gaining in popularity, and at 13% it looks especially popular in the Kafka context.
  • Go at 10-11% seems quite popular for a relatively young language. One might expect Ruby to have more adoption here than Go because Ruby has been around much longer.
  • We put C/C++ in the poll because these languages are still in use, though we didn’t expect them to get 6% of the votes. However, considering how heavily C/C++ are still used generally speaking, that’s actually a pretty low percentage.
  • JavaScript and NodeJS are surprisingly low at just 4%. Any idea why? Is the JavaScript Kafka API out of date, or bad, or something else?
  • The “Other” category is relatively big, at a bit over 12%. Did we forget some major languages people often use with Kafka? Scala? See info about the Kafka Scala API here.

Everyone and their cousin is using Kafka nowadays, or at least that’s what it looks like from where we at Sematext sit.  However, because of the relatively high percentage of people using Python and Go, we’d venture to say Kafka adoption is much stronger among younger, smaller companies, where Python and Go are used more than “enterprise languages”, like Java, C#, and C/C++.