Poll Results: Log Shipping Formats

The results for the log shipping formats poll are in.  Thanks to everyone who took the time to vote!

The distribution pie chart is below, but we can summarize it for you here:

  • JSON won pretty handily with 31.7% of votes, which was not totally unexpected. If anything, we expected to see more people shipping logs in JSON.  One person pointed out GELF, but GELF is really just specific JSON structure over Syslog/HTTP, so GELF falls in this JSON bucket, too.
  • Plain-text / line-oriented log shipping is still popular, clocking in with 25.6% of votes.  It would be interesting to see how that will change in the next year or two.  Any guesses?  For those who are using Logstash for shipping line-oriented logs, but have to deal with occasional multi-line log events, such as exception stack traces, we’ve blogged about how to ship multi-line logs with Logstash.
  • Syslog RFC5424 (the newer one, with structured data in it) barely edged out its older brother, RFC3164 (unstructured data).  Did this surprise anyone?  Maybe people don’t care for structured logs as much as one might think?  Well, structure is important, as we’ll show later today in our Docker Logging webinar because without it you’re limited to mostly “supergrepping” your logs, not really getting insight based on more analytical type of queries on your logs.  That said, the two syslog formats together add up to 25%!  Talk about ancient specs holding their ground against newcomers!
  • There are still some of people out there who aren’t shipping logs! That’s a bit scary! :) Fortunately, there are lot of options available today, from the expensive On Premises Splunk or DIY ELK Stack, to the awesome Logsene, which is sort of like ELK Stack on steroids.  Look at log shipping info to see just how easy it is to get your logs off of your local disks, so you can stop grepping them.  If you can’t live without the console, you can always use logsene-cli!

Log_shipper_poll_4

Similarly, if your organization falls in the “Don’t ship them” camp (and maybe even “None of the above” as well, depending on what you are or are not doing) then — if you haven’t done so already — you should give some thought to trying a centralized logging service, whether running within your organization or a logging SaaS like Logsene, or at least DIY ELK.

Poll: How do you ship your Logs?

Recently, a few people from Sematext’s Logsene team debated about how useful the “structured” part of syslog logs (those using the RFC5424 format) is to people.  Or has shipping logs in other structured formats, like JSON, made RFC5424 irrelevant to most people?  Here is a quick poll to help us all get some insight into that.

NOTE: the question is not what your logs looks like initially when they are generated, but about what structure they have when you ship them to a centralized logging service, whether running within your organization or a logging SaaS like Logsene.

If you choose “None of the above” please do leave a comment to help spread knowledge about alternatives.

 

Poll Results: HBase Version Distribution

The results for HBase version distribution poll are in.  Thanks to everyone who took the time to vote!

The distribution pie chart is below, but we could summarize it as follows:

  • A big chunk of HBase clusters, about 30%, are still “stuck” on HBase 0.94.x
  • Over 37% of the HBase clusters are on 0.98.x that, until very recently, was the latest stable version
  • Only about 7% of clusters are on the 0.96.x and we can assume these clusters will soon migrate to either 0.98.x or 1.0.x
  • Somewhat surprisingly, almost 20% of HBase clusters are already on HBase 1.0.0 even though 1.0.0 was released only a few weeks ago

It’s great to see so many clusters moving to 1.0.0 so quickly! As for why there are still so many clusters using 0.94.x, which is several years old, see this comment on the HBase mailing list.  Here at Sematext we make heavy use of HBase and were on 0.94.x version for a long time, too.  A few months ago we’ve moved to 0.98.x and have been enjoying all its benefits.  Furthermore, we’ve recently updated SPM for HBase to monitor a pile of new HBase metrics that provide interesting new insights about our HBase clusters though some of the new metric charts.  For example, we are now able to see the dramatic impact of major compactions on data locality (and thus HBase performance!) — see for yourself – https://apps.sematext.com/spm-reports/s/VhOltU14Cy, or the number and size of HLog files over time — https://apps.sematext.com/spm-reports/s/7LU1qvs7ur.

HBase version distribution
Apache HBase Version Distribution

You may also want to check out the results of our other polls about big data technologies.

HBase Poll: Version You Run?

We are updating SPM for HBase to make sure SPM collects all the key HBase metrics that were added in 0.98, we thought it would be good to see which HBase versions are being used in the wild.  We’re on 0.98 after being on 0.94 for a long time.  How about you?

Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

Poll Results: Kafka Version Distribution

The results for Apache Kafka version distribution poll are in.  Thanks to everyone who took the time to vote!

The distribution pie chart is below, but we could summarize it as follows:

  • Only about 5% of Kafka 0.7.x users didn’t indicate they will upgrade to 0.8.2.x in the next 2 months
  • Only about 14% of Kafka 0.8.1.x users didn’t indicate they will upgrade to 0.8.2.x in the next 2 months
  • Over 42% of Kafka users are already using 0.8.2.x!
  • Over 80% of Kafka users say they will be using 0.8.2.x within the next 2 months!

It’s great to see Kafka users being so quick to migrate to the latest version of Kafka!  We’re extra happy to see such quick 0.8.2 adoption because we put a lot of effort into improving Kafka metric, as well as making all 100+ Kafka metrics available via SPM Kafka 0.8.2 monitoring a few weeks ago, right after Kafka 0.8.2 was released.

Apache Kafka Version Distribution
Apache Kafka Version Distribution

 

You may also want to check out the results of our recent Kafka Producer/Consumer language poll.

 

Kafka Poll: Version You Use?

UPDATE: Poll Results!

With Kafka 0.8.2 and 0.8.2.1 being released and with the updated SPM for Kafka monitoring over 100 Kafka metrics, we thought it would be good to see which Kafka versions are being used in the wild.  Kafka 0.7.x was a strong and stable release used by many.  The 0.8.1.x release has been out since March 2014.  Kafka 0.8.2.x has been out for just a little while, but…. are there any people who are either already using it (we are!) or are about to upgrade to it? Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

Please tweet this poll and help us spread the word, so we can get a good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

Poll Results: Kafka Producer/Consumer

About 10 days ago we ran a a poll about which languages/APIs people use when writing their Apache Kafka Producers and Consumers.  See Kafka Poll: Producer & Consumer Client.  We collected 130 votes so far.  The results were actually somewhat surprising!  Let’s share the numbers first!

Kafka Producer/Consumer Languages
Kafka Producer/Consumer Languages

What do you think?  Is that the breakdown you expected?  Here is what surprised us:

  • Java is the dominant language on the planet today, but less than 50% people use it with Kafka! Read: possible explanation for Java & Kafka.
  • Python is clearly popular and gaining in popularity, but at 13% it looks like it’s extra popular in Kafka context.
  • Go at 10-11% seems quite popular for a relatively young language.  One might expect Ruby to have more adoption here than Go because Ruby has been around much longer.
  • We put C/C++ in the poll because these languages are still in use, though we didn’t expect it to get 6% of votes.  However, considering C/C++ are still quite heavily used generally speaking, that’s actually a pretty low percentage.
  • JavaScript and NodeJS are surprisingly low at just 4%.  Any idea why?  Is the JavaScript Kafka API not up to date or bad or ….?
  • The “Other” category is relatively big, at a bit over 12%.  Did we forget some major languages people often use with Kafka?  Scala?  See info about the Kafka Scala API here.

Everyone and their cousin is using Kafka nowadays, or at least that’s what it looks like from where we at Sematext sit.  However, because of the relatively high percentage of people using Python and Go, we’d venture to say Kafka adoption is much stronger among younger, smaller companies, where Python and Go are used more than “enterprise languages”, like Java, C#, and C/C++.