Kafka Poll: Version You Use?

With Kafka 0.8.2 being released and the updated SPM for Kafka monitoring over 100 Kafka metrics, we thought it would be good to see which Kafka versions are being used in the wild.  Kafka 0.7.x was a strong and stable release used by many.  The 0.8.1.x release has been out since March 2014.  Kafka 0.8.2.x has been out for just a little while, but… are there any people who are either already using it (we are!) or about to upgrade to it?

Please tweet this poll and help us spread the word, so we can get good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

Hiring: Full-stack Java Developers

Sematext is looking for strong full-stack developers (remote work is cool!) who:

  • Find creative and elegant solutions, build tools, avoid repetition and boilerplate code
  • Take ownership and push forward; want to help build the team and the organization
  • Like working with data-intense applications, continuous data streams (e.g. metrics, logs, events), visualization and data analytics
  • Want to have fun, enjoy building new things and improving existing ones

Some info about our tech:

  • Java in the backend, with a bit of Akka
  • Java and NodeJS for various SPM agents
  • A series of Machine Learning algorithms for Anomaly Detection
  • HBase and Elasticsearch for storing massive volumes of data (many billions of “rows”… stopped counting long ago)
  • Jetty and Kafka that handle hundreds of thousands of events/metrics/logs/messages per second
  • MySQL, ZooKeeper (obviously)
  • Apache Flume, rsyslog, Logstash, and Kibana
  • Lots of JavaScript in the Bootstrap-based UI layer – jQuery and various other usual suspects
  • Flot for charting (and looking to replace that with something more modern, yes)
  • Solr, but just for search-lucene.com and search-hadoop.com, which are not related to this opening
  • Everything runs on AWS – we own a total of 2 physical servers

Products / Services you’d be building:

  • SPM – monitoring, alerting, anomaly detection.  A lot has been done, and a lot more is in the queue.
  • Logsene – log collection, indexing, searching, alerting, anomaly detection.  A lot of new features are waiting to be built.
  • Search Analytics – it works, it runs, it has customers, but there is so much more value we can extract from query and click data!
  • NewAppHere – can’t talk about it, but it’s going to be big… and not only in Japan

Good to have skills:

  • Java – for various backend components
  • JavaScript (frameworks/libraries) – if you’re truly full-stack
  • We can teach you or at least help you catch up quickly with everything else mentioned above as well as custom bits not mentioned here

A bit about Sematext:

  • HQ in NYC, with people in North America, Europe, and Asia
  • Developers with strong open-source backgrounds
  • Deep expertise in Solr and Elasticsearch – we are also a leading provider of consulting and support for tons of clients
  • Some of our engineers give talks at conferences around the world and write books
  • We are totally self-funded, financially independent, and profitable
  • Chirping via @sematext

Got more questions?  Send them our way!  Better yet, send your resume!

Poll Results: Kafka Producer/Consumer

About 10 days ago we ran a poll about which languages/APIs people use when writing their Apache Kafka Producers and Consumers.  See Kafka Poll: Producer & Consumer Client.  We have collected 130 votes so far.  The results were actually somewhat surprising!  Let’s share the numbers first!

Kafka Producer/Consumer Languages


What do you think?  Is that the breakdown you expected?  Here is what surprised us:

  • Java is the dominant language on the planet today, but fewer than 50% of people use it with Kafka! Read: possible explanation for Java & Kafka.
  • Python is clearly popular and gaining in popularity, but at 13% it looks like it’s extra popular in the Kafka context.
  • Go at 10-11% seems quite popular for a relatively young language.  One might expect Ruby to have more adoption here than Go because Ruby has been around much longer.
  • We put C/C++ in the poll because these languages are still in use, though we didn’t expect it to get 6% of votes.  However, considering C/C++ are still quite heavily used generally speaking, that’s actually a pretty low percentage.
  • JavaScript and NodeJS are surprisingly low at just 4%.  Any idea why?  Is the JavaScript Kafka API not up to date or bad or ….?
  • The “Other” category is relatively big, at a bit over 12%.  Did we forget some major languages people often use with Kafka?  Scala?  See info about the Kafka Scala API here.

Everyone and their cousin is using Kafka nowadays, or at least that’s what it looks like from where we at Sematext sit.  However, because of the relatively high percentage of people using Python and Go, we’d venture to say Kafka adoption is much stronger among younger, smaller companies, where Python and Go are used more than “enterprise languages”, like Java, C#, and C/C++.

Kafka Poll: Producer & Consumer Client

Kafka has become the de-facto standard for handling real-time streams in high-volume, data-intensive applications, and there are certainly a lot of those out there.  We thought it would be valuable to conduct a quick poll to find out which implementations of Kafka Producers and Consumers people use – specifically, which programming languages do you use to produce and consume Kafka messages?

Please tweet this poll and help us spread the word, so we can get good, statistically significant results.  We’ll publish the results here and via @sematext (follow us!) in a week.

NOTE #1: If you choose “Other”, please leave a comment with additional info, so we can share this when we publish the results, too!

NOTE #2: The results are in! See http://blog.sematext.com/2015/01/28/kafka-poll-results-producer-consumer/


Top 5 Most Popular Log Shippers

The Log Shipper Poll results are in!  We run Logsene here at Sematext, so we wanted to know what people like to use to ship their logs.  Before we share the results, a few words about the poll:

  • We published it here on our blog on September 22, 2014
  • We automatically tweeted it and posted it to several Devops and similar LinkedIn groups
  • We did not post it to groups or mailing lists for various log shippers we included in the poll to avoid bias
  • We have collected 115 votes so far

That said, let’s see how log shipper popularity breaks down.

You can tweet the results of this poll here: Top 5 Most Popular Log Shippers

Log Shipper Popularity


Don’t forget to check out Logsene – our Log Management Cloud/On Premises service that will happily take logs from Logstash, Flume, rsyslog, Fluentd, Syslog-ng, syslogd, etc.  Check How to Send Logs to Logsene to see how easy it is.

Job: Sematext is hiring – Elasticsearch Engineer

The Sematext team is more distributed than your average Elasticsearch cluster and, trust me, we’ve seen a good portion of the world’s Elasticsearch clusters.  The thing with Elasticsearch clusters is they often get new nodes added and they keep expanding to handle more data and more queries.  Similarly, we are looking to add a new node to the Sematext team so we can reshard our work a bit, distribute it more evenly, and scale further.  In plain English, we are looking for an Engineer who loves working with Elasticsearch and large volumes of data, who enjoys a wide variety of projects and challenges involving large-scale data processing, high-volume indexing, and high query rates, who likes working with our clients, and who wants to make Logsene and SPM the killer log management and monitoring platforms.  Advanced knowledge of Elasticsearch is less important than a passion to learn and build, a positive attitude, and the ability to make decisions, work both independently and with the rest of the team, communicate well, and simply be a good person.  We can teach you everything about Elasticsearch and turn you into a bonsai tree loving Elasticsearch samurai, but we need you to be all these other things.

As a member of our team you will get to:

  • Work with world-class search experts
  • Design and implement systems (both our own and our clients’) that process 10s of thousands of queries per second and handle billions of documents, logs, data points, etc.
  • Interact with clients and customers world-wide
  • Provide guidance, architecture design, implementation, and production support around Elasticsearch
  • Participate in and contribute to open-source (we’ve contributed to Solr, Lucene, HBase, Flume, rsyslog, Logstash, etc.)
  • Share your knowledge with clients, at conferences and under-conferences, online community, etc.

This position:

  • Offers a lot of independence, learning, and growth
  • Is open to applicants “west of New York City” (this could be South, Central, or North America, of course), though we’ll happily make an exception if you persuade us we should!

Our search team members have written several books about search, regularly give talks at conferences, blog, and participate in open-source projects.  For more info, see 19 things you may like about Sematext.

Interested? Please send your resume to jobs@sematext.com.

For other job openings please see Jobs @ Sematext or even our previous job listings.

