Solr vs. ElasticSearch: Part 6 – User & Dev Communities

One of the questions after my talk at the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our consulting engagements. As part of our Apache Solr vs. ElasticSearch post series, we decided to step away from the technical aspects of SolrCloud vs. ElasticSearch and look at the communities gathered around these two projects. If you haven’t read the previous posts about Apache Solr vs. ElasticSearch, here are pointers to all of them:

User Community

Let’s start by discussing the user activity around both ElasticSearch and Apache Solr.

User Activity

We started working on this post right before the Christmas break of 2012. At that time we decided to see how active the user base was for both ElasticSearch and Apache Solr. To do that we used our handy search-lucene.com service and compared the number of email messages sent to both user lists. So let’s see how they stack up.

Apache Solr

Solr User Mailing List Activity

As you can see, Solr user activity varies slightly from month to month, which is perfectly understandable. Each bar on the chart represents two weeks. The number of messages ranges from about 390 to about 770 per two weeks, which gives us between 800 and 1600 mails per month if we do a bit of rounding up. Quite impressive, I must say!

ElasticSearch

ElasticSearch User Mailing List

Now let’s discuss the ElasticSearch side. First, a few words of explanation. If you look at the above chart you might think that the ElasticSearch mailing list was silent and then users started posting in October 2012. That’s clearly not true; we simply didn’t add ElasticSearch to search-lucene.com until recently. However, you can see that the number of messages during the same period is quite similar: both Solr and ElasticSearch saw about 670 to 730 messages per two-week period. That works out to about 2 emails per hour on average.
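For readers who want to check the conversions, here is a minimal sketch of the arithmetic in Python. It assumes each bar covers exactly 14 days and that a month holds roughly two bars; the function names are ours, made up for illustration.

```python
HOURS_PER_BAR = 14 * 24  # each bar on the chart covers two weeks


def per_hour(messages_per_two_weeks):
    """Average messages per hour for one two-week bar."""
    return messages_per_two_weeks / HOURS_PER_BAR


def per_month(messages_per_two_weeks, bars_per_month=2):
    """Rough monthly volume, assuming about two bars per month."""
    return messages_per_two_weeks * bars_per_month


solr_low, solr_high = 390, 770      # Solr range read off the chart
shared_low, shared_high = 670, 730  # overlapping period, both lists

print(per_month(solr_low), per_month(solr_high))   # 780 1540
print(round(per_hour(shared_low), 2),
      round(per_hour(shared_high), 2))             # 1.99 2.17
```

The monthly figures of 780 and 1540 are what the post rounds up to 800 and 1600, and both hourly values sit close to 2.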

Distinct Users

Email volume is one thing, but I was always curious about how many different people write to the mailing lists. Having such a number gives us additional insight into the structure of the community around a particular search engine, new users, etc. However, we should not look only at this number, but also at things like the most active people on the mailing lists. In both cases we looked at the same period, from 1 to 30 December 2012, and used the data we index for search-lucene.com to calculate these numbers.

Apache Solr

In the case of Apache Solr there were 234 unique users sending mail to the user mailing list. That is almost 8 unique users per day on average, nice :)

ElasticSearch

In the case of ElasticSearch there were 271 unique users sending mail to the user mailing list. This gives us about 9 unique users per day on average, which is even nicer.
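If you want to run a similar count on your own mailing list archive, the idea can be sketched as below. This is not how search-lucene.com computes its numbers. It is just one plausible approach, assuming you already have (sender, date) pairs extracted from the archive; the sample records and the distinct_senders helper are made up for illustration.

```python
from datetime import date


def distinct_senders(messages, start, end):
    """Return the set of sender addresses with mail dated in [start, end]."""
    return {
        sender.strip().lower()  # treat differently-cased addresses as one user
        for sender, sent in messages
        if start <= sent <= end
    }


# Made-up sample records: (sender, date sent)
sample = [
    ("alice@example.com", date(2012, 12, 3)),
    ("Alice@example.com ", date(2012, 12, 9)),  # same person, different casing
    ("bob@example.com", date(2012, 12, 9)),
    ("carol@example.com", date(2013, 1, 2)),    # outside the window
]

start, end = date(2012, 12, 1), date(2012, 12, 30)
senders = distinct_senders(sample, start, end)
days = (end - start).days + 1  # 30 days, both ends included

print(len(senders), days)                          # 2 30

# Per-day averages using the counts quoted in this post
print(round(234 / days, 1), round(271 / days, 1))  # 7.8 9.0
```

Note that one person posting from two different addresses would still be counted twice, so numbers produced this way are an upper bound on the real number of people.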

Resources Available

As far as available resources go, both ElasticSearch and Solr have great documentation. On the Solr wiki (http://wiki.apache.org/solr/) you can find information about most of the components and, of course, a tutorial for beginners. ElasticSearch is very similar, with a tutorial and a very good description of its functionality available at http://www.elasticsearch.org/. In addition to that, there are three books published about Apache Solr (in English) and more (e.g. my Apache Solr 4 Cookbook) coming soon. As of now, there are no published books about ElasticSearch, but... stay tuned :)

Search Trending

We also decided to use uncle Google to look at trends about Apache Solr and ElasticSearch. Let’s look at the following diagram:

solr-vs-es-google-trends

As you can see, until early 2010 there was no interest in ElasticSearch at all, at least from the point of view of users searching for it. Note that we published the interview with Shay Banon over two and a half years ago, back in May 2010, before ElasticSearch registered on Google’s search trends radar! SolrCloud didn’t exist back then, so people slowly started looking for information on SolrCloud later in 2010. The volume of searches mentioning SolrCloud is very small even today, perhaps because people tend to search for Solr and not SolrCloud. And while SolrCloud is still the new kid on the block, searches for Solr dwarf searches for ElasticSearch despite the buzz surrounding ElasticSearch.

Of course, the above doesn’t say anything about the number of users of both search engines, but it definitely shows some information about the interest in these technologies.

Developers and the Code

If you are familiar with ElasticSearch and Solr you probably know that ElasticSearch is much younger than Apache Solr. Apache Solr was created by Yonik Seeley in 2004 and donated to the Apache Software Foundation. The first version of ElasticSearch, on the other hand, was released by Shay Banon in 2010. This is important to keep in mind before we talk about differences in contributors and the code itself. But getting to the point: we thought it might be interesting to see how both Apache Solr and ElasticSearch look from a bird’s-eye perspective. To do that we used the statistics and charts from ohloh.net. So, let’s see what they look like.

Apache Solr

Code Statistics

If we look at the current statistics, at the beginning of January 2013 Solr had more than 212k lines of code, with almost 7000 commits and 38 contributors. However, keep in mind that contributors here are people who committed the code, not necessarily the ones who actually implemented it and provided the patch, so the actual number of contributors is much higher. The chart looks like this:

lines_of_code_solr

Top Contributors

If we look at the top contributors we see Mark Miller on top, followed by Yonik Seeley, with Robert Muir in third place :)

top_commiters_solr

Active Contributors

One more interesting thing is the number of contributors that were actively involved during a given period of time. Looking at Apache Solr since 2006 we can see the following:

active_commiters_solr

I think we can say that there was stable growth in active contributors from 2006 until June 2012, with a bit of a decline shortly after that. However, I don’t think the number of active contributors will keep dropping; the dip is more likely due to a bit of exhaustion after releasing Apache Lucene and Solr 4.0 :)

ElasticSearch

Code Statistics

Current code statistics for ElasticSearch show that the code base just hit 240k lines of code, with about 4.2k commits and 87 contributors.

lines_of_code_es

Top Contributors

As we’d expect, Shay Banon is the top contributor to ElasticSearch. Martijn van Groningen is in second place on the podium and Igor Motov in third:

top_commiters_es

Active Contributors

And finally, the active contributors. We don’t have the same time frame as for Apache Solr, which is understandable as ElasticSearch is younger, but we can still see what is happening.

active_commiters_es

As you can see, since the first quarter of 2011 the number of active contributors has varied from 5 to about 10, peaking at the same time as in Solr, with 12 active contributors in June 2012.

Summary

As everything in this post indicates, both projects’ development and user communities are strong, active, and about equal. 2013 will be an interesting year for both projects.

We are nearing the end of our SolrCloud vs. ElasticSearch series.  What else would you like us to cover?  Please use the comments to let us know!

@kucrafal, @sematext

About Rafał Kuć
Sematext engineer, books author, speaker.

4 Responses to Solr vs. ElasticSearch: Part 6 – User & Dev Communities

  1. Hi Rafał, interesting review. If I look at ohloh.net for ES for the last 12 months of commits I see Shay has nearly 75% of all of them, the next as you state is Martijn at 9%, and for “All Time” I see it at nearly 90% (Shay and kimchy are the same, right?) Is this correct?

    • Rafał Kuć says:

      Seems to be correct and actually should be expected if you look at the history of ES. It is how ElasticSearch was developed until recently, but we can say that is changing. Let’s see what the future brings :)

  2. andy says:

    Hi, great write-up of the two search engines. It would be great to see some performance metrics of the two systems for indexing/querying using the same data sets & load conditions :)

    • sematext says:

      @Andy – that would indeed be very interesting. However, as we responded in a comment to one of the other posts in this series, good and accurate benchmarks are hard, dangerous, etc., so we haven’t done a direct Solr vs. ES performance comparison yet.
