Solr Digest, May 2010

May’s Solr Digest brings another review of interesting Solr developments and a short look at the current state of Solr’s branches and versions. Confused about which versions to use and which to avoid? Don’t worry, many people are. We’ll try to clear it up in this Digest.

  • In April’s edition of Solr Digest, we mentioned two JIRA issues dealing with document level security (SOLR-1834 and SOLR-1872). Now another issue (SOLR-1895) deals with LCF security and aims to work with SOLR-1834 to provide a complete solution.
  • One ancient JIRA issue, SOLR-397, finally got resolved and its code is now committed. Solr can now control how the end points of date ranges in facets are handled. You can use this functionality by specifying the facet.date.include HTTP request parameter, which can have the values “all”, “lower”, “upper”, “edge”, or “outer”. More details can be found in SOLR-397.
  • Another issue related to date ranges was created. This one aims to add a Date Range QParser to Solr, which would simplify the definition of date range filters resulting from date faceting. This issue is still in its infancy and has no patches attached to it as of this writing, but it looks useful and promising. When we add date faceting to Search-Lucene.com and Search-Hadoop.com, we’ll be looking to use this date range query parser.
  • Some errors in Solr will be much easier to trace after JIRA issue SOLR-1898 gets resolved. Everyone using Solr has probably encountered exceptions like: java.lang.NumberFormatException: For input string: "0.0". The message itself lacks some crucial details, such as information about the document and field that triggered the exception. SOLR-1898 will solve that problem, and we are looking forward to this patch!
  • Have you recently been in the situation where you were unsure about which branch or version of Solr you should use on your projects? If yes, you’re certainly not alone! After the recent merge of Solr and Lucene (covered in Solr March Digest and Lucene March Digest), things became confusing, especially for casual observers of Lucene and Solr. Here are some facts about the current state of Solr:
  1. The latest stable release of Solr is still 1.4.
  2. Version 1.4 was released more than 6 months ago, so many new features, patches and bug fixes aren’t included in it.
  3. However, it was a stable release, so if you’re going to production very soon, one low-risk choice would be to use version 1.4 and apply on top of it the patches you find necessary for your deployment.
  4. Current development is ongoing on trunk (considered unstable and slated for the future Solr 4.0) and on a branch named branch_3x. This branch is the most likely candidate for the next version of Solr (named 3.1) and is considered a (stable) development version that could be usable, though you have to be careful with your testing, as always.
  5. Another choice could be some old 1.5 nightly build, but 1.5 is abandoned and, in our opinion, it makes more sense to use nightly builds from branch_3x.

Here are a couple of threads where you can get more information:

  1. Lucene 3.x branch created
  2. Which Solr to use?
  • To show one of the dangers of unstable versions, we’ll point right away to one recently opened JIRA issue related to a “file descriptor leak” while indexing.
  • Although at Sematext we’ve been using Java 6 for a very long time, both for our products and with our clients, some people might still be stuck with Java 5. It appears that they will never be able to use Solr 4.0 once it is released, since the Solr trunk now requires Java 6 to compile.
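
Going back to the facet.date.include parameter from SOLR-397: a date faceting request that uses it might be put together roughly like this. This is only a sketch. The host, the handler path and the "timestamp" field name are our assumptions for illustration, and date_facet_url is a hypothetical helper, not something shipped with Solr.

```python
from urllib.parse import urlencode


def date_facet_url(base_url, field, start, end, gap, include):
    """Build a Solr select URL that facets on a date field.

    `include` is passed through as facet.date.include and controls
    which range end points are treated as inclusive.
    """
    params = [
        ("q", "*:*"),                  # match all documents
        ("rows", "0"),                 # we only want the facet counts
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),
        ("facet.date.include", include),
    ]
    # urlencode percent-escapes characters such as "+" and "/" for us
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance and an assumed date field named "timestamp"
url = date_facet_url(
    "http://localhost:8983/solr/select",
    field="timestamp",
    start="NOW/DAY-7DAYS",
    end="NOW/DAY",
    gap="+1DAY",
    include="lower",
)
print(url)
```

With “lower”, each one-day range should count documents that fall exactly on its lower bound but not those on its upper bound, so a document stamped at midnight is counted once rather than in two adjacent ranges.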

We’ll finish this month’s Solr Digest with two new Solr features:

  • For anyone wanting to use the JSON format when sending documents to Solr, a JSON update handler is now committed to trunk.
  • On the other hand, if you need CSV as an output format from Solr, you might benefit from the work on a new CSV Response Writer. Currently, there are no patches for it, but you can watch the issue and see when one is added.
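
To give a feel for the JSON update handler, here is a minimal sketch of preparing such a request. We are assuming a command-style payload with "add" and "commit" keys and an /update/json path on a local Solr instance. Both are assumptions on our side, so check the JIRA issue and your build for the exact format. The document fields and the build_json_update helper are made up for illustration.

```python
import json
import urllib.request


def build_json_update(doc, update_url="http://localhost:8983/solr/update/json"):
    """Prepare (but do not send) a POST request that adds one document."""
    # Assumed payload shape: one "add" command followed by a "commit"
    payload = {"add": {"doc": doc}, "commit": {}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,  # supplying a body makes this a POST request
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document; the field names depend on your schema
request = build_json_update({"id": "doc-1", "title": "Solr Digest, May 2010"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# With a running Solr instance, the request could then be sent with:
# urllib.request.urlopen(request)
```

The request is only built here, not sent, so the snippet can be tried without a Solr server at hand.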

Thanks for reading another Solr Digest! Help us spread the word: please Re-Tweet it and follow @sematext on Twitter.
