Lucene Digest, January 2010

In this debut Lucene Digest post we’ll cover some of the recent interesting happenings in Lucene-land.

  • For anyone who missed it, Lucene 3.0.0 is out (November 2009).  The main differences from the previous release are that 3.0.0 has no deprecated code (a big cleanup job) and that support for Java 1.4 was finally dropped!
  • As of this writing, there is only 1 JIRA issue targeted for 3.0.1 and 104 targeted for the Lucene 3.1 release.
  • Luke goes hand in hand with Lucene, and Andrzej quickly released a version of Luke that uses the Lucene 3.0.0 jars.  There were only minor changes in the code (and none in functionality), all related to the API changes between Lucene 2.9.1 and 3.0.
  • As usual, development is happening on Lucene’s trunk, but there is now also a “flex branch” in svn, where work on features related to Flexible Indexing (not quite up to date) is happening.  Lucene’s JIRA also has a Flex Branch “version”, and as of this writing there are 6 big issues being worked on there.
  • The new Lucene Connectors Framework (or LCF for short) subproject is in the works.  As you can guess from the name, LCF aims to provide a set of connectors to various content repositories, such as relational databases, SharePoint, EMC Documentum, File System, Windows File Shares, various Content Management Systems, even Feeds and Web sites.  LCF is based on a code donation from MetaCarta and will be going through the ASF Incubator.  We expect it to graduate from the Incubator into a regular Lucene TLP subproject later this year.
  • Spatial search is hot, or at least that’s what it looks like from inside Lucene/Solr.  Both projects’ developers and contributors are busy adding support for spatial/geo search.  In Lucene alone, there are currently 21 open JIRA issues aimed at spatial search.  Lucene’s contrib/spatial is where spatial search lives.  We’ll cover Solr’s support for spatial search in a future post.
  • Robert Muir and friends have added some finite-state automata technology to Lucene and dramatically improved regular expression and wildcard query performance.  Fuzzy queries may benefit from this in the future, too.  The ever-industrious Robert covered this in a post with more details.
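
To give a feel for the automaton idea in the last item, here is a minimal sketch in plain Java.  It is not Lucene's code and uses no Lucene APIs; the class name WildcardNfaSketch and its matches method are made up for illustration.  It treats a wildcard pattern ('*' for any run of characters, '?' for exactly one character) as a small state machine and tracks the set of reachable states while reading the text once, so there is no backtracking.

```java
import java.util.BitSet;

// Hypothetical illustration only: a tiny state-machine matcher for wildcard
// patterns. This is NOT how Lucene implements its automaton-based queries.
final class WildcardNfaSketch {
    private WildcardNfaSketch() {}

    /** Returns true if text matches pattern ('*' = any run of chars, '?' = exactly one char). */
    static boolean matches(String pattern, String text) {
        int n = pattern.length();
        // State i means "the first i pattern characters are matched"; state n accepts.
        BitSet current = new BitSet(n + 1);
        current.set(0);
        closeOverStars(pattern, current);
        for (int k = 0; k < text.length(); k++) {
            char c = text.charAt(k);
            BitSet next = new BitSet(n + 1);
            for (int i = current.nextSetBit(0); i >= 0; i = current.nextSetBit(i + 1)) {
                if (i == n) {
                    continue; // the accepting state has no outgoing transitions
                }
                char p = pattern.charAt(i);
                if (p == '*') {
                    next.set(i); // '*' consumes c and stays in the same state
                } else if (p == '?' || p == c) {
                    next.set(i + 1); // literal or '?' consumes c and advances
                }
            }
            closeOverStars(pattern, next);
            if (next.isEmpty()) {
                return false; // no state is reachable, so no match is possible
            }
            current = next;
        }
        return current.get(n);
    }

    // '*' may match the empty string, so being in state i also puts us in state i + 1.
    private static void closeOverStars(String pattern, BitSet states) {
        for (int i = states.nextSetBit(0); i >= 0 && i < pattern.length(); i = states.nextSetBit(i + 1)) {
            if (pattern.charAt(i) == '*') {
                states.set(i + 1);
            }
        }
    }
}
```

For example, WildcardNfaSketch.matches("lu*ne", "lucene") returns true and WildcardNfaSketch.matches("lu*x", "lucene") returns false.  Each character of the text is examined once per active state, which hints at why an automaton can be attractive for wildcard and regular expression matching.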

These are just some highlights from Lucene-land.  More Lucene goodness coming soon.  Please use comments to tell us whether you find posts like this one useful or whether you’d prefer a different angle, format, or something else.

6 Responses to Lucene Digest, January 2010

  1. Lukas Vlcek says:

    Otis, this is definitely useful!

  2. Ron says:

    Great post and informative too.

