Solr Digest, February 2010

This second installment of Solr Digest (see Solr January Digest) covers 8 topics, some of which are quite new and some of which have a very long history (and a still uncertain future).

So, here we go:

  1. solr.ISOLatin1AccentFilterFactory is a commonly used filter that replaces accented characters in the ISO Latin 1 charset with their unaccented versions (for instance, ‘à’ is replaced with ‘a’). However, the underlying Lucene filter, ISOLatin1AccentFilter, is deprecated in favor of ASCIIFoldingFilter as of Lucene 2.9 (BTW, the Solr 1.4 release uses Lucene 2.9.1, while trunk, the future Solr 1.5, uses Lucene 2.9.2) and has been deleted from Lucene 3.0. Of course, Solr already has a filter factory for the replacement, solr.ASCIIFoldingFilterFactory, so it would probably be wise to start using it in your Solr schemata if you are still using the old ISOLatin1AccentFilter. Functionality-wise, there are no differences between these two filters, except that ASCIIFoldingFilter covers a superset of ISO Latin 1, meaning it converts everything ISOLatin1AccentFilter converted and more.
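To make the folding behavior concrete, here is a small standalone sketch in JavaScript (meant for Node.js or a modern browser, not for Solr itself). It is only an illustration of the idea: it uses Unicode normalization rather than the mapping tables inside the Lucene filters, so it will not match ASCIIFoldingFilter character for character, and the function name foldAccents is made up for this example.

```javascript
// Illustration only: this is NOT the mapping used by Lucene's filters.
// It decomposes each character (Unicode NFD) and then drops the combining
// marks, which is enough to show the 'à' -> 'a' behavior described above.
function foldAccents(text) {
  return text
    .normalize("NFD")                  // 'à' becomes 'a' followed by U+0300
    .replace(/[\u0300-\u036f]/g, "");  // strip combining diacritical marks
}

console.log(foldAccents("crème brûlée"));
```

Characters that do not decompose into a base letter plus a mark are left untouched by this sketch, which is one reason a dedicated filter with an explicit mapping is the better tool inside Solr.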

  2. DataImportHandler became multithreaded – after accumulating a variety of features, DataImportHandler got a performance boost. Your multicore servers will be happy to try it :). A patch was created and committed to trunk as part of JIRA issue SOLR-1352, so you can expect this functionality in the Solr 1.5 release, or you can already try it with one of the Solr 1.5 nightly builds.

  3. Script-based UpdateRequestProcessorFactory – one very interesting feature still in development (JIRA issue SOLR-1725) adds support for a script-based UpdateRequestProcessorFactory. It will depend on Java 6 script engine support (so Java 5 based Solr installations will not benefit here, although an upgrade to Java 6 is definitely recommended) and will be very easy to use. The scripts will have to be placed under the SOLR_HOME/conf directory and their names will be defined in solrconfig.xml, like this:


<updateRequestProcessorChain name="script">
  <processor>
    <str name="scripts">updateProcessor.js</str>
    <lst name="params">
      <bool name="boolValue">true</bool>
      <int name="intValue">3</int>
    </lst>
  </processor>
</updateRequestProcessorChain>

Implementations would also be simple. Here is an example of updateProcessor.js (copied from the patch that brings this functionality):


function processAdd(cmd) {
  functionMessages.add("processAdd1");
}

function processDelete(cmd) {
  functionMessages.add("processDelete1");
}

function processMergeIndexes(cmd) {
  functionMessages.add("processMergeIndexes1");
}

function processCommit(cmd) {
  functionMessages.add("processCommit1");
}

function processRollback(cmd) {
  functionMessages.add("processRollback1");
}

function finish() {
  functionMessages.add("finish1");
}
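The functions above only record that they were called. As a rough idea of what a more useful hook might look like, here is a minimal sketch of a processAdd that tidies up a field before the document is indexed. It rests on assumptions that are not confirmed by the patch: that the script engine hands Solr's AddUpdateCommand to the script as cmd, and that its solrDoc field (a SolrInputDocument) is reachable from the script. The "title" field and the FakeDoc stand-in are hypothetical and exist only so the sketch can be run outside Solr.

```javascript
// Sketch of a processAdd hook that trims whitespace from a field.
// Assumptions (not taken from the SOLR-1725 patch): `cmd` is Solr's
// AddUpdateCommand and `cmd.solrDoc` is the SolrInputDocument being added.
// The "title" field name is hypothetical.
function processAdd(cmd) {
  var doc = cmd.solrDoc;
  var title = doc.getFieldValue("title");   // null when the field is absent
  if (title !== null && title !== undefined) {
    // Regex trim, so the script does not rely on String.prototype.trim.
    doc.setField("title", String(title).replace(/^\s+|\s+$/g, ""));
  }
}

// Hypothetical stand-in for SolrInputDocument, used only to run the sketch.
function FakeDoc(fields) {
  this.fields = fields;
}
FakeDoc.prototype.getFieldValue = function (name) {
  return this.fields.hasOwnProperty(name) ? this.fields[name] : null;
};
FakeDoc.prototype.setField = function (name, value) {
  this.fields[name] = value;
};

var cmd = { solrDoc: new FakeDoc({ id: "1", title: "  Solr Digest  " }) };
processAdd(cmd);
console.log(cmd.solrDoc.getFieldValue("title"));
```

Inside Solr, only the processAdd function would live in the script file; the stand-in object and the call at the bottom are there purely for trying the logic from the command line.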
  4. Similar to the SolrJ API for communicating with Solr, there are numerous Solr clients for other languages, especially the dynamic scripting languages. As with all scripting languages, one of the main advantages over using pure Java is simplicity and development speed. You just write a few lines of code and immediately run the script, with no need for compiling. At Sematext we find them especially handy when making changes to Solr installations, for quickly testing whether Solr behaves as we expect. One excellent solution for all Ruby lovers is RSolr. Coincidentally, RSolr will be covered in the upcoming Solr in Action.

  5. Field Collapsing – this is a very frequently needed feature, but one without a satisfactory solution in Solr. It has a long history: everything started with issue SOLR-236, back when Solr was at version 1.3. The work was never committed to svn, so you basically had to pick one of the many patches available in JIRA and apply it to your distribution. Since Solr was constantly developing, patches would become obsolete pretty quickly, so new versions had to be created. Even when you found the correct patch for your Solr version, you would get occasional errors, so this surely wasn’t good enough for enterprise customers.

Recently, renewed effort has gone into this issue and there are plans for the feature to finally be included in Solr 1.5. However, the current implementation still isn’t good enough: some users have reported OutOfMemory errors, so it seems we’ll have to wait some more to get an enterprise-quality “field collapsing” solution in Solr.

In light of the problems with the SOLR-236 solution, a new JIRA issue, SOLR-1773, was created. The goal of this issue is to provide a “lightweight” implementation of the feature. There is already a patch containing this implementation, along with some measurements that show the approach has potential, but it still isn’t ready for serious deployments. The same approach is also implemented in SOLR-1682.

As you can see, work to provide field collapsing is underway, but we’re still some time away from committed code.
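For readers new to the term, field collapsing means showing only the best-ranked document for each distinct value of a chosen field, with the remaining matches folded behind it. The sketch below illustrates that idea in plain JavaScript; it is not code from any of the JIRA patches, and the function name, the sample documents, and the "site" field are all hypothetical.

```javascript
// Conceptual sketch of field collapsing, not code from the JIRA patches.
// Input documents are assumed to be sorted by relevance already. For each
// distinct value of `field`, only the first document is kept and the later
// ones are merely counted.
function collapseByField(docs, field) {
  var seen = {};       // field value -> entry already emitted for it
  var collapsed = [];
  for (var i = 0; i < docs.length; i++) {
    var key = String(docs[i][field]);
    if (seen.hasOwnProperty(key)) {
      seen[key].collapsedCount++;   // hidden behind the group's top document
    } else {
      seen[key] = { doc: docs[i], collapsedCount: 0 };
      collapsed.push(seen[key]);
    }
  }
  return collapsed;
}

// Hypothetical search results, already sorted by score.
var results = [
  { id: "a1", site: "example.com", score: 9.1 },
  { id: "a2", site: "example.com", score: 8.7 },
  { id: "b1", site: "example.org", score: 8.2 },
  { id: "a3", site: "example.com", score: 7.9 }
];
var out = collapseByField(results, "site");
console.log(out);
```

Doing this over a handful of in-memory objects is trivial; the hard part, and the source of the memory problems mentioned above, is doing it efficiently across a large index.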

  6. SystemStatsRequestHandler – designed to provide the statistics from stats.jsp to clients that access Solr through APIs like SolrJ or RSolr, it is being developed as JIRA issue SOLR-1750. It is slated for inclusion in Solr 1.5, but for now it is available as a Java class attached to the issue. It might get another name before it is committed to svn.

  7. While Lucene just saw its 2.9.2 and 3.0.1 versions released, Solr trunk already has the latest Lucene 2.9.*, as described in this thread.

  8. We’ve saved the best for last. If you could have one feature in Solr… Check out this informative thread to see what people want from Solr that Solr doesn’t already have. What do you want from Solr? Post your Solr desires in the comments.
