Solr Performance Monitoring with SPM

Originally delivered as a Lightning Talk at Lucene Eurocon 2011 in Barcelona, this quick presentation shows how to use Sematext’s SPM service, currently free to use for an unlimited time, to monitor Solr, the OS, the JVM, and more.

We built SPM because we wanted a good, easy-to-use tool to help us with Solr performance tuning during engagements with our numerous Solr customers. We hope you find our Scalable Performance Monitoring service useful! Please let us know if you have any feedback, from SPM functionality and usability to its speed. Enjoy!

Extending Hadoop Metrics

Here at Sematext we really like performance metrics and we like HBase.  We like them so much we’ve created a service for HBase Performance Monitoring (and for Solr, too).  In the process we’ve done some experiments with Hadoop and HBase around performance monitoring and are sharing our experience and some relevant code in this post.

The Hadoop metrics framework is simple to extend and customise. For example, you can very easily write a custom MetricsContext which sends metrics to your own storage solution.

All you need to do is extend the AbstractMetricsContext class and implement

protected void emitRecord(String context, String record, OutputRecord outputrecord)
  throws IOException;

To demonstrate, I wrote HBaseMetricsContext which stores Hadoop metrics in HBase. Since HBase itself uses the Hadoop metrics framework, you can use it to store its own metrics inside itself. Useful? Maybe. This is just an example after all.

If you’d like to try it out, get the source from GitHub. Then build the project using:

mvn package

Put the resulting Jar file in the HBase lib directory.

You will need to create a table with the relevant column families. We assume the column families are a composite of:

columnFamily = contextName + "." + recordName

In the HBase shell create your table:

create 'metrics', 'hbase.master', 'hbase.regionserver'

Edit your hadoop-metrics.properties file to include:

hbase.class=com.sematext.hadoop.metrics.HBaseMetricsContext
hbase.tableName=metrics
hbase.period=10

Restart HBase and it will start inserting records into the metrics table every 10 seconds.

The row key of each record is made up of the timestamp and the tags (for disambiguation) like so:

rowKey = bytes(maxlong - timestamp) + bytes(tagName) + bytes(tagValue) + …

Subtracting the timestamp from maxlong ensures the scans get the most recent record first.
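The effect of the reversed timestamp can be checked with a small, self-contained sketch. It uses only the JDK rather than HBase's own utilities, and the class and method names (MetricsRowKey, build, compare) are hypothetical, chosen for illustration; this is not the code used in HBaseMetricsContext. It assumes tags are encoded as UTF-8 and relies on HBase ordering row keys by unsigned byte-wise comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only illustrates the key layout described above.
final class MetricsRowKey {

    // Builds bytes(maxlong - timestamp) + bytes(tagName) + bytes(tagValue) + ...
    static byte[] build(long timestampMillis, Map<String, String> tags) {
        int size = Long.BYTES;
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            size += tag.getKey().getBytes(StandardCharsets.UTF_8).length;
            size += tag.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer key = ByteBuffer.allocate(size);
        // ByteBuffer writes longs big-endian, so byte order matches numeric
        // order for the non-negative values produced here.
        key.putLong(Long.MAX_VALUE - timestampMillis);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            key.put(tag.getKey().getBytes(StandardCharsets.UTF_8));
            key.put(tag.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return key.array();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("hostName", "rs1.example.org");
        byte[] older = build(1_000L, tags);
        byte[] newer = build(2_000L, tags);
        // A negative result means the newer record sorts before the older one.
        System.out.println("newer sorts first: " + (compare(newer, older) < 0));
    }
}
```

Because the newer timestamp yields the smaller reversed value, its key compares lower, which is why a scan starting at the beginning of the table returns the latest record first.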

Each tag and metric is stored in its own column. This gives us a table that looks something like this:

         hbase.master                          hbase.regionserver
         cluster_requests  hostName            hostName         flushQueueSize  regions
rowKey2                                        rs1.example.org  0               1
rowKey1  101               master.example.org

For clarity, timestamps are not shown in the table above; each cell is timestamped, and all cells for a record have the same timestamp.

Training: Solr Performance Tuning and Monitoring

Quick announcement!

In addition to presenting at Open Source Search Conference in June, we’ll also be doing a super-cheap half-day training on Solr Performance Tuning & Monitoring.  You can sign up here.

In this tutorial you will learn how to squeeze the most performance out of your Solr cluster. We’ll cover performance at both indexing and query time; handling large volumes of data, high query rates, and the combination of the two; and the index sharding architectures available for improving search performance, including in multi-data center setups. We’ll cover an array of best practices, tips and tricks we regularly use in our engagements with clients, from various configuration settings to querying efficiently, all of which one should employ to get the most out of Solr. You will also learn how to monitor your Solr cluster’s performance with command-line tools and a visual monitoring solution specifically designed for Solr performance monitoring.

Prerequisites:

Basic knowledge of Solr, its configuration and setup.

Details:

  • Cost: $100
  • When: June 14, 2011, 9:00 a.m.-1:00 p.m.
  • Bonus: Lunch will be provided.
  • Register here

If you are interested in Solr Performance Monitoring, please read about Sematext Scalable Performance Monitoring service.

Solr Performance Monitoring Announcement

Update: Sematext now offers SPM – Scalable Performance Monitoring for Solr (as well as for HBase, OS, JVM, etc.). See our Solr Performance Monitoring with SPM blog post.

We’re happy to announce the partnership between Sematext and New Relic. Over the last few months we have been using New Relic’s RPM Gold Plan to monitor the performance of our own Solr-based services for searching the Lucene and Hadoop ecosystems: http://search-lucene.com/ and http://search-hadoop.com/. We found it valuable for understanding (and fixing!) Solr performance bottlenecks and are going to be offering this service to our new and old customers.

We’ll also set up our Lucene/Solr tech support subscribers with New Relic’s RPM. This will give our tech support team more visibility into customers’ past Solr performance, letting them spot any suspicious trends or errors and quickly understand overall performance trends over time, and thus resolve our customers’ issues much more quickly.

You can get more details about the service, and you can get a free, no-strings-attached 30-day Gold plan trial here.

Note: you will NOT be asked for any credit card information and will not have to pay a thing, nor will you have to cancel anything in order to avoid having to pay for the service. Signing up via the above link gives you a free 30-day Gold plan trial. After 30 days the plan just goes back to the free Lite plan. If you need more than 30 days to evaluate the service, please let us know. The sign-up for the free, no-strings-attached 30-day Gold plan trial is here. Enjoy!
