Introducing Top Database Operations

If you run Elasticsearch, Solr, or any datastore you connect to via JDBC, you’ll like what we’ve just added to SPM.  We call it Database Operations and in SPM you can find it in the new Database report:

If you didn’t watch the video, here’s what Database Operations gives you:

  • Top 5 operation types across all your data stores or filtered to a specific data store type
  • Top 5 operation types by speed, throughput, or simply their volume
  • Time-series reports for volume, throughput, and latency broken down by operation type
  • Ability to view all collected operations, not just the slowest ones, filter by database type or by operation type, sorted by average or total duration, or throughput
  • Sparklines that show last 5 minute values and trends
  • Top 10 slowest individual operations and drill-in details

Integration with Transaction Tracing, so you can correlate slow data store operations with the actual transaction/request that triggered slow operations


  • To get this information add SPM agent to the application that is talking to a data store (e.g. Solr or Elasticsearch). This is because the SPM agent captures operations at that client layer, not in the server itself.
  • To start capturing this information enable Transaction Tracing in your SPM agents

This, including Distributed Transaction Tracing, works for all Java applications




Don’t forget – when you enable Database Operations you will also automatically get Transaction Tracing, as well as the cool AppMaps – enjoy! :)

Got ideas how we could make Database Operations better and more useful to you?  Let us know via comments, email or @sematext.

Grab a free 30-day SPM trial by registering here (ping us if you’re a startup, a non-profit, or educational institution – we’ve got special pricing for you!).  There’s no commitment and no credit card required.

SolrCloud: Dealing with Large Tenants and Routing

Many Solr users need to handle multi-tenant data. There are different techniques that deal with this situation: some good, some not-so-good. Using routing to handle such data is one of the solutions, and it allows one to efficiently divide the clients and put them into dedicated shards while still using all the goodness of SolrCloud. In this blog post I will show you how to deal with some of the problems that come up with this solution: the different number of documents in shards and the uneven load.

Imagine that your Solr instance indexes your clients’ data. It’s a good bet that not every client has the same amount of data, as there are smaller and larger clients. Because of this it is not easy to find the perfect solution that will work for everyone. However, I can tell one you thing: it is usually best to avoid per/tenant collection creation. Having hundreds or thousands of collections inside a single SolrCloud cluster will most likely cause maintenance headaches and can stress the SolrCloud and ZooKeeper nodes that work together. Let’s assume that we would rather go to the other side of the fence and use a single large collection with many shards for all the data we have.

No Routing At All

The simplest solution that we can go for is no routing at all. In such cases the data that we index will likely end up in all the shards, so the indexing load will be evenly spread across the cluster:

Multitenancy - no routing (index)

However, when having a large number of shards the queries end up hitting all the shards:

Multitenancy - no routing

This may be problematic, especially when dealing with a large number of queries and a large number of shards together. In such cases Solr will have to aggregate results from the large number of shards, which can take time and be performance expensive. In these situations routing may be the best solution, so let’s see what that brings us. Read more of this post

SolrCloud Rebalance API

This is a post of the work done at BloomReach on smarter index & data management in SolrCloud.  

Authors: Nitin Sharma – Search Platform Engineer & Suruchi Shah –  Engineering Intern




In a multi-tenant search architecture, as the size of data grows, the manual management of collections, ranking/search configurations becomes non-trivial and cumbersome. This blog describes an innovative approach we implemented at BloomReach that helps with an effective index and a dynamic config management system for massive multi-tenant search infrastructure in SolrCloud.


The inability to have granular control over index and config management for Solr collections introduces complexities in geographically spanned, massive multi-tenant architectures. Some common scenarios, involving adding and removing nodes, growing collections and their configs, make cluster management a significant challenge. Currently, Solr doesn’t offer a scaling framework to enable any of these operations. Although there are some basic Solr APIs to do trivial core manipulation, they don’t satisfy the scaling requirements at BloomReach.

Innovative Data Management in SolrCloud

To address the scaling and index management issues, we have designed and implemented the Rebalance API, as shown in Figure 1. This API allows robust index and config manipulation in SolrCloud, while guaranteeing zero downtime using various scaling and allocation strategies. It has  two dimensions:


The seven scaling strategies are as follows:

  1. Auto Shard allows re-sharding an entire collection to any number of destination shards. The process includes re-distributing the index and configs consistently across the new shards, while avoiding any heavy re-indexing processes.  It also offers the following flavors:
    • Flip Alias Flag controls whether or not the alias name of a collection (if it already had an alias) should automatically switch to the new collection.
    • Size-based sharding allows the user to specify the desired size of the destination shards for the collection. As a result, the system defines the final number of shards depending on the total index size.
  2. Redistribute enables distribution of cores/replicas across unused nodes. Oftentimes, the cores are concentrated within a few nodes. Redistribute allows load sharing by balancing the replicas across all nodes.
  3. Replace allows migrating all the cores from a source node to a destination node. It is useful in cases requiring replacement of an entire node.
  4. Scale Up adds new replicas for a shard. The default allocation strategy for scaling up is unused nodes. Scale up also has the ability to replicate additional custom per-merchant configs in addition to the index replication (as an extension to the existing replication handler, which only syncs the index files)
  5. Scale Down removes the given number of replicas from a shard.
  6. Remove Dead Nodes is an extension of Scale Down, which allows removal of the replicas/shards from dead nodes for a given collection. In the process, the logic unregisters the replicas from Zookeeper. This in-turn saves a lot of back-and-forth communication between Solr and Zookeeper in their constant attempt to find the replicas on dead nodes.
  7. Discovery-based Redistribution allows distribution of all collections as new nodes are introduced into a cluster. Currently, when a node is added to a cluster, no operations take place by default. With redistribution, we introduce the ability to rearrange the existing collections across all the nodes evenly.

Read more of this post

Solr Presentations from Lucene/Solr Revolution 2014

Thanks to everyone who stopped by the Sematext booth at last week’s Lucene/Solr Revolution event in Washington, DC and attended our two talks:

The attendance, questions and interest are very much appreciated.  As a company that prides itself on its Solr expertise (and Elasticsearch expertise too, for that matter), it was nice to spend a couple days talking about search and Big Data challenges, performance monitoring and logging with fellow experts from around the world. Here are the slides for the two talks we gave (summaries of the talks can be found here):


  Videos of the talks will be posted here soon.  Hope to see everyone again next year!

Podcast: Tools to Monitor Solr, Manage Logs & Analyze Search Trends

Sematext Founder & President Otis Gospodnetic recently spoke with LucidWorks Chief of Product, Will Hayes as part of their SolrCluster podcast series.  Otis and Will discussed tools that Sematext has built to help monitor Solr and other stacks, manage and analyze logs, and analyze search trends.  They also discuss Solr/SolrCloud and Elasticsearch, their APIs, developer friendliness, as well as the general direction that search and big data industry leaders are moving toward around data acquisition and discovery as data increasingly grows.

Go here to listen to the podcast.  It runs about 36 minutes.  Enjoy!

Video: Scaling Solr with SolrCloud

During last  year’s Lucene Revolution conference in Dublin we had the opportunity to give four talks, one of which was Scaling Solr with SolrCloud. Through it we wanted to share our experiences around scaling Solr, especially as we have experience in running Solr internally and as a team of search consultants.  Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!

Video: Administering and Monitoring SolrCloud Clusters

As you know, at Sematext, we are not only about consulting services, but also about administration, monitoring, and data analysis. Because of that, during last year’s Lucene Revolution conference in Dublin we gave a talk about administration and monitoring of SolrCloud clusters. During the talk, Rafał Kuć discusses some administration procedures for SolrCloud like collection management and schema modifications with the schema API. In addition, he also talks about why monitoring is important and what to pay attention to. Finally, he shows three real life examples of monitoring usefulnesses.  Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!





Get every new post delivered to your Inbox.

Join 181 other followers