SolrCloud Rebalance API

This is a guest post about the work done at BloomReach on smarter index and data management in SolrCloud.

Authors: Nitin Sharma, Search Platform Engineer, and Suruchi Shah, Engineering Intern

 


Introduction

In a multi-tenant search architecture, as the size of data grows, manual management of collections and ranking/search configurations becomes non-trivial and cumbersome. This post describes an innovative approach we implemented at BloomReach that provides effective index management and a dynamic config management system for a massive multi-tenant search infrastructure in SolrCloud.

Problem

The inability to have granular control over index and config management for Solr collections introduces complexities in geographically spanned, massive multi-tenant architectures. Common scenarios, such as adding and removing nodes or growing collections and their configs, make cluster management a significant challenge. Currently, Solr doesn’t offer a scaling framework to enable any of these operations. Although there are some basic Solr APIs for trivial core manipulation, they don’t satisfy the scaling requirements at BloomReach.

Innovative Data Management in SolrCloud

To address the scaling and index management issues, we have designed and implemented the Rebalance API, as shown in Figure 1. This API allows robust index and config manipulation in SolrCloud, while guaranteeing zero downtime using various scaling and allocation strategies. It has two dimensions:

[Figure 1: Rebalance API scaling and allocation strategies]

The seven scaling strategies are as follows:

  1. Auto Shard allows re-sharding an entire collection to any number of destination shards. The process includes re-distributing the index and configs consistently across the new shards, while avoiding any heavy re-indexing processes.  It also offers the following flavors:
    • Flip Alias Flag controls whether or not the alias name of a collection (if it already had an alias) should automatically switch to the new collection.
    • Size-based sharding allows the user to specify the desired size of the destination shards for the collection. As a result, the system defines the final number of shards depending on the total index size.
  2. Redistribute enables distribution of cores/replicas across unused nodes. Oftentimes, the cores are concentrated within a few nodes. Redistribute allows load sharing by balancing the replicas across all nodes.
  3. Replace allows migrating all the cores from a source node to a destination node. It is useful in cases requiring replacement of an entire node.
  4. Scale Up adds new replicas for a shard. The default allocation strategy for scaling up is unused nodes. Scale Up can also replicate additional custom per-merchant configs along with the index (as an extension to the existing replication handler, which only syncs the index files).
  5. Scale Down removes the given number of replicas from a shard.
  6. Remove Dead Nodes is an extension of Scale Down, which allows removal of the replicas/shards on dead nodes for a given collection. In the process, the logic unregisters the replicas from Zookeeper. This, in turn, saves a lot of back-and-forth communication between Solr and Zookeeper in their constant attempt to find the replicas on dead nodes.
  7. Discovery-based Redistribution allows distribution of all collections as new nodes are introduced into a cluster. Currently, when a node is added to a cluster, no operations take place by default. With redistribution, we introduce the ability to rearrange the existing collections across all the nodes evenly.
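The post doesn’t document the HTTP interface of the Rebalance API, but a call along the lines of Solr’s own Collections API is easy to picture. The sketch below builds such a request URL; the endpoint path and every parameter name (`scaling_type`, `dest_shards`, `flip_alias`) are hypothetical illustrations, not BloomReach’s actual API.

```python
from urllib.parse import urlencode

def build_rebalance_request(base_url, scaling_type, collection, **params):
    """Build a URL for a hypothetical Rebalance API call.

    The endpoint name and parameter names are illustrative only --
    the post does not document the actual HTTP interface.
    """
    query = {"scaling_type": scaling_type, "collection": collection}
    query.update(params)
    # sort for a deterministic query string
    return "%s/admin/rebalance?%s" % (base_url, urlencode(sorted(query.items())))

# e.g. re-shard a collection to 8 destination shards and flip its alias
url = build_rebalance_request(
    "http://localhost:8983/solr",
    "AUTO_SHARD",
    "products",
    dest_shards=8,
    flip_alias="true",
)
```

A size-based sharding call would swap `dest_shards` for a desired per-shard size, letting the system derive the final shard count from the total index size.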


Introducing Akka Monitoring

Akka is a toolkit and runtime for building highly concurrent, distributed and resilient message-driven applications on the JVM. It is part of the Scala standard distribution, providing the implementation of the “actor model”.

How Akka Works

Messages between Actors are exchanged via Mailbox queues, Dispatchers provide various concurrency models, and Routers manage the message flow between Actors. That’s quite a lot Akka is doing for developers!
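To make the mailbox-plus-dispatcher idea concrete, here is a minimal actor-model sketch in plain Python — an analogy for illustration, not Akka itself. Each actor owns a mailbox queue, and a dedicated thread plays the dispatcher, processing one message at a time:

```python
import queue
import threading

class Actor:
    """Minimal actor sketch: a mailbox queue plus a single worker thread
    that processes messages one at a time (an analogy for Akka's model)."""

    def __init__(self, handler):
        self.mailbox = queue.Queue()   # messages wait here ("Time in Mailbox")
        self.handler = handler
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send, like Akka's tell (`!`)."""
        self.mailbox.put(message)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:            # poison pill: stop the actor
                break
            self.results.append(self.handler(msg))

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

doubler = Actor(lambda n: n * 2)
for n in (1, 2, 3):
    doubler.tell(n)
doubler.stop()
# doubler.results == [2, 4, 6]
```

Because each actor processes its mailbox sequentially, no locks are needed around the actor’s own state — which is exactly the property that makes the model attractive for concurrent applications.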

But how does one find bottlenecks in distributed Akka applications? Well, many Akka users already use the great Kamon Open-Source Monitoring Tool, which makes it easy to collect Akka metrics.  However — and this is important! — predefined visualizations, dashboards, anomaly detection, alerts and role-based access controls for the DevOps team are out of scope for Kamon, which is focused on metrics collection only.  To overcome this challenge, Kamon’s design makes it possible to integrate Kamon with other monitoring tools.

Needless to say, Sematext has embraced this philosophy and contributed the Kamon backend to SPM.  This gives Akka users the option to use detailed Metrics from Kamon along with the visualization, alerting, anomaly detection, and team collaboration functionalities offered by SPM.

The latest release of Kamon 0.5.x includes the kamon-spm module and was announced on August 17th, 2015 on the Kamon blog.  Here’s an excerpt:

Pavel Zalunin from Sematext contributed the new kamon-spm module, which as you might guess allows you to push metrics data to the Sematext Performance Monitor platform. This contribution is particularly special to us, given the fact that this is the first time that a commercial entity in the performance monitoring sector takes the first step to integrate with Kamon, and they did it so cleanly that we didn’t even have to ask any changes to the PR, it was just perfect. We sincerely hope that more companies follow the steps of Sematext in this matter.

Now let’s take a look at the result of this integration work:

  • Metrics pushed to SPM are displayed in predefined reports, including:
    • An overview of all key Akka metrics
    • Metrics for Actors, Dispatchers and Routers
    • Common metrics for CPU, Memory, Network, I/O,  JVM and Garbage Collection
  • Each chart has the “Action” menu to:
    • Define criteria for anomaly detection and alerts
    • Create scheduled email reports
    • Securely share charts with read-only links
    • Embed charts into custom dashboards
  • A single SPM App can take metrics from multiple hosts to monitor a whole cluster; filters by Host, Actor, Dispatcher, and Router make it easy to drill down to the relevant piece of information.
  • All other SPM features are available for Akka users, too.  For example:

Akka_overview

Akka Metrics Overview

Actor Metrics

Actors send and receive messages, therefore the key metrics for Actors are:

  • Time in Mailbox
    Messages are waiting to be processed in the Mailbox – high Time in Mailbox values indicate potential delays in processing.
  • Processing Time
    This is the time Actors need to process the received messages – use this to discover slow Actors.
  • Mailbox Size
    A large or constantly growing Mailbox Size could indicate a backlog of pending operations.

Each of the above metrics is presented in aggregate for all Actors, but one can also use SPM’s filtering feature to view all Actors’ metrics separately, or select one or more specific Actors and visualize only their metrics.  Filtering by Host is also possible, as shown below.
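The relationship between the three metrics is simple arithmetic over per-message timestamps. The sketch below shows how Time in Mailbox and Processing Time fall out of enqueue/dequeue/done times; the event format is made up for illustration (Kamon records these metrics internally):

```python
def mailbox_and_processing_times(events):
    """Compute average Time in Mailbox and Processing Time from
    (enqueued, dequeued, done) timestamps per message.
    Illustrative only; Kamon tracks these metrics internally."""
    in_mailbox = [dequeued - enqueued for enqueued, dequeued, _ in events]
    processing = [done - dequeued for _, dequeued, done in events]
    avg = lambda xs: sum(xs) / len(xs)
    return avg(in_mailbox), avg(processing)

# two messages, timestamps in seconds
events = [(0.0, 0.5, 0.7), (1.0, 1.1, 1.6)]
mailbox_avg, processing_avg = mailbox_and_processing_times(events)
# mailbox_avg ~= 0.3 s, processing_avg ~= 0.35 s
```

A high `mailbox_avg` with a low `processing_avg` points at an overloaded actor (or too few instances), while a high `processing_avg` points at slow message-handling logic.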

Akka_actors

Akka Actors

Dispatcher Metrics

In Akka, a Dispatcher is what makes Actors ‘tick’. Each Actor is associated with a particular Dispatcher (the default one is used if no explicit Dispatcher is set). Each Dispatcher is associated with a particular Executor – Thread Pool or Fork Join Pool. The SPM Dispatcher report shows information about Executors:

  • Fork Join Pool
  • Thread Pool Executor

All metrics can be filtered by Host and Dispatcher.

Akka_dispatchers

Akka Dispatchers

Router Metrics

Routers can be used to efficiently route messages to destination Actors, called Routees.

  • Routing Time – Time to route a message to the selected destination
  • Time in Mailbox – Time spent in the routee’s mailbox
  • Processing Time – Time the routee Actor spends processing routed messages
  • Errors Count – Number of errors while routees process messages

For all these metrics, lower values are better, of course.
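For a sense of what a router actually does before those metrics are measured, here is a plain-Python sketch of round-robin routing — the idea behind Akka’s RoundRobinPool, with lists standing in for routee mailboxes (illustration only, not Akka code):

```python
import itertools

class RoundRobinRouter:
    """Sketch of round-robin routing: each message goes to the next
    routee in turn. Lists stand in for routee mailboxes."""

    def __init__(self, routees):
        self._cycle = itertools.cycle(routees)

    def route(self, message):
        routee = next(self._cycle)
        routee.append(message)   # "deliver" to the routee's mailbox
        return routee

a, b = [], []
router = RoundRobinRouter([a, b])
for msg in range(4):
    router.route(msg)
# a == [0, 2] and b == [1, 3]
```

Routing Time in SPM corresponds to the cost of the `route` step; Time in Mailbox and Processing Time are then measured per routee, exactly as for ordinary Actors.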

Akka_routers

Akka Routers

You can set Alerts and enable Anomaly Detection for any Akka or OS metrics you see in SPM and you can create custom Dashboards with any combination of charts, whether from your Akka apps or other apps monitored by SPM.

We hope you like this new addition to SPM.  Got ideas how we could make it more useful for you?  Let us know via comments, email, or @sematext.

Not using SPM yet? Check out the free 30-day SPM trial by registering here (ping us if you’re a startup, a non-profit, or education institution – we’ve got special pricing for you!).  There’s no commitment and no credit card required.  SPM monitors a ton of applications, like Elasticsearch, Solr, Cassandra, Hadoop, Spark, Node.js (open-source), Docker (get open-source Docker image), CoreOS, RancherOS and more.

Top 10 Mistakes Made While Learning Solr


  1. Upgrading to the new major version right after its release without waiting for the inevitable .1 release
  2. Explaining your “I don’t need backups, I can always reindex” statement to your manager during an 8-hour reindexing session
  3. Taking down the whole Data Center with a single rows=1000000000000000 request while singing, “I want it all / I want it now”
  4. In a room full of Solr users wondering out loud why you’re not using Elasticsearch instead
  5. Splitting shards like it’s 1999
  6. Giving Solr’s JVM all the memory you’ve got and getting paged in the middle of the night
  7. Running hundreds of queries with facet.mincount=0 and facet.limit=-1 and wondering why the YouTube videos you’re trying to watch are being buffered
  8. Using shards=1 and replicationFactor=1 and wondering why only a single node in your hundred-node cluster is being used
  9. Optimizing after commits, hard committing every 5 seconds, using openSearcher=true and still wondering why your terminal is all slow
  10. …and last but not least: not taking Sematext Solr guru @kucrafal’s upcoming Solr Training course in October in NYC!


Innovative Docker Log Management

[Note: We’re holding webinars on Docker Monitoring on September 14 and 15 and Docker Logging webinars on September 29 and 30.]

In the dynamic world of “Microservices” the traditional method of static setups for log routing and parsing doesn’t work very well; in fact, it creates additional complexity and resource usage.  This, in turn, reduces the number of microservices that could run on a single server.  Sematext has come up with a better method.

The integrated log management functions in SPM for Docker support the microservice approach by reducing setup complexity and startup time and by minimizing the resources used. SPM Agent for Docker collects metrics, events and logs, and runs in a container on every Docker Host. In addition to standard log collection functionality, we recently integrated automatic log format detection and field extraction for Container Log Messages. The processing is hooked into the stream from the Docker API, where logs are collected, and fed to the log indexing service of our centralized logging tool, Logsene.  This means that — and if you’ve dealt with logs before you’ll know this is huge — there’s no setup of syslog with Docker log drivers, no routing to a heavy Logstash process for parsing, no maintenance of Elasticsearch to keep the logs, and not even a need to run your own Kibana! SPM for Docker and Logsene provide everything out of the box!
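At its core, automatic field extraction means matching each raw log line against known format patterns and pulling out structured fields. The sketch below shows the idea for one common format (an nginx-style access line); the pattern is purely illustrative — SPM’s detection supports many formats and is not documented line-by-line in this post:

```python
import re

# Illustrative pattern for an nginx-style access log line.
NGINX_ACCESS = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def extract_fields(line):
    """Return a dict of structured fields, or None if the format is unknown."""
    match = NGINX_ACCESS.match(line)
    return match.groupdict() if match else None

line = '172.17.0.1 - - [29/Sep/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 612'
fields = extract_fields(line)
# fields["status"] == "200", fields["path"] == "/index.html"
```

With fields like `status` and `client_ip` extracted at collection time, the indexed logs become filterable and aggregatable instead of being opaque strings.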

There are many ways to collect logs from Docker (you can learn about that in our Docker Logging Webinar); so what is the advantage of using Logsene for Docker Logs? Let me show you…


Webinar: Docker Logging

If you use Docker you know that these deployments can be very dynamic, not to mention all the ways there are to monitor Docker containers and their hosts, collect logs from them, etc. etc.  And if you didn’t know these things, well, you’ve come to the right place!

With this pair of identical webinars, we’re going to focus on Docker logging.  Specifically — the different log collection options Docker users have, the pros and cons of each, specific and existing Docker logging solutions, tooling, the role of syslog, log shipping to ELK Stack, and more.  Docker, and with it projects like CoreOS and RancherOS, are growing rapidly, and here at Sematext we’re at the front of the bandwagon when it comes to adding support for Docker monitoring and logging, along with Docker event collection, charting, and correlation.  The same goes for CoreOS monitoring and centralized CoreOS log management, too!

[Note: We’re also holding webinars on Docker Monitoring on September 14 and 15.]

The webinar will be presented by Stefan Thies, our DevOps Evangelist, deeply involved in Sematext’s activities around monitoring and logging in Docker and CoreOS.  A post-webinar Q&A will take place — in addition to the encouraged attendee interaction during the webinar.

Dates/Times

We’re holding two identical sessions to accommodate attendees on different continents.

Register Now: September 29 or September 30

 

“Show, Don’t Tell”

The infographic below will give you a good idea of what Stefan will be showing and discussing in the webinar.

Docker_logging_graphic

Got Questions, or Docker Logging topics you’d like Stefan to address?

Leave a comment, ping @sematext or send us an email — we’re all ears.

Whether you’re using Docker or not, we hope you join us on one of the webinars.  Docker is hot — let us help you take advantage of it!

Introducing AppMap

[Note: This post is part of a series on Transaction Tracing — links to the other posts are at the bottom of this post]

As mentioned in the Transaction Tracing for Performance Bottleneck Detection and Transaction Tracing Reports and Traces posts, when you enable Transaction Tracing in SPM you will also automatically get:

  • Request Throughput
  • Request Latency
  • Error & Exception Rates
  • AppMap

Today we’re happy to officially introduce AppMap. What’s AppMap? As you can see below, AppMap is a map-like visual representation of your complete application architecture. It shows which components are communicating with which components, at what throughput and latency, at what network speed, and whether there are any errors between them.  Connections to external services and databases are also captured and visualized.

As such, AppMaps help you:

  • Instantly see your whole architecture and its operational state and health
  • Bring new team members up to speed by showing them the current architecture instead of outdated architecture diagrams
  • Keep the whole team up to date about the latest architecture

AppMap1_annotated

Things to note:

  • Errors and exceptions are shown in red when they are detected
  • Components are color-coded:
    • Orange components represent external HTTP services
    • Green components are databases (e.g., SQL server has its own shade of green; other databases have their own shades)
    • Blue components are other SPM Apps (e.g., Elasticsearch has its own shade, etc., etc.)
  • Arrows between components have variable thickness – thicker arrows mean bigger throughput (rpm).
  • Greater opacity means smaller latency.
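The two visual encodings above — arrow thickness for throughput, opacity for latency — are simple monotonic mappings. The sketch below illustrates one way such mappings could look; the pixel sizes, caps and scaling factors are made-up illustration values, not SPM’s actual rendering logic:

```python
def arrow_thickness(rpm, min_px=1, max_px=8, max_rpm=10000):
    """Thicker arrows for higher throughput (requests per minute).
    Ranges and cap are illustrative, not SPM's real values."""
    scale = min(rpm, max_rpm) / max_rpm
    return min_px + scale * (max_px - min_px)

def arrow_opacity(latency_ms, max_latency_ms=1000):
    """Greater opacity for smaller latency, so fast links stand out.
    Slow links stay faintly visible rather than disappearing."""
    scale = min(latency_ms, max_latency_ms) / max_latency_ms
    return 1.0 - 0.8 * scale

# a busy, fast link draws as a thick, opaque arrow;
# an idle, slow link draws thin and faint
```

Capping both scales keeps a single pathological component (say, one link with enormous latency) from flattening the visual contrast across the rest of the map.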

Clicking on any of the components on the AppMap shows more details about that component, such as:

  • Overall Throughput, Latency, Error and Exception rates (also shown as sparklines)
  • Incoming and Outgoing connections and Throughput and Latency between them
  • List of Hosts/Nodes when an SPM App is selected with Throughput, Latency, Error and Exception rates for each of them

AppMap2_annotated

If you’d like to see AppMap for your applications, do the following:

That’s it!

Not using SPM yet, but would like to trace your apps? Easy: register here — there is no commitment and you can leave your credit card in your wallet.  You get 30 days Free for new SPM Apps so even if you don’t end up falling in love with SPM for monitoring, alerting and anomaly detection, or Logsene for your logs, you can use the Distributed Transaction Tracing to quickly speed up your apps!  Oh, and if you are a young startup, a small or non-profit organization, or an educational institution, ask us for a discount (see special pricing)!

——-

Here are the other posts in our Transaction Tracing series:

Webinar: Docker Monitoring

There are many ways to skin a cat.

If you use Docker you know that these deployments can be very dynamic, not to mention all the ways there are to monitor Docker containers, collect logs from them, etc. etc.  And if you didn’t know these things, well, you’ve come to the right place!

Sematext has been at the forefront of Docker monitoring, along with Docker event collection, charting, and correlation.  The same goes for CoreOS monitoring and CoreOS centralized log management.  So it’s only natural that we’d like to share our experiences and how-to knowledge with the growing Docker and container community.  We’re holding a couple of webinars in September to go through a number of different Docker monitoring options, point out their pros and cons, and offer solutions for Docker monitoring.

[Note: We’re also holding webinars on Docker Logging on September 29 and 30.]

The webinar will be presented by Stefan Thies, our DevOps Evangelist, deeply involved in Sematext’s activities around monitoring and logging in Docker and CoreOS.  A post-webinar Q&A will take place — in addition to the encouraged attendee interaction during the webinar.

Dates/Times

We’re holding two identical sessions to accommodate attendees on different continents.

Register Now: September 15 or September 16

 

“Show, Don’t Tell”

The infographic below will give you a good idea of what Stefan will be showing and discussing in the webinar.

Docker_webinar_infographic

Got Questions, or topics you’d like Stefan to address?

Leave a comment, ping @sematext, or send us email – we’re all ears.

Whether you’re using Docker or not, we hope you join us on one of the webinars.  Docker is hot — let us help you take advantage of it!
