Video: Administering and Monitoring SolrCloud Clusters

As you know, at Sematext we are not only about consulting services, but also about administration, monitoring, and data analysis. That is why, at last year’s Lucene Revolution conference in Dublin, we gave a talk about administering and monitoring SolrCloud clusters. In the talk, Rafał Kuć discusses some SolrCloud administration procedures, such as collection management and schema modifications with the Schema API. He also explains why monitoring is important and what to pay attention to. Finally, he shows three real-life examples of monitoring proving its usefulness. Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!




Video: Using Solr for Logs with Rsyslog, Flume, Fluentd and Logstash

A while ago we published the slides from our talk at Lucene Revolution about using Solr for indexing and searching logs. This topic is of special interest to us, since we’ve released Logsene and we also offer consulting services for logging infrastructure. If you’re also into working with search engines or logs, please note that we’re hiring worldwide.

The video for that talk is now available, and you can watch it below. The talk consists of three parts:

  • one that discusses the general concepts of what a log is, structured logging and indexing logs in general, whether it’s Solr or Elasticsearch
  • one that shows how to use existing tools to send logs to Solr: Rsyslog and Fluentd to send structured events (yes, structured syslog!); Apache Flume and Logstash to take unstructured data, make it structured via Morphlines and Grok, and then send it to Solr
  • one that shows how to optimize Solr’s performance for handling logs, from tuning the commit frequency and merge factor to using time-based collections with aliases
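As a rough illustration of the time-based collections idea from the last part of the talk, here is a minimal Python sketch. It assumes one collection per day named `logs_YYYY_MM_DD` and a Solr node at `localhost:8983` (both hypothetical, not taken from the talk), and it only builds the Collections API `CREATEALIAS` request that points a search alias at the most recent daily collections; it does not send it.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical Solr node; adjust to your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def daily_collection_names(prefix, end, days):
    """Names of the last `days` daily collections, newest first (e.g. logs_2013_11_07)."""
    return [
        f"{prefix}_{(end - timedelta(days=i)).strftime('%Y_%m_%d')}"
        for i in range(days)
    ]


def create_alias_url(alias, collections):
    """Collections API request that (re)points `alias` at the given collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated list of collections
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    recent = daily_collection_names("logs", date(2013, 11, 7), 7)
    print(create_alias_url("logs_search", recent))
```

Issuing `CREATEALIAS` again with the same alias name replaces the alias, so a daily job could rotate it, and logs older than the retention window can be dropped by deleting whole collections instead of deleting documents one by one.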

Poll: Using SolrCloud or Not?

It’s been 9 months since we conducted a poll on SolrCloud usage.  A lot of things can change in 9 months.  SolrCloud itself went through a ton of development and bug fixing since our last poll.  It’s time to see how many of us are using SolrCloud now, at the end of 2013.

Please tweet this poll and help us spread the word, so we can get good, statistically significant results.

Presentation: Scaling Solr with SolrCloud

Squeezing the maximum possible performance out of Solr / SolrCloud and Elasticsearch, and making them scale well, is what we do on a daily basis for our clients. We make sure their servers are optimally configured and maximally utilized. Rafal Kuć gave a long, 75-minute talk on the topic of Scaling Solr with SolrCloud at the Lucene Revolution 2013 conference in Dublin. Enjoy!

If you are interested in working with Solr and/or Elasticsearch, we are looking for good people to join our team.

Presentation: Administering and Monitoring SolrCloud Clusters

Rafal Kuć gave two talks about Solr at Lucene Revolution. One of them, Administering and Monitoring SolrCloud Clusters, is below.

If you are interested in working with Solr and/or Elasticsearch, we are looking for good people to join our team.

Presentation: Solr for Indexing and Searching Logs

Since we added a Solr output to Logstash, indexing logs via Logstash has become a possibility. But what if you are not using (only) Logstash? Are there other ways you can index logs in Solr? Oh yeah, there are! The following slides are from the Lucene Revolution conference that just took place in Dublin, where we talked about indexing and searching logs with Solr.

If you are interested in Log Analytics and would like to work on things like Logsene, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.


Presentation: Solr for Analytics

Last week, a bunch of Sematextans were at the Lucene Revolution conference in Dublin, where we were both sponsors and presenters. There were a number of interesting talks, and we saw great interest in SPM from people who want to use it to monitor Solr (and more) and who want to send their logs to Logsene. This confirmed that Sematext is going in the right direction and is creating products and services that are in demand and solve real-world problems.

Below are the slides from one of our four talks from the conference.  This talk was about our experience using Solr as an alternative data store used for SPM, in which we share our findings and observations about using Solr for large scale aggregations, analytical queries, applications with high write throughput, performance improvements in Solr 4.5, the lower memory footprint of DocValues, and more.

If you are interested in this sort of stuff, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.


Announcement: Logstash Support for SolrCloud

While using Elasticsearch for log indexing is all the rage these days (and this is one of the reasons Logsene exposes an Elasticsearch API), especially from Logstash, which has had an Elasticsearch output for a long time now, there is no reason one could not index logs into Solr – SolrCloud, more specifically.

To help people get their logs into Solr(Cloud) we wrote up a simple Logstash output for Solr(Cloud) and made it available in LOGSTASH-1405, with the accompanying pull request 675.

Give it a try and ping @sematext – we’d love to know if anyone finds this useful!

And if this is of interest, consider coming to Dublin to hear Using Solr to Search and Analyze Logs, one of our four talks at this year’s Lucene Revolution conference.

Job: Solr / Elasticsearch Engineer @ Sematext

Sematext is 100% engineers and there are about 10 of us now. In addition to being on the lookout for our Head of Marketing, we are looking for solid search engineers who love Solr and/or Elasticsearch, who want to use their search skills to work with our growing list of international clients, and who want to join our awesome, super distributed engineering team.

Together, we’ve built several exciting products – from smaller, search-focused products that work with Solr and Elasticsearch, to larger ones like SPM, Search Analytics, and, most recently, Logsene. When we are not building products and running services, we help organizations world-wide with their search and big data needs – from fixing issues and providing production support to building complex search systems from scratch. Our client list is long, with a number of household names on it – from Instagram (Facebook) and Tumblr (Yahoo), Etsy and Shutterstock, to The BBC, Elsevier, Lockheed Martin, Reuters, Library of Congress, etc. We did this without raising any money. To date, virtually all of our business has come to us without us doing much real marketing – fama volat in action. The demand for our products and services is growing, and we are looking for good engineers and good people to join our adventure!

More formally:

Sematext is looking for a responsible, professional individual to join our team of search engineers.

Sematext is a New York-based startup with people spread over multiple continents and several hundred customers, ranging from Instagram and Tumblr, Etsy and Shutterstock, to The BBC, Elsevier, Lockheed Martin, Reuters, Library of Congress, etc. We’ve built systems handling over 10,000 QPS and have worked with multi-billion document indices. Our core products are:

  • SPM – performance monitoring
  • Search Analytics
  • Logsene – log and data analytics
  • Several search-focused products

In addition to the above products we offer consulting services around open source search and big data.

We are looking for a person who is:

  • Enthusiastic and positive
  • Driven, independent, and professional
  • A good communicator, both written and oral
  • Good with Solr and/or Elasticsearch, and hungry to learn more
  • Keen on helping organizations get the most out of search

As a member of our search team you will get to:

  • Interact with clients world-wide
  • Provide guidance, architecture design, implementation, and support
  • Participate in Solr, Lucene, and Elasticsearch user and development communities
  • Work on Sematext’s search and data analytics products and participate in open-source search projects

This position:

  • Offers a lot of independence, learning, and growth
  • Does not require travel, but does offer the opportunity for travel for those who want that
  • Is open world-wide

Our search team members have written several books about search, regularly give talks at conferences, blog, and participate in open-source projects.
For more info, see 19 things you may like about Sematext.
Come join us and build cool products!

4 Lucene Revolution Talks from Sematext

Bingo! We’re 4 for 4 at Lucene Revolution 2013 – 4 talk proposals, and all 4 accepted! We are hiring just so next year we can try to get 5 talks in. ;) We’ll also be exhibiting at the conference, so stop by. We will be giving away Solr and Elasticsearch books. Here’s what we’ll be talking about in Dublin on November 6th and 7th:

In Using Solr to Search and Analyze Logs, Radu will be talking about … well, you guessed it – using Solr to analyze logs. After this talk you may want to run home (or back to the hotel), hack on LogStash or Flume and Solr, and get Solr to eat your logs… but don’t forget we have to keep Logsene well fed. Feed this beast your logs like we feed it ours, and help us avoid getting eaten by our own creation.


Many of us tend to hate or simply ignore logs, and rightfully so: they’re typically hard to find, difficult to handle, and cryptic to the human eye. But can we make logs more valuable and more usable if we index them in Solr, so we can search and run real-time statistics on them? Indeed we can, and in this session you’ll learn how to make that happen. In the first part of the session we’ll explain why centralized logging is important and what valuable information one can extract from logs, and we’ll introduce the leading tools from the logging ecosystem that everyone should be aware of – from syslog and log4j to LogStash and Flume. In the second part we’ll teach you how to use these tools in tandem with Solr. We’ll show how to use Solr in a SolrCloud setup to index large volumes of logs continuously and efficiently. Then, we’ll look at how to scale the Solr cluster as your data volume grows. Finally, we’ll see how you can parse your unstructured logs and convert them to nicely structured Solr documents suitable for analytical queries.

Rafal will talk about Scaling Solr with SolrCloud in a 75-minute session. Prepare to take lots of notes and to scale your brain both horizontally and vertically while at the same time avoiding split-brain.


Configure your Solr cluster to handle hundreds of millions of documents without even noticing, handle queries in milliseconds, and use Near Real Time indexing and searching with document versioning. Scale your cluster both horizontally and vertically by using shards and replicas. In this session you’ll learn how to make your indexing process blazing fast and keep your queries efficient even with large amounts of data in your collections. You’ll also see how to optimize your queries to leverage caches as much as your deployment allows, and how to observe your cluster with the Solr administration panel, JMX, and third-party tools. Finally, learn how to make changes to already deployed collections: split their shards and alter their schema by using the Solr API.
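Shard splitting, mentioned at the end of this abstract, is exposed through the Collections API `SPLITSHARD` action (available since Solr 4.3). The following is a minimal Python sketch, not the session's own material, assuming a hypothetical node at `localhost:8983` and a hypothetical collection named `logs`; it builds the request URL and, only if you call `split_shard`, sends it to a running cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr node; adjust to your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def split_shard_url(collection, shard):
    """Collections API request that splits `shard` of `collection` into two sub-shards."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def split_shard(collection, shard, timeout=600):
    """Send the split request and return Solr's parsed JSON response.

    Requires a running SolrCloud cluster; splitting a large shard can take a while,
    hence the generous timeout (in seconds).
    """
    with urlopen(split_shard_url(collection, shard), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(split_shard_url("logs", "shard1"))
```

After a successful split, the two sub-shards (for example `shard1_0` and `shard1_1`) become active and the parent shard is marked inactive, after which it can be removed with the `DELETESHARD` action.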

Rafal doesn’t like to sleep.  He prefers to write multiple books at the same time and give multiple talks at the same conference.  His second talk is about Administering and Monitoring SolrCloud Clusters – something we and our customers do with SPM all the time.


Even though Solr can run without causing any trouble for long periods of time, it is very important to monitor and understand what is happening in your cluster. In this session you will learn how to use various tools to monitor how Solr is behaving at a high level, but also at the Lucene, JVM, and operating system levels. You’ll see how to react to what you see and how to make changes to the configuration, index structure, and shard layout using the Solr API. We will also discuss the performance metrics you ought to pay extra attention to. Finally, you’ll learn what to do when things go awry – we will share a few troubleshooting examples and dissect what was wrong and what had to be done to make things work again.

Otis has aggregation coming out of his ears and dreams about data visualization, timeseries graphs, and other romantic visuals. In Solr for Analytics: Metrics Aggregations at Sematext we’ll share our experience running SPM on top of SolrCloud (vs. HBase, which we currently use).


While Solr and Lucene were originally written for full-text search, they are capable enough to be used, and increasingly are used, for analytics, as key-value stores, as NoSQL databases, and more. In this session we’ll describe our experience with Solr for analytics. More specifically, we will describe a couple of different approaches we have taken with SolrCloud for aggregating massive amounts of performance metrics, share our findings, and compare SolrCloud with HBase for large-scale, write-intensive aggregations. We’ll also look at several new Solr features in the works that will make Solr even more suitable for analytics workloads.

See you in Dublin!

