Video: Scaling Solr with SolrCloud

During last  year’s Lucene Revolution conference in Dublin we had the opportunity to give four talks, one of which was Scaling Solr with SolrCloud. Through it we wanted to share our experiences around scaling Solr, especially as we have experience in running Solr internally and as a team of search consultants.  Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!

Video: Administering and Monitoring SolrCloud Clusters

As you know, at Sematext, we are not only about consulting services, but also about administration, monitoring, and data analysis. Because of that, during last year’s Lucene Revolution conference in Dublin we gave a talk about administration and monitoring of SolrCloud clusters. During the talk, Rafał Kuć discusses some administration procedures for SolrCloud like collection management and schema modifications with the schema API. In addition, he also talks about why monitoring is important and what to pay attention to. Finally, he shows three real life examples of monitoring usefulnesses.  Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!




Presentation: Introduction to Elasticsearch

During the New York Search, Discovery and Analytics group in November, we did a presentation on Introduction to Elasticsearch. It was about how you can use Elasticsearch to store, search and analyze your data in near-realtime. It included a demo on the basic functionality (index, retrieve, update, delete, search, facet) as well as scaling, and some tips for performance tuning and monitoring your Elasticsearch cluster.

Please tweet about Presentation: Introduction to Elasticsearch

If you attended, you might want to see Gilt’s blog post about it, which includes some nice pictures. If you didn’t attend, of if you just want to refresh your memory, you can find the video (courtesy of g33ktalk) and the slides below.

Presentation: Solr for Indexing and Searching Logs

Since we’ve added Solr output for Logstash, indexing logs via Logstash has become a possibility.  But what if you are not using (only) Logstash?  Are there other ways you can index logs in Solr?  Oh yeah, there are!  The following slides are from Lucene Revolution conference that just took place in Dublin where we talked about indexing and searching logs with Solr.

If you are interesting in Log Analytics and would like to work on things like Logsenewe are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.


Presentation: Solr for Analytics

Last week, a bunch of Sematextans were at Lucene Revolution conference in Dublin, where we were both sponsors and presenters.  There were a number of interesting talks and we saw great interest in SPM from people who want to use it to monitor Solr (and more) and want to send their logs to Logsene, which confirmed Sematext is going in the right direction and is creating products and services that are in demand and solve real-world problems.

Below are the slides from one of our four talks from the conference.  This talk was about our experience using Solr as an alternative data store used for SPM, in which we share our findings and observations about using Solr for large scale aggregations, analytical queries, applications with high write throughput, performance improvements in Solr 4.5, the lower memory footprint of DocValues, and more.

If you are interesting in this sort of stuff, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.


Video Presentation: On Centralizing Logs

You might have seen our PDF presentation from Monitorama that was published last week. Now, the video is available as well. You will be able to see more about tuning Elasticsearch’s configuration for logging. You’ll also learn what the various flavors of syslog are all about – and some tips for making rsyslog process hundreds of thousands of messages per second. And, of course, one can’t talk about centralizing logs without mentioning Kibana and Logstash.

If you like using these tools, you might want to check out our Logsene, which will do the heavy lifting for you. If you like working with them, we’re hiring, too.


For the occasion, Sematext is giving a 20% discount for all SPM applications. The discount code is MONEU2013.


Presentation: On Centralizing Logs

… with Syslog, LogStash, Elasticsearch, Kibana, and friends, one might add.  If you liked Recipe: rsyslog + Elasticsearch + Kibana, you’ll like this presentation.  We’ve also published the actual 25-minute video of the presentation.

For the occasion, Sematext is giving a 20% discount for all SPM applications. The discount code is MONEU2013.

Also, Manning is giving a 44% discount for Elasticsearch in Action and all the other books from their website. The discount code is mlmoneu13cf.

For those interested in Logsene, our Logstash + Syslog + Elasticsearch + Kibana service mentioned in the talk, we’ll notify you when Logsene becomes fully (and freely!) available next month if you leave your name on the Logsene page.

Below is a sketchnote of the whole talk, which was printed and given to all attendees. Click on the image to get the full resolution.


Berlin Buzzwords 2013 – Two Talks from Sematext

Last year at Berlin Buzzwords we were proud to give three talks. Alex talked about “Real-time Analytics with HBase” (slides, video), Otis talked about large scale monitoring in his talked titled “Large Scale ElasticSearch, Solr & HBase Performance Monitoring” (slides, video) and Rafał gave a talk about how we scale ElasticSearch clusters in his “Scaling Massive ElasticSearch Clusters” talk (slides, video). We were also very happy to be one of the sponsors of this great conference :) Because we really enjoyed the conference we decided to submit a few proposals this year and they got accepted. In this years schedule we will be giving the following talks:

Radu: JSON Logging with ElasticSearch

This talk is about aggregating loooots of logs – searching of seriously big data. We’ll go through everything we can possibly go through in 20 minutes. We’ll look at how, where, when, why, and what to log. We’ll show how to use Elasticsearch as a data store for logs and what the benefits of doing so are. We’ll discuss advantages and disadvantages of logging in JSON, which is easily processed by machines, over traditional logging, which is easily processed by humans. Finally, we’ll explore how you can get your logs – JSON or not – into Elasticsearch, run searches and statistics on them, and create pretty graphs you can’t stop staring at.

Rafał: Battle of the Giants, Round 2


Learn about how both of these great enterprise search servers are evolving and adding new features. We will be comparing the latest and greatest versions of Solr and ES, both of which are using Lucene 4.x and bringing different approaches to handling codecs, per field similarities, and more. Of course, we’ll not only look at technical aspects of both Apache Solr and ElasticSearch, but will also dig into the makeup of their contributors, compare the code and of course the user community. By the end of the talk you’ll learn the main differences when it comes to these two search servers, how they handle shard and replica distribution, automatic data replication, and different query types. In addition, you’ll learn what the admin APIs for both Solr and ElasticSearch look like and how to use them to control and alter your cluster state. Last, but not least, you’ll learn what to avoid when using ElasticSearch or Apache Solr.

We hope to see some of you in Berlin.  If these topics are of interest to you, but you won’t be coming to Berlin, feel free to get in touch, leave comments, or ping @sematext. As usual we’ll be posting slides after the talks and the organizers will probably record the talk and publish it after the conference. And if you love working with things our talks are about, we are hiring world-wide!

Slides: Battle of the Giants – Solr 4.0 vs ElasticSearch 0.20.0

Slides for the Battle of the Giants talk Rafał Kuc (@kucrafal) gave at ApacheCon EU 2012 are now up!

If you like working with Solr and/or ElasticSearch, or HBase, Hadoop, Kafka, Flume, etc., use and/or develop highly scalable distributed applications and frameworks, if you like to work on Analytics and Big Data applications and services, we’re looking for good, smart, and fun people!

And if you liked the above presentation, you may also want read our ElasticSearch vs. Solr series and see Scaling Massive ElasticSearch Clusters.

Presentation: Intro to HBase Internals and Schema Design

Below are the slides (and audio) from the Intro to HBase Internals and Schema Design presentation  Alex gave at had our inaugural HBase NYC meetup.   See also: Introduction to HBase

We’re hiring people who want to work WITH and ON HBase and other Big Data technologies.  See jobs @ sematext.


Get every new post delivered to your Inbox.

Join 1,550 other followers