June 9, 2014
The biggest open source conference dedicated to Apache Lucene/Solr takes place in November in Washington, DC. Whether or not you are planning to attend, you can help improve the conference’s content by voting for your favorite talk topics. The top vote-getters for each track will be added to the Lucene/Solr Revolution 2014 agenda.
Not surprisingly for one of the leading Lucene/Solr products and services organizations, Sematext has two contenders in the Tutorial track:
We’d love your support to help us contribute our expertise to this year’s conference. To vote, simply click on the above talk links and you’ll see a “Vote” button in the upper left corner. That’s it!
To give you a better sense of what Radu and Rafal would like to present, here are their talk summaries:
Performance tuning is always nice for keeping your applications snappy and your costs down. This is especially the case for logs, social media and other stream-like data that can easily grow into terabyte territory.
While you can always use SolrCloud to scale out of performance issues, this talk is about optimizing. First, we’ll talk about Solr settings by answering the following questions:
- How often should you commit and merge?
- How can you have one collection per day/month/year/etc?
- What are the performance trade-offs for these options?
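To make the "one collection per day/month/year" question a bit more concrete, here is a minimal sketch of how a log shipper might pick a time-based collection name and build a Solr Collections API `CREATE` request for it. The naming scheme (`logs_2014-06-09`), the helper names `collection_for` and `create_collection_url`, the base URL and the shard/replica counts are all assumptions made for illustration, not something taken from the talk.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical naming scheme: "<prefix>_<period>", e.g. "logs_2014-06-09".
FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def collection_for(ts, granularity="day", prefix="logs"):
    """Return the name of the collection a document with timestamp ts belongs to."""
    return f"{prefix}_{ts.strftime(FORMATS[granularity])}"


def create_collection_url(base_url, name, num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL (the request itself is not sent here)."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{base_url}/admin/collections?{params}"


if __name__ == "__main__":
    ts = datetime(2014, 6, 9, 12, 30, tzinfo=timezone.utc)
    name = collection_for(ts, "day")
    # Assumed local Solr address; adjust to your own cluster.
    print(create_collection_url("http://localhost:8983/solr", name))
```

Coarser granularity means fewer, larger collections. Finer granularity makes it cheaper to drop old data and keeps the actively written index small, at the cost of having more collections to search and manage.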
Then, we’ll turn to hardware. We know SSDs are fast, especially on cold-cache searches, but are they worth the price? We’ll give you some numbers and let you decide what’s best for your use case.
The last part is about optimizing the infrastructure pushing logs to Solr. We’ll talk about tuning Apache Flume for handling large flows of logs and about overall design options that also apply to other shippers, like Logstash. As always, there are trade-offs, and we’ll discuss the pros and cons of each option.
Working as consultants and software engineers, and helping people in various other ways, we see many patterns in how Solr is used and how it should be used. We usually say what should be done, but we rarely point out which paths should not be taken, or why. That’s why I would like to point out common mistakes and roads that should be avoided at all costs. During the talk I would like not only to show the bad patterns, but also to show the difference before and after.
The talk is divided into three major sections:
- We will start with general configuration pitfalls that people commonly fall into. We will discuss different use cases, showing the proper path that one should take.
- Next we will focus on data modeling and what to avoid when making your data indexable. Again we will see real-life use cases, followed by a description of how to handle them properly.
- Finally we will talk about queries and all the juicy mistakes when it comes to searching for indexed data.
Each use case will be illustrated with a before-and-after analysis showing how the metrics change, so the talk will bring not only pure facts but, hopefully, know-how worth remembering.
Thank you for your support!