Community Voting for Sematext Talks at Lucene/Solr Revolution 2014

The biggest open source conference dedicated to Apache Lucene/Solr takes place in November in Washington, DC.  If you are planning to attend (and even if you are not), you can help improve the conference’s content by voting for your favorite talk topics.  The top vote-getters for each track will be added to the Lucene/Solr Revolution 2014 agenda.

Not surprisingly for one of the leading Lucene/Solr products and services organizations, Sematext has two contenders in the Tutorial track:

We’d love your support to help us contribute our expertise to this year’s conference.  To vote, simply click on the above talk links and you’ll see a “Vote” button in the upper left corner.  That’s it!

To give you a better sense of what Radu and Rafal would like to present, here are their talk summaries:

Tuning Solr for Logs – by Radu Gheorghe

Performance tuning is always nice for keeping your applications snappy and your costs down. This is especially the case for logs, social media and other stream-like data that can easily grow into terabyte territory.

While you can always use SolrCloud to scale out of performance issues, this talk is about optimizing. First, we’ll talk about Solr settings by answering the following questions:

  • How often should you commit and merge?
  • How can you have one collection per day/month/year/etc?
  • What are the performance trade-offs for these options?
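As a rough illustration of the "one collection per day/month/year" idea, here is a minimal Python sketch that maps a log event's timestamp to the name of the collection it would be written to. The `logs` prefix, the `<prefix>_<period>` naming scheme, and the `collection_for` helper are assumptions made for this example, not something taken from the talk.

```python
from datetime import datetime, timezone

# Hypothetical naming scheme "<prefix>_<period>"; not taken from the talk.
_FORMATS = {
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def collection_for(timestamp: datetime, granularity: str = "day", prefix: str = "logs") -> str:
    """Return the name of the time-based collection a log event belongs to."""
    if granularity not in _FORMATS:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    # Normalize to UTC so events near midnight land in a predictable collection.
    utc = timestamp.astimezone(timezone.utc)
    return f"{prefix}_{utc.strftime(_FORMATS[granularity])}"


event_time = datetime(2014, 5, 27, 11, 30, tzinfo=timezone.utc)
print(collection_for(event_time))           # logs_2014-05-27
print(collection_for(event_time, "month"))  # logs_2014-05
```

Coarser granularity means fewer, larger collections; finer granularity makes it cheaper to drop old data by deleting whole collections. Which trade-off wins depends on log volume and retention needs.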

Then, we’ll turn to hardware. We know SSDs are fast, especially on cold-cache searches, but are they worth the price? We’ll give you some numbers and let you decide what’s best for your use case.

The last part is about optimizing the infrastructure pushing logs to Solr. We’ll talk about tuning Apache Flume for handling large flows of logs and about overall design options that also apply to other shippers, like Logstash. As always, there are trade-offs, and we’ll discuss the pros and cons of each option.

Solr Anti-Patterns – by Rafal Kuc

Working as consultants and software engineers, and helping people in various ways, we see many patterns in how Solr is used and how it should be used. We usually say what should be done, but we rarely point out why some approaches should not be taken. That’s why I would like to point out common mistakes and roads that should be avoided at all costs. During the talk I would like not only to show the bad patterns, but also to show the difference before and after.

The talk is divided into three major sections:

  1. We will start with general configuration pitfalls that people commonly fall into. We will discuss different use cases, showing the proper path that one should take.
  2. Next we will focus on data modeling and what to avoid when making your data indexable. Again we will see real-life use cases, followed by a description of how to handle them properly.
  3. Finally we will talk about queries and all the juicy mistakes people make when searching indexed data.

Each use case will be illustrated by a before-and-after analysis showing how the metrics change, so the talk will bring not only pure facts, but hopefully also know-how worth remembering.

Thank you for your support!

Presentation and Video: Side by Side with Solr and Elasticsearch

We are fresh from Berlin Buzzwords, where Sematext’s own Radu Gheorghe and Rafal Kuc presented “Side by Side with Solr and Elasticsearch” on the same stage, at the same time…but in different colors.  The talk included live demos, graphing, stats, and hints at juicy things to come.  Needless to say, if you deal with Solr and Elasticsearch, there are great insights to be found here!

Here is the presentation:

 

And here is the video:

 

Want to Be on Stage Somewhere Like Radu and Rafal Talking About Solr and Elasticsearch?

Or maybe you don’t want the spotlight — that’s cool too.  But…if you do enjoy performance monitoring, log analytics, or search analytics, working with projects like Elasticsearch, Solr, HBase, Hadoop, Kafka, and Storm, then drop us a line.  We’re hiring planet-wide!  Front end and JavaScript Developers, Developer Evangelists, Full-stack Engineers, Mobile App Developers…get in touch!

Enjoy!

Berlin Buzzwords 2014 – Side by Side with Elasticsearch and Solr

Last year at Berlin Buzzwords two Sematext engineers had the opportunity to give two talks. Radu talked about “JSON Logging with Elasticsearch” (video, slides) and Rafał did the second round of Solr vs Elasticsearch in his talk “Battle of the Giants, round 2” (video, slides). We were also happy to be sponsoring Berlin Buzzwords 2013. This year, we decided to go for a talk where two of us can talk on the same stage, at the same time. On Tuesday, 27th of May, at 11:30, in the Frannz Club, Radu and Rafał will be giving a talk called “Side by side with Solr and Elasticsearch”.


Solr is an established, mature, well-known and commonly used open-source search server. Elasticsearch is still young, but quickly gaining popularity, with over 200k downloads per month. Both search servers are based on Lucene, the open-source full-text search library written in Java, but each has its own extensions and its own pros and cons.

We all know that Solr and Elasticsearch are different, but what those differences are, and which solution is the best fit for a particular use case, is a frequent question. We will try to make those differences clear, not by showing slides and comparing them, but by showing an online demo of both Elasticsearch and Solr:

  • Set up and start both search servers. See what you need to prepare and launch Solr and Elasticsearch.
  • Index data right after the server was started using the “schemaless” mode.
  • Create index structure and modify it using the provided API.
  • Explore different query use cases.
  • Scale by adding and removing nodes from the cluster, creating indices and managing shards. See how that affects data indexing and querying.
  • Monitor and administer clusters.  See what metrics can be seen out of the box, how to get them and what tools can provide you with the graphical view of all the goodies that each search server can provide.
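To give a flavor of what "achieving similar things" on both servers can look like, here is a small Python sketch that builds an equivalent query-string search request for each one: Solr's `select` handler and Elasticsearch's `_search` endpoint. The hosts, the default ports, the `logs` collection/index name, and the two helper functions are assumptions made for illustration; the sketch only builds URLs and does not contact a server.

```python
from urllib.parse import urlencode

# Hosts and ports are assumptions for illustration (the usual local defaults).
SOLR_BASE = "http://localhost:8983/solr"
ES_BASE = "http://localhost:9200"


def solr_search_url(collection: str, query: str, rows: int = 10) -> str:
    """Build a Solr search URL against the collection's select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def es_search_url(index: str, query: str, size: int = 10) -> str:
    """Build an Elasticsearch URI search URL against the index's _search endpoint."""
    params = urlencode({"q": query, "size": size})
    return f"{ES_BASE}/{index}/_search?{params}"


# "logs" and the "message" field are hypothetical names.
print(solr_search_url("logs", "message:error"))
print(es_search_url("logs", "message:error"))
```

Both servers accept a Lucene-style query string in the `q` parameter, which is one reason the same search can often be expressed almost identically on either side.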

If you want to hear about both Solr and Elasticsearch from @sematext, see how to achieve similar things with each and how they behave, and not sit through too many slides, come join us :)

Video: Scaling Solr with SolrCloud

During last year’s Lucene Revolution conference in Dublin we had the opportunity to give four talks, one of which was Scaling Solr with SolrCloud. In it we wanted to share our experiences with scaling Solr, drawing on what we have learned both from running Solr internally and from working as a team of search consultants.  Enjoy the video and/or the slides!

Note: we are looking for engineers passionate about search to join our professional services team.  We’re hiring planet-wide!

Video: Administering and Monitoring SolrCloud Clusters

As you know, at Sematext, we are not only about consulting services, but also about administration, monitoring, and data analysis. Because of that, during last year’s Lucene Revolution conference in Dublin we gave a talk about administration and monitoring of SolrCloud clusters. In the talk, Rafał Kuć discusses some administration procedures for SolrCloud, like collection management and schema modifications with the schema API. He also talks about why monitoring is important and what to pay attention to. Finally, he shows three real-life examples of how monitoring proved useful.  Enjoy the video and/or the slides!


 

 

 

Presentation: Introduction to Elasticsearch

At the November meeting of the New York Search, Discovery and Analytics group, we gave a presentation called Introduction to Elasticsearch. It was about how you can use Elasticsearch to store, search and analyze your data in near-realtime. It included a demo of the basic functionality (index, retrieve, update, delete, search, facet) as well as scaling, and some tips for performance tuning and monitoring your Elasticsearch cluster.


If you attended, you might want to see Gilt’s blog post about it, which includes some nice pictures. If you didn’t attend, or if you just want to refresh your memory, you can find the video (courtesy of g33ktalk) and the slides below.

Presentation: Solr for Indexing and Searching Logs

Since we’ve added Solr output for Logstash, indexing logs via Logstash has become a possibility.  But what if you are not using (only) Logstash?  Are there other ways you can index logs in Solr?  Oh yeah, there are!  The following slides are from the Lucene Revolution conference that just took place in Dublin, where we talked about indexing and searching logs with Solr.

If you are interested in Log Analytics and would like to work on things like Logsene, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.

 

Presentation: Solr for Analytics

Last week, a bunch of Sematextans were at the Lucene Revolution conference in Dublin, where we were both sponsors and presenters.  There were a number of interesting talks, and we saw great interest in SPM from people who want to use it to monitor Solr (and more) and want to send their logs to Logsene. That confirmed Sematext is going in the right direction and is creating products and services that are in demand and solve real-world problems.

Below are the slides from one of our four talks from the conference.  This talk was about our experience using Solr as an alternative data store used for SPM, in which we share our findings and observations about using Solr for large scale aggregations, analytical queries, applications with high write throughput, performance improvements in Solr 4.5, the lower memory footprint of DocValues, and more.

If you are interested in this sort of stuff, we are looking for good people at all levels – from JavaScript Developers and Backend Engineers, to Evangelists, Sales, Marketing, etc.

 

Video Presentation: On Centralizing Logs

You might have seen our PDF presentation from Monitorama that was published last week. Now, the video is available as well. You will be able to see more about tuning Elasticsearch’s configuration for logging. You’ll also learn what the various flavors of syslog are all about – and some tips for making rsyslog process hundreds of thousands of messages per second. And, of course, one can’t talk about centralizing logs without mentioning Kibana and Logstash.

If you like using these tools, you might want to check out our Logsene, which will do the heavy lifting for you. If you like working with them, we’re hiring, too.

 

For the occasion, Sematext is giving a 20% discount for all SPM applications. The discount code is MONEU2013.

 

Presentation: On Centralizing Logs

… with Syslog, Logstash, Elasticsearch, Kibana, and friends, one might add.  If you liked Recipe: rsyslog + Elasticsearch + Kibana, you’ll like this presentation.  We’ve also published the actual 25-minute video of the presentation.

For the occasion, Sematext is giving a 20% discount for all SPM applications. The discount code is MONEU2013.

Also, Manning is giving a 44% discount for Elasticsearch in Action and all the other books from their website. The discount code is mlmoneu13cf.

For those interested in Logsene, our Logstash + Syslog + Elasticsearch + Kibana service mentioned in the talk, we’ll notify you when Logsene becomes fully (and freely!) available next month if you leave your name on the Logsene page.

Below is a sketchnote of the whole talk, which was printed and given to all attendees. Click on the image to get the full resolution.

sketchnote
