Solr Presentations from Lucene/Solr Revolution 2014

Thanks to everyone who stopped by the Sematext booth at last week’s Lucene/Solr Revolution event in Washington, DC and attended our two talks:

The attendance, questions and interest are very much appreciated.  As a company that prides itself on its Solr expertise (and Elasticsearch expertise too, for that matter), it was nice to spend a couple days talking about search and Big Data challenges, performance monitoring and logging with fellow experts from around the world. Here are the slides for the two talks we gave (summaries of the talks can be found here):

 

  Videos of the talks will be posted here soon.  Hope to see everyone again next year!

Sematext at Lucene/Solr Revolution 2014

Going to Lucene/Solr Revolution next week — November 11-14 — in Washington, DC?  If so…Sematext will be there exhibiting AND giving two talks!  If you are going, stop by our table to say hello.  We can show you the latest versions of SPM Performance Monitoring, Logsene Log Management and Analytics, Site Search Analytics, and, of course, talk about metrics, centralized log management, Lucene, Solr, Elasticsearch, and just about any other search-related topic you might be interested in.  After all, not only have we blogged, given talks and spread the word in all sorts of ways, we’ve also written books on these subjects!

Both of the Sematext engineer talks take place on Friday, November 14.  They are:

Radu Gheorghe will talk about “Tuning Solr for Logs” at 10:15 am

Summary:  Performance tuning is always nice for keeping your applications snappy and your costs down. This is especially the case for logs, social media and other stream-like data that can easily grow into terabyte territory. While you can always use SolrCloud to scale out of performance issues, this talk is about optimizing. The following questions about Solr settings will be answered. How often should you commit and merge? How can you have one collection per day/month/year/etc? What are the performance trade-offs for these options?  There will also be a discussion around choosing the appropriate hardware.  Radu will talk about optimizing the infrastructure when pushing logs to Solr. This includes tuning Apache Flume to handle large flows of logs and overall design options that also apply to other shippers, like Logstash.

Rafal Kuc will talk about “Solr Anti-Patterns” at 10:55 am

Summary:  Working as a consultant, software engineer and helping people in various ways, Rafał has seen multiple patterns in how Solr is used and how it should be used. Consulting on best practices is common, but talking about what NOT to do is not. This talk will point out common mistakes and roads that should be avoided at all costs, covering use cases and guidelines around general configuration pitfalls, data modeling and what to avoid when making your data indexable, and mistakes made when it comes to queries and searching for indexed data. Each use case will be illustrated by a before and after analysis where changes in metrics will be shown to bring a know-how worth remembering.

20% Discount Code

If you currently use a Sematext product or have been a client in the past and want to go, drop us a line for more info.

Hope to see you in DC!

Two Lucene/Solr Revolution 2014 Talks Accepted!

We recently got word from Lucene/Solr Revolution 2014 (in Washington, DC from Nov. 11-14) that talks submitted by two Sematext engineers were accepted as part of the Tutorial track!  They are:

In “Tuning Solr for Logs” Radu will discuss Solr settings, hardware options and optimizing the infrastructure pushing logs to Solr.

In “Solr Anti-Patterns” Rafal will point out common Solr mistakes and roads that should be avoided at all costs.  Each of the talk’s use cases will be illustrated with a before and after analysis — including changes in metrics.

You can see more details about both talks in this recent blog post.

The full agenda, including dates and times for the talks, will be available soon on the Lucene/Solr Revolution 2014 web site.

If you do attend one of these talks please stop by and say hello to Radu and Rafal.  Not only do they know Solr inside and out, but they are good guys as well!

Love Solr Enough to Even Want to Attend One of These Talks?

If you enjoy Solr enough to even think of attending these talks — and you’re looking for a new opportunity — then Sematext might be the place for you.  We’re hiring planet-wide and currently looking for Solr and Elasticsearch Engineers, Front end and JavaScript Developers, Developer Evangelists, Full-stack Engineers, and Mobile App Developers.

JOB: Elasticsearch / Solr Engineer

We’ve grown nicely this year.  Our team has a new UI Developer, a new Solr/Elasticsearch Engineer, a new Marketing person, a new Automation Engineer, and this summer we have the first ever Intern.

Like all healthy organizations, we keep growing, and we are now looking for good Search Engineers who know Elasticsearch and/or Solr to join our geographically distributed search consulting team.  You will work remotely, from wherever you are, with smart people spread out across the planet and with an amazing array of companies world-wide on projects that range from just a week or two to several months.

At Sematext, we’ve built several exciting products – from smaller, search-focused products that work with Solr and Elasticsearch, to larger ones like SPMSearch Analytics, and most recently Logsene.  While not building products and running services, we help organizations world-wide with their search and big data needs – from fixing issues and providing production support to building complex search systems from scratch.  Our client list is long with a number of household names on it – from Instagram (Facebook) and Tumblr (Yahoo), Etsy and Shutterstock, to The BBC, Elsevier, Lockheed Martin, Reuters, Library of Congress, etc.  We did this without raising any money.  The demand for our products and services is growing and we are looking for good engineers and good people to join our adventure!

More formally:

Sematext is looking for a responsible, professional individual to join our team of search engineers.

Sematext is a New York-based startup with people spread over multiple continents and several hundred customers from Instagram and Tumblr, Etsy and Shutterstock, to The BBC, Elsevier, Lockheed Martin, Reuters, Library of Congress, etc. We’ve built systems handling over 10,000 QPS and have worked with multi-billion document indices. Our core products are:

In addition to the above products we offer consulting services around open source search and big data.

We are looking for a person who is:

  • Enthusiastic and positive
  • Driven, independent, and professional
  • A good communicator, both written and oral
  • Good with Solr and/or Elasticsearch and is hungry to learn more
  • Enjoys helping organizations make the best out of search

As a member of our search team you will get to:

  • Interact with clients world-wide
  • Provide guidance, architecture design, implementation, and support
  • Participate in Solr, Lucene, and Elasticsearch user and development communities
  • Work on Sematext’s search and data analytics products and participate in open-source search projects

This position:

  • Offers a lot of independence, learning, and growth
  • May require a bit of travel here and there, typically in the US and Europe
  • Is open world-wide

Our search team members have written several books about search, regularly give talks at conferences, blog, and participate in open-source projects.
For more info, see 19 things you may like about Sematext.

Interested? Please send your resume to jobs@sematext.com.

For other job openings please see Jobs @ Sematext or even our previous job listings.

Community Voting for Sematext Talks at Lucene/Solr Revolution 2014

The biggest open source conference dedicated to Apache Lucene/Solr takes place in November in Washington, DC.  If you are planning to attend — and even if you are not — you can help improve the conference’s content by voting for your favorite talk topics.  The top vote-getters for each track will be added to Lucene/Solr Revolution 2014 agenda.

Not surprisingly for one of the leading Lucene/Solr products and services organizations, Sematext has two contenders in the Tutorial track:

We’d love your support to help us contribute our expertise to this year’s conference.  To vote, simply click on the above talk links and you’ll see a “Vote” button in the upper left corner.  That’s it!

To give you a better sense of what Radu and Rafal would like to present, here are their talk summaries:

Tuning Solr for Logs – by Radu Gheorghe

Performance tuning is always nice for keeping your applications snappy and your costs down. This is especially the case for logs, social media and other stream-like data that can easily grow into terabyte territory.

While you can always use SolrCloud to scale out of performance issues, this talk is about optimizing. First, we’ll talk about Solr settings by answering the following questions:

  • How often should you commit and merge?
  • How can you have one collection per day/month/year/etc?
  • What are the performance trade-offs for these options?

Then, we’ll turn to hardware. We know SSDs are fast, especially on cold-cache searches, but are they worth the price? We’ll give you some numbers and let you decide what’s best for your use case.

The last part is about optimizing the infrastructure pushing logs to Solr. We’ll talk about tuning Apache Flume for handling large flows of logs and about overall design options that also apply to other shippers, like Logstash. As always, there are trade-offs, and we’ll discuss the pros and cons of each option.

Solr Anti-Patternsby Rafal Kuc

Working as a consultant, software engineer and helping people in various ways we can see multiple patterns on how Solr is used and how it should be used. We all usually say what should be done, but we don’t talk and point out why we should not go some ways. That’s why I would like to point out common mistakes and roads that should be avoided at all costs.   During the talk I would like not only to show the bad patterns, but also show the difference before and after.

The talk is divided into three major sections:

  1. We will start with general configuration pitfalls that people are used to make. We will discuss different use cases showing the proper path that one should take
  2. Next we will focus on data modeling and what to avoid when making your data indexable. Again we will see real life use cases followed by the description how to handle them properly
  3. Finally we will talk about queries and all the juicy mistakes when it comes to searching for indexed data

Each shown use case will be illustrated by the before and after analysis – we will see the metrics changes, so the talk will not only bring pure facts, but hopefully know-how worth remembering.

Thank you for your support!

Presentation and Video: Side by Side with Solr and Elasticsearch

Fresh from Berlin Buzzwords where Sematext‘s own Radu Gheorghe and Rafal Kuc presented “Side by Side with Solr and Elasticsearch” on the same stage, at the same time…but in different colors.  The talk included live demos, graphing, stats, and hints at juicy things to come.  Needless to say — if you deal with Solr and Elasticsearch then there are great insights to be found here!

Here is the presentation:

 

And here is the video:

 

Want to Be on Stage Somewhere Like Radu and Rafal Talking About Solr and Elasticsearch?

Or maybe you don’t want the spotlight — that’s cool too.  But…if you do enjoy performance monitoring, log analytics, or search analytics, working with projects like Elasticsearch, Solr, HBase, Hadoop, Kafka, and Storm, then drop us a line.  We’re hiring planet-wide!  Front end and JavaScript Developers, Developer Evangelists, Full-stack Engineers, Mobile App Developers…get in touch!

Enjoy!

Podcast: Tools to Monitor Solr, Manage Logs & Analyze Search Trends

Sematext Founder & President Otis Gospodnetic recently spoke with LucidWorks Chief of Product, Will Hayes as part of their SolrCluster podcast series.  Otis and Will discussed tools that Sematext has built to help monitor Solr and other stacks, manage and analyze logs, and analyze search trends.  They also discuss Solr/SolrCloud and Elasticsearch, their APIs, developer friendliness, as well as the general direction that search and big data industry leaders are moving toward around data acquisition and discovery as data increasingly grows.

Go here to listen to the podcast.  It runs about 36 minutes.  Enjoy!

Follow

Get every new post delivered to your Inbox.

Join 143 other followers