[Note: since this workshop has already taken place, stay up to date with future workshops at our Solr Training page]
For those of you interested in some comprehensive Solr training taught by an expert from Sematext who knows it inside and out, we’re running a super hands-on training workshop in New York City from October 19-20.
This two-day workshop will be taught by Sematext engineer — and author of Solr books — Rafal Kuc.
Who should attend: Developers and DevOps engineers who want to configure, tune, and manage Solr at scale.
What you’ll get out of it:
In two days of training, Rafal will:
- bring Solr novices to the level where they will be comfortable taking Solr to production
- give experienced Solr users proven, practical advice based on years of designing, tuning, and operating numerous Solr clusters, to help with their most advanced and pressing issues
* See the Course Outline at the bottom of this post for details
When & Where:
- Dates: October 19 & 20 (Monday & Tuesday)
- Time: 9:00 a.m. — 5:00 p.m.
- Location: New Horizons Computer Learning Center in Midtown Manhattan (map)
- Cost: $1,200 “early bird rate” (valid through September 1) and $1,500 afterward. And…we’re also offering a 50% discount for the purchase of a 2nd seat!
- Food/Drinks: Light breakfast and lunch will be provided
Attendees will go through several sequences of short lectures followed by interactive, group, hands-on exercises. There will be a Q&A session after each such lecture-practicum block.
Got any questions or suggestions for the course? Just drop us a line or hit us @sematext!
Lastly, if you can’t make it…watch this space or follow @sematext — we’ll be adding more Solr training workshops in the US, Europe and possibly other locations in the coming months. We are also known worldwide for our Solr Consulting Services and Solr Production Support.
Hope to see you in the Big Apple in October!
Solr Training Workshop – Course Outline
- What is Solr and its use cases
- Solr master-slave architecture
- SolrCloud architecture
- Why & when SolrCloud
- Solr master-slave vs. SolrCloud
- Starting Solr with schema-less configuration
- Indexing documents
- Retrieving documents using URI request
- Deleting documents
Continue reading “Solr Training in New York City — October 19-20”
If topics like log analytics and Solr are your thing, then we may have a treat for you at the upcoming Lucene/Solr Revolution conference in Austin in October. Two of Sematext’s engineers and Solr, Elasticsearch and ELK stack experts — Rafal Kuc and Radu Gheorghe — have proposed a talk called “Large Scale Log Analytics with Solr” and could use some upvoting from the community to get it onto this year’s agenda.
To show your support for “Large Scale Log Analytics with Solr” just click here to vote. Takes less than a minute! Even if you don’t attend the conference, we’ll post the slides and video here on the blog…assuming it gets on the agenda. Voting will close at 11:59pm EDT on Thursday, June 25th.
This talk is about searching and analyzing time-based data at scale. Documents ranging from blog posts and social media to application logs and metrics generated by smart watches and other “smart” things share a similar pattern: a timestamp among their fields, few changes after indexing, and deletion once they become obsolete.
Very often this kind of data is so large that it causes scaling and performance challenges. We’ll address precisely these challenges, which include:
- Properly designing collections architecture
- Indexing data fast and without documents waiting in queues for processing
- Being able to run queries that include time-based sorting and faceting on enormous amounts of indexed data without killing Solr
- …and many more
We’ll start with the indexing pipeline — where you do all your ETL. We’ll show you how to maximize throughput with various ETL tools, such as Flume, Kafka, Logstash and rsyslog, and how to make them scale and send data to Solr.
On the Solr side, we’ll show all sorts of tricks to optimize indexing and searching: from tuning merge policies to slicing collections based on timestamp. While scaling out, we’ll show how to improve the performance/cost ratio.
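To make the idea of slicing collections by timestamp a bit more concrete, here is a minimal sketch of how time-based routing might look on the client side. The daily naming scheme and the `logs` prefix are illustrative assumptions, not something prescribed by the talk:

```python
from datetime import date, timedelta

def collection_for(day, prefix="logs"):
    """Route a document to a daily collection based on its timestamp."""
    return f"{prefix}-{day:%Y.%m.%d}"

def collections_in_range(start, end, prefix="logs"):
    """List the daily collections a time-range query has to touch.
    Queries for a narrow time window then never hit the other
    collections, which keeps sorting and faceting cheap."""
    names = []
    day = start
    while day <= end:
        names.append(collection_for(day, prefix))
        day += timedelta(days=1)
    return names

# A three-day query only touches three collections:
print(collections_in_range(date(2015, 6, 23), date(2015, 6, 25)))
```

The payoff of this layout is that old slices can be dropped wholesale (one collection delete instead of millions of document deletes), which fits the time-based data pattern described above.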
Thanks for your support!
[Note: this post has been updated to include video and slides from the June 2 presentation]
Back by popular demand! Sematext engineers Radu Gheorghe and Rafal Kuc returned to Berlin Buzzwords on Tuesday, June 2, with the second installment of their “Side by Side with Elasticsearch and Solr” talk. (You can check out Part 1 here.)
Elasticsearch and Solr Performance and Scalability
This brand new talk — which included a live demo, a video demo and slides — dove deeper into how Elasticsearch and Solr scale and perform. And, of course, they took into account all the goodies that have come to these search platforms since last year.
Radu and Rafal showed attendees how to tune Elasticsearch and Solr for two common use-cases: logging and product search. Then they showed what numbers they got after tuning. There was also some sharing of best practices for scaling out massive Elasticsearch and Solr clusters; for example, how to divide data into shards and indices/collections that account for growth, when to use routing, and how to make sure that coordinating nodes don’t become unresponsive.
Here is the video:
…and here are the slides:
Feedback & Questions — Bring It On
If you’ve got feedback or questions about topics like Elasticsearch vs. Solr (here’s a detailed comparison) and what’s new and exciting with both applications, just drop us a line. We live and breathe this stuff, so we’re always happy to hear from like-minded people.
Hot off the press: a brand new Solr Cookbook! One of Sematext’s Solr and Elasticsearch experts — and authors — Rafał Kuć, has just published the third and latest edition of Solr Cookbook. This edition covers both Solr 4.x (based on the newest 4.10.3 version of Solr) and the just-released Solr 5.0.
As with previous editions of the Solr Cookbook, Rafal has updated the book significantly — about half of the previous content has been changed — and revised all of the recipes.
Here’s a list of the chapters:
- Apache Solr Configuration
- Indexing Your Data
- Analyzing Your Text Data
- Querying Solr
- Improving Solr Performance
- In the Cloud
- Using Additional Solr Functionalities
- Dealing with Problems
- Real-life Situations
For more information about Solr Cookbook, Third Edition — including info on getting a free chapter — check out the Packt Publishing web page dedicated to it. The book is available in both electronic and paperback versions. Even better, here is a discount code you can use for 20% off (valid until March 22, 2015; see details for applying code below*): scte20
Need Some Solr Expertise?
Rafal isn’t the only Solr expert at Sematext; we’ve got several more who have helped 100+ clients to architect, scale, tune, and successfully deploy their Solr-based products. We also offer 24/7 production support for Solr and Elasticsearch. Here’s more info about our professional services, which also include Elasticsearch and Logging consulting. You can also monitor Solr performance (and many other platforms) with SPM Performance Monitoring.
Have some feedback or questions for Rafal?
He’d love to hear from you — get him @kucrafal
* Using discount code:
- Set up a free Packt account or log into your existing account
- Add the title “Solr Cookbook – Third Edition” to the cart
- Click on ‘View Cart’
- Then in the “Do you have a promo code?” field enter scte20
- Click the “Apply” button to apply the discount
by Otis Gospodnetić
[Otis is a Lucene, Solr, and Elasticsearch expert and co-author of “Lucene in Action” (1st and 2nd editions). He is also the founder and CEO of Sematext. See full bio below.]
“Solr or Elasticsearch?”…well, at least that is the common question I hear from Sematext’s consulting services clients and prospects. Which one is better, Solr or Elasticsearch? Which one is faster? Which one scales better? Which one can do X, and Y, and Z? Which one is easier to manage? Which one should we use? Which one do you recommend? etc., etc.
These are all great questions, though not always with clear and definite, universally applicable answers. So which one do we recommend you use? How do you choose in the end? Well, let me share how I see Solr and Elasticsearch past, present, and future, let’s do a bit of comparing and contrasting, and hopefully help you make the right choice for your particular needs.
Early Days: Youth Vs. Experience
Apache Solr is a mature project with a large and active development and user community behind it, as well as the Apache brand. First released to open source in 2006, Solr has long dominated the search engine space and was the go-to engine for anyone needing search functionality. Its maturity translates to rich functionality beyond vanilla text indexing and searching, such as faceting, grouping (aka field collapsing), powerful filtering, pluggable document processing, pluggable search chain components, language detection, and more.
Continue reading “Solr vs. Elasticsearch — How to Decide?”
With the release of Solr 5.0, the most recent major version of this great search server, we got more than just improvements and changes inherited from the Lucene library. Of course, we did get features like:
- segment checksums
- segment identifiers
- Lucene using only the Java NIO.2 classes to access files
- lowered heap usage thanks to the new Lucene50Codec
…but those features came from the Lucene core itself. Solr introduced:
- improved start-up scripts
- scripts for installing and running Solr as a Linux service
- distributed IDF calculation
- ability to register new handlers using the API (with jar uploads)
- replication throttling
- …and so on
All of these features come with the first release of the Solr 5 branch, and we can expect even more from future releases — like cross-data-center replication! We want to start sharing what we know about those features and, today, we start with replication throttling.
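As a quick taste, throttling is configured on the replication handler in solrconfig.xml. A minimal sketch is below; the `maxWriteMBPerSec` parameter name and the 16 MB/s value are our assumptions about a typical setup, so check the Solr 5 reference guide for your exact version before copying it:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
    <!-- cap the bandwidth slaves may use when pulling index files -->
    <str name="maxWriteMBPerSec">16</str>
  </lst>
</requestHandler>
```

The point of the cap is to keep a full index replication from saturating the network and starving query traffic on the master.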
Continue reading “Solr 5: Replication Throttling”
The Solr Redis Plugin is an extension for Solr that provides a query parser that uses data stored in Redis. It is open-sourced on GitHub by Sematext. This tool is basically a QParserPlugin that establishes a connection to Redis and takes data stored in sets, sorted sets, and other Redis data structures (retrieved with commands like SMEMBERS and ZRANGE) in order to build a query. The RedisQParser is responsible for fetching that data from Redis and turning it into a query. Moreover, this plugin provides a highlighter extension which can be used to highlight parts of aliased Solr Redis queries (this will be described in a future post).
Use Case: Social Network
Imagine you have a social network and you want to implement a search solution that can search things like events, interests, photos, and all your friends’ events, interests, and photos. A naive, Solr-only-based implementation would search over all documents and filter by a “friends” field. This requires denormalization and indexing the full list of friends into each document that belongs to a user. Building a query like this is just searching over documents and adding something like a “friends:1234” clause to the query. It seems simple to implement, but the reality is that this is a terrible solution when you need to update a list of friends, because it requires a modification of each document. So when the number of documents (e.g., photos, events, interests, friends and their items) connected with a user grows, the number of potential updates rises dramatically and each modification of connections between users becomes a nightmare. Imagine a person with 10 photos and 100 friends (all of whom have their own photos, events, interests, etc.). When this person gets a 101st friend, the naive system with flattened data would have to update a lot of documents/rows. As we all know, in a social network connections between people are constantly being created and removed, so such a naive Solr-only system could not really scale.
Social networks also have one very important attribute: the number of connections of a single user is typically not expressed in millions. That number is typically relatively small — tens, hundreds, sometimes thousands. This raises the question: why not carry information about user connections in each query sent to the search engine? That way, instead of sending queries with a “friends:1234” clause, we can simply send queries with multiple user IDs connected by an “OR” operator. When a query has all the information needed to search entities that belong to a user’s friends, there is no need to store a list of friends in each user’s document. However, storing user connections in each query means sending rather large queries to the search engine, each containing multiple user-ID terms (e.g., id:5 OR id:10 OR id:100 OR …) connected by a disjunction operator. As the number of terms grows, the query requests become very big. And that’s a problem, because preparing them and sending them to the search engine over the network becomes awkward and slow.
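To make the size problem concrete, here is a small sketch contrasting the two approaches. The field name `owner_id` and the Redis key `friends:1234` are made-up examples; the shape of the Redis query string follows the plugin’s local-params style, but check the plugin’s README for the exact parameters:

```python
def naive_friends_query(friend_ids, field="owner_id"):
    """Build the 'carry every friend ID in the query' disjunction.
    The query string grows linearly with the number of friends."""
    return " OR ".join(f"{field}:{fid}" for fid in friend_ids)

# With the Redis query parser, the query stays constant-size: the
# parser fetches the IDs from a Redis set at query time, so only
# the key travels over the network. Roughly:
redis_query = "{!redis command=smembers key=friends:1234}owner_id"

big = naive_friends_query(range(1000))
print(len(big), "chars naive vs", len(redis_query), "chars via Redis")
```

The naive string for 1,000 friends runs to thousands of characters, while the Redis-parser query stays the same size no matter how many friends the user has.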
How Does It Work?
The image below shows how the Solr Redis Plugin works.
Continue reading “Solr Redis Plugin Use Cases and Performance Tests”