Relevance Tuning and Competitive Advantage via Search Analytics

Here are two cool things about Search Analytics that I’d like to point out.  The slides are stolen from our Search Analytics presentation at Enterprise Search Summit 2011 in Washington DC.

Search Analytics for A/B testing, relevance tuning and improvements

This slide shows how Search Analytics can be used to help with A/B testing.  Concretely, in this slide we see two Solr Dismax handlers selected on the right side.  If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches.  In this example, each Dismax handler is configured differently and thus each of them ranks search hits slightly differently.  On the graph we see the MRR (see Wikipedia page for Mean Reciprocal Rank details) for both Dismax handlers and we can see that the one corresponding to the blue line is performing much better.  That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one.  Once you have a system like this in place you can add more Dismax handlers and compare 2 or more of them at a time.  As the result, with the help of Search Analytics you get actual, real feedback about any changes you make to your search engine.  Without a tool like this, you cannot really tune your search engine’s relevance well and will  be doing it blindly.


A/B Testing with Search Analytics

A/B Testing with Search Analytics

Note: while in this slide we see two Solr Dismax handlers, Sematext Search Analytics is search vendor agnostic - the same thing can be done with search powered by FAST/Microsoft Search, Attivio, Endeca/Oracle, Autonomy/HP, Vivisimo, Dieselpoint, Coveo, ElasticSearch, vanilla Lucene, Xapian, or Sphinx, or …

Gaining Competitive Advantage with Search Analytics

As you can see, the only way to fix or improve things, and in this case we are talking about various aspects of search experience, is by having something with which you can measure this search experience and your changes.  You need something to tell you when it’s time to improve things, and you need something that gives you feedback about your changes: Did the key metrics improve after you changes?  If so, how much?  Did any metrics degrade? etc.

Search Analytics Key Takeways

Search Analytics Key Takeways

I can’t emphasize enough how important Search Analytics is and how few organizations use it or use it well and consistently.  While this may be a bit mind boggling for those of us who live and breath search, from your perspective this is a great thing – it means that if you are smart about using Search Analytics to improve your search engine and your users’ search experience, you will gain competitive advantage and be ahead of your competitors who still don’t have or don’t use Search Analytics!

If you have any questions of feedback about Search Analytics in general, please leave a comment and we’ll follow up as soon as possible!

@sematext

Search Analytics at Enterprise Search Summit Fall 2011 Presentation

Here is another take on Search Analytics, this one being presented at Enterprise Search Summit Fall 2011 in Washington DC, to an audience coming mainly from the US government agencies, very large enterprises, and large international companies with 10s of thousands of employees world wide.  The audience was good and posed a number of good questions after the talk.  The full slide deck is below as well as in Sematext@Slideshare.

What’s Your Search Analytics Solution of Choice?

Here is a quick one.  What’s your Search Analytics tool of choice, if you have one?

And if you are happy or unhappy with the solution you are using, we’d love to hear what it is about your solution that is making you happy or unhappy - please leave a comment.

Thanks!

Search Analytics: Business Value & NoSQL Backend Presentation

Last week involved a few late nights for some of us at Sematext – we were busy readying our Search Analytics and Scalable Performance Monitoring services, as well as putting the final touches on the our Search Analytics: Business Value & NoSQL Backend presentation for Lucene Eurocon in Barcelona.

In the past we’ve given a few other public talks about Search Analytics and you can check them all out via http://blog.sematext.com/tag/analytics/.

Search Analytics: What? Why? How? Presentation

Back in May and June of this year, 2011, we went to two conferences: Lucene Revolution 2011 in San Francisco and Berlin Buzzwords.  We went there to talk about what Search Analytics is, why it’s important to have a good search analytics tool, what the benefits of search analytics are, etc.  We never posted that presentation deck online, so we are doing it now.  Note that we will be talking about Search Analytics at Lucene Eurocon 2011 in Barcelona, too, but that talk will be different than the one whose slides are below.

Enjoy!

Search Analytics – Video Interview with Otis Gospodnetić

“I’m shocked companies aren’t using these tools.”

This video interview about Search Analytics is from Techilicious by Josette Rigsby: http://techielicous.com/2011/06/04/search-and-analytics/

“…we had a chance to speak with Otis Gospodnetić, co-author of Lucene in Action and Founder of Sematext regarding search analytics and searching big data.”

Enterprise search is growing in importance along with data sizes; there is simply to much content to locate without the aid of a search tool; but, are users really  finding what they need? Unfortunately, many companies can not answer that question. Gospodnetić advised that organizations should be collecting at least a minimum set of metrics about  search performance and user behavior. However, the majority are not.

Unlike click stream analysis, search analytics provides insight into how users are actually using search – the actual terms they specify – instead of just what they clicked. Key metrics organizations should collect on an on-going basis include:

  • Search failure (zero results)
  • Low click-through rate
  • Most popular searches (words and phrases)

Once the metrics are collected, organizations should analyze the data to improve the search experience. For example, if a significant percentage of queries are failing organizations can use the data from search analytics to find out why. Is it due to misspellings? Are there synonyms? Gospodnetić said,

“I’m shocked companies aren’t using these tools.”

For more information on this topic read about Sematext Search Analytics service.

Sematext at Berlin Buzzwords 2011

As part of Sematext’s Summer 2011 Conference Tour we are going to be visiting the good old Europe and giving a talk at Berlin Buzzwords.  This is the second year for Berlin Buzzwords, “a conference for developers and users of open source software projects, focusing on the issues of scalable search, data-analysis in the cloud and NoSQL-databases. Berlin Buzzwords presents more than 30 talks and presentations of international speakers specific to the three tags “search”, “store” and “scale”“.  Last year, one of us from Sematext went there as an attendee.  This year, three of us are going and one of us is giving a talk – @OtisG will be speaking about Search Analytics on June 6th.  That’s the first day of the conference and we are first in line to talk at 11:00 AM, right after the morning coffee.  Doug Cutting and Ted Dunning will be giving Keynotes.  Some of us may also be there for some of the Hackathons/Workshops before and/or after the conference.  If you are going to be there and would like to meet up, please let us know@sematext.

For more information on this topic read about Sematext Search Analytics service.

Hiring Search and Data Analytics Engineers

We are growing and looking for smart people to join us either in an “elastic”, on-demand, per-project, or more permanent role:

Lucene/Solr expert who…

  • Has built non-trivial applications with Lucene or Solr or Elastic Search, knows how to tune them, and can design systems for large volume of data and queries
  • Is familiar with (some of the) internals of Lucene or Solr or Elastic Search, at least on the high level (yeah, a bit of an oxymoron)
  • Has a systems/ops bent or knows how to use performance-related UNIX and JVM tools for analyzing disk IO, CPU, GC, etc.

Data Analytics expert who…

  • Has used or built tools to process and analyze large volumes of data
  • Has experience using HDFS and MapReduce, and have ideally also worked with HBase, or Pig, or Hive, or Cassandra, or Voldemort, or Cascading or…
  • Has experience using Mahout or other similar tools
  • Has interest or background in Statistics, or Machine Learning, or Data Mining, or Text Analytics or…
  • Has interest in growing into a Lead role for the Data Analytics team

We like to dream that we can find a person who gets both Search and Data Analytics, and ideally wants or knows how to marry them.

Ideal candidates also have the ability to:

  • Write articles on interesting technical topics (that may or may not relate to Lucene/Solr) on Sematext Blog or elsewhere
  • Create and give technical talks/presentations (at conferences, local user groups, etc.)

Additional personal and professional traits we really like:

  • Proactive and analytical: takes initiative, doesn’t wait to be asked or told what to do and how to do it
  • Self-improving and motivated: acquires new knowledge and skills, reads books, follows relevant projects, keeps up with changes in the industry…
  • Self-managing and organized: knows how to parcel work into digestible tasks, organizes them into Sprints, updates and closes them, keeps team members in the loop…
  • Realistic: good estimator of time and effort (i.e. knows how to multiply by 2)
  • Active in OSS projects: participates in open source community (e.g. mailing list participation, patch contribution…) or at least keeps up with relevant projects via mailing list or some other means
  • Follows good development practices: from code style to code design to architecture
  • Productive, gets stuff done: minimal philosophizing and over-designing

Here are some of the Search things we do (i.e. that you will do if you join us):

  • Work with external clients on their Lucene/Solr projects.  This may involve anything from performance troubleshooting to development of custom components, to designing highly scalable, high performance, fault-tolerant architectures.  See our services page for common requests.
  • Provide Lucene/Solr technical support to our tech support customers
  • Work on search-related products and services

A few words about us:

We work with search and big data (Lucene, Solr, Nutch, Hadoop, MapReduce, HBase, etc.) on a daily basis.  Our projects with external clients range from 1 week to several months.  Some clients are small startups, some are large international organizations.  Some are top secret.  New customers knock on our door regularly and this keeps us busy at pretty much all times.  When we are not busy with clients we work on our products.  We run search-lucene.com and search-hadoop.com.  We participate in open-source projects and publish monthly Digest posts that cover Lucene, Solr, Nutch, Mahout, Hadoop, and HBase.  We don’t write huge spec docs, we work in sprints, we multitask, and try our best to be agile. We send people to conferences, trainings (Hadoop, HBase, Cassandra), and certifications (2 of our team members are Cloudera Certified Hadoop Developers).

We are a small and mostly office-free, highly distributed team that communicates via email, Skype voice/IM, BaseCamp.  Some of our developers are in Eastern Europe, so we are especially open to new team members being in that area, but we are also interested in good people world-wide, from South America to Far East.

Interested? Please send your resume to jobs @ sematext.com.

Search Analytics: Hadoop World Presentation

After our Lucene Revolution talk in Boston, we got ready for last week’s Hadoop World conference in New York.  Like at the Lucene Revolution, we presented to a packed room of 200+ people. The topic of our talk was the Search Analytics tool we’ve built with the help of Flume, HBase, MapReduce, and other open-source tools, and which are now starting to use for search-hadoop.com and search-lucene.com.  If you couldn’t make it to Hadoop World, have a look at our presentation below.  And if you’d like to work on Search, Analytics, and related areas, we’re looking for good people world-wide – see our jobs page.  Enjoy!

Follow

Get every new post delivered to your Inbox.

Join 632 other followers