Sematext Year 2011 in Review

2011 was a good year for Sematext. Here are some highlights.

Products

In 2011, we’ve released several new versions of our popular AutoComplete, Key Phrase Extractor, and DYM ReSearcher products and have witnessed a number of organizations adopting them.

SaaS

After months of hard work, we’ve opened up our Search Analytics and Performance Monitoring services to public. Anyone can sign up for an account and use either or both of these services for free.  Yes, both services are completely free now and can be used without any restrictions.

Services – Tech Support

In addition to our standard consulting services we’ve successfully started offering commercial tech support for Lucene and Solr.  These annual subscriptions come in several different packages and are made for those running Lucene or Solr in production and wanting immediate access to Lucene and Solr experts when things go awry.

Services – Consulting Packages

For those who need long-term access to Lucene or Solr experts we started offering several levels of consulting support packages.  What makes these packages attractive is that one gets immediate help to Lucene or Solr expert consultants while paying a lower rate in exchange for an annual commitment.

Conferences

We attended and presented at a number of conferences (slides and videos) in 2011 – Lucene Revolution in San Francisco in May, Berlin Buzzwords in June, Lucene Eurocon in Barcelona in October, and Enterprise Search Summit Fall in Washington, DC in November.

Open Source

During our work on Search Analytics and Performance Monitoring and specifically the parts of them that use HBase, we’ve forked a couple of open-source projects that we put up on Github.  We are looking at open-sourcing a few other things in 2012.  We’ve also contributed patches to Flume, Solr, HBase, and while in Berlin in June we took part in our first HBase Hackathon.

Team Growth

Our team has roughly doubled in size.  Our people are now in 3 different time zones and 6 countries and we are looking to expand this further.  We are actively hiring and have a number of open positions, from mobile development and design to system administration, sales and marketing.  Of course, we are always looking for search and big data experts.  The team didn’t grow only in number, but also in knowledge and expertise – we now have 2 Cloudera Certified Hadoop developers on staff, 2 published book authors, a recent Machine Learning and Artificial Intelligence Stanford online class “graduate”, etc.

Collaboration with Academia

We have partnered with a university lab on the other side of the Atlantic and have been collaborating on some interesting projects results of which we’ll share in 2012.

 

Overall, 2011 was a good year.  But we are in 2012 now and it’s time to look ahead.  Looking at our crystal ball, I see more fun work and more opportunities.  We’ll do our best to make the most of them.

Relevance Tuning and Competitive Advantage via Search Analytics

Here are two cool things about Search Analytics that I’d like to point out.  The slides are stolen from our Search Analytics presentation at Enterprise Search Summit 2011 in Washington DC.

Search Analytics for A/B testing, relevance tuning and improvements

This slide shows how Search Analytics can be used to help with A/B testing.  Concretely, in this slide we see two Solr Dismax handlers selected on the right side.  If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches.  In this example, each Dismax handler is configured differently and thus each of them ranks search hits slightly differently.  On the graph we see the MRR (see Wikipedia page for Mean Reciprocal Rank details) for both Dismax handlers and we can see that the one corresponding to the blue line is performing much better.  That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one.  Once you have a system like this in place you can add more Dismax handlers and compare 2 or more of them at a time.  As the result, with the help of Search Analytics you get actual, real feedback about any changes you make to your search engine.  Without a tool like this, you cannot really tune your search engine’s relevance well and will  be doing it blindly.


A/B Testing with Search Analytics

A/B Testing with Search Analytics

Note: while in this slide we see two Solr Dismax handlers, Sematext Search Analytics is search vendor agnostic - the same thing can be done with search powered by FAST/Microsoft Search, Attivio, Endeca/Oracle, Autonomy/HP, Vivisimo, Dieselpoint, Coveo, ElasticSearch, vanilla Lucene, Xapian, or Sphinx, or …

Gaining Competitive Advantage with Search Analytics

As you can see, the only way to fix or improve things, and in this case we are talking about various aspects of search experience, is by having something with which you can measure this search experience and your changes.  You need something to tell you when it’s time to improve things, and you need something that gives you feedback about your changes: Did the key metrics improve after you changes?  If so, how much?  Did any metrics degrade? etc.

Search Analytics Key Takeways

Search Analytics Key Takeways

I can’t emphasize enough how important Search Analytics is and how few organizations use it or use it well and consistently.  While this may be a bit mind boggling for those of us who live and breath search, from your perspective this is a great thing – it means that if you are smart about using Search Analytics to improve your search engine and your users’ search experience, you will gain competitive advantage and be ahead of your competitors who still don’t have or don’t use Search Analytics!

If you have any questions of feedback about Search Analytics in general, please leave a comment and we’ll follow up as soon as possible!

@sematext

Search Analytics at Enterprise Search Summit Fall 2011 Presentation

Here is another take on Search Analytics, this one being presented at Enterprise Search Summit Fall 2011 in Washington DC, to an audience coming mainly from the US government agencies, very large enterprises, and large international companies with 10s of thousands of employees world wide.  The audience was good and posed a number of good questions after the talk.  The full slide deck is below as well as in Sematext@Slideshare.

Solr Performance Monitoring with SPM

Originally delivered as Lightning Talk at Lucene Eurocon 2011 in Barcelona, this quick presentation shows how to use Sematext’s SPM service, currently free to use for unlimited time, to monitor Solr, OS, JVM, and more.

We built SPM because we wanted to have a good and easy to use tool to help us with Solr performance tuning during engagements with our numerous Solr customers.  We hope you find our Scalable Performance Monitoring service useful!  Please let us know if you have any sort of feedback, from SPM functionality and usability to its speed.  Enjoy!

What’s Your Search Analytics Solution of Choice?

Here is a quick one.  What’s your Search Analytics tool of choice, if you have one?

And if you are happy or unhappy with the solution you are using, we’d love to hear what it is about your solution that is making you happy or unhappy - please leave a comment.

Thanks!

Search Analytics: Business Value & NoSQL Backend Presentation

Last week involved a few late nights for some of us at Sematext – we were busy readying our Search Analytics and Scalable Performance Monitoring services, as well as putting the final touches on the our Search Analytics: Business Value & NoSQL Backend presentation for Lucene Eurocon in Barcelona.

In the past we’ve given a few other public talks about Search Analytics and you can check them all out via http://blog.sematext.com/tag/analytics/.

AutoComplete with Suggestion Groups

While Otis is talking about our new Search Analytics (it’s open and free now!) and Scalable Performance Monitoring (it’s also open and free now!) services at Lucene Eurocon in Barcelona Pascal, one of the new members of our multinational team at Sematext, is working on making improvements to our search-lucene.com and search-hadoop.com sites.  One of the recent improvements is on the AutoComplete functionality we have there.  If you’ve used these sites before, you may have noticed that the AutoComplete there now groups suggestions.  In the screen capture below you can see several suggestion groups divided with pink lines.  Suggestions can be grouped by any criterion, and here we have them grouped by the source of suggestions.  The very first suggestion is from “mail # general”, which is our name for the “general” mailing list that some of the projects we index have.  The next two suggestions are from “mail # user”, followed by two suggestions from “mail # dev”, and so on.  On the left side of each suggestion you can see icons that signify the type of suggestion and help people more quickly focus on types of suggestions they are after.

Sematext AutoComplete Segments

Sematext AutoComplete Suggestion Grouping

The other interesting thing you can see in this AutoComplete implementation is a custom footer, which allows one to show any sort of messaging or even advertising to the end user.  We should also point out that one can use Sematext AutoComplete to embed custom suggestions, such as ads, and have them show up in the list of suggestions only when their (meta) data matches the input, yet displayed differently from the rest of suggestion in order to make it clear they are ads and not ordinary suggestions.

We hope you find this functionality useful.  Please let us know what you think and if you have other suggestions for how to improve either AutoComplete or search-lucene.com and search-hadoop.com, please let us know – we do listen!

Search Analytics: What? Why? How? Presentation

Back in May and June of this year, 2011, we went to two conferences: Lucene Revolution 2011 in San Francisco and Berlin Buzzwords.  We went there to talk about what Search Analytics is, why it’s important to have a good search analytics tool, what the benefits of search analytics are, etc.  We never posted that presentation deck online, so we are doing it now.  Note that we will be talking about Search Analytics at Lucene Eurocon 2011 in Barcelona, too, but that talk will be different than the one whose slides are below.

Enjoy!

Event Stream Processor Matrix

We published our first ever UI-focused post on Top JavaScript Dynamic Table Libraries the other day and got some valuable feedback – thanks!

We are back to talking about the backend again.  Our Search Analytics and Scalable Performance Monitoring services/products accept, process, and store huge amounts of data.  One thing both of these services do is process a stream of events in real-time (and batch, of course).  So what solutions are there that help one process data in real-time and perform some operations on a rolling window of data, such as the last 5 or 30 minutes of incoming event stream?  We know of several solutions that fit that bill, so we decided to put together a matrix with essential attributes of those tools in order to compare them and make our pick.  Below is the matrix we came up with.  If you are viewing this on our site, the table is likely going to be too wide, but it should look find in a proper feed reader.

Matrix part 1:

License Language Scaling Add or change rules on the fly Other infra needed Rule types
Esper GPL2, commercial java Scale up yes none Declarative, query-based
Drools Fusion ASL 2.0 java Scale up yes none Declarative, mostly rule based, but support queries too
FlumeBase ASL 2.0 java Horizontal: natural sharding on top of Flume yes Flume Declarative, query-based
Storm EPL 1.0 clojure Horizontal Can be implemented on top of Zookeeper ZeroMQ, Zookeeper Provides only low level primitives(like grouping). Rule engine should be implemented manually on top.
S4 ASL 2.0 java Horizontal Can be implemented on top of Zookeeper Zookeeper Provides set of low level primitives. Somehow correlation support via joins. Documentation have a “windowing” section, but it empty.
Activeinsight CPAL 1.0, commercial java Horizontal yes Declarative, Query-like
Kafka APL 2.0 java Horizontal Zookeeper Set of low level primitives

Matrix part 2:

Docs / examples Maturity Community URL Notes
Esper very good mature, stable medium esper.codehaus.org
Drools Fusion good 3 years, stable small jboss.org/drools/drools-fusion.html
FlumeBase good alpha small flumebase.org
Storm exists used in production growing very fast tech.backtype.com good deployment features
S4 average alpha, butused in production medium (will grow under ASF) s4.io
Activeinsight poor unknown unknown activeinsight.org
Kafka good used in production small (will grow under ASF) incubator.apache.org/kafka

So there you have it – we hope you find this useful.  If you have any comments or questions, tweet us (@sematext) or leave a comment here.  If you like working on systems that handle large volumes of data, like our Search Analytics and Scalable Performance Monitoring services, we are hiring world-wide.

Top JavaScript Dynamic Table Libraries

Since @sematext focuses on Search, Data and Text Analytics, and similar areas that typically involve exclusively backend, server-side work, we rarely publish anything that deals with UI, UX, with JavaScript, front ends, and do on. However, our Search Analytics and Scalable Performance Monitoring services/products do have rich, data-driven UIs (think reports, graphs, charts, tables), so we are increasingly thinking (obsessing?) about usability, intuitive and clean interfaces, visual data representations, etc. (in fact, we have an opening for a UI/UX designer and developer).  Recently, we decided to upgrade a group of Search Analytics reports that, until now, used a quickly-thrown-together HTML table that, as much as we loved its simplicity, needed a replacement.  So we set out to look for something better, more functional and elegant.  In the process we identified a number of JavaScript libraries for rendering and browsing tabular data.  Eventually we narrowed our selection to 6 JavaScript libraries whose characteristics we analyzed.  In the spirit of sharing and helping others, please find their feature matrix below.

Feature

DataTables

JqGrid (aka Sigma2)

Slickgrid

dhtmlxGrid

Flexigrid

ExtJs

License GPL v2 license or a BSD (3-point) license LGPL MIT Grid License $449 MIT License from $600
Show/Hide columns yes No/no No/no
Update: Yes, using ColumnPicker plugin 
Yes/yes Yes/yes Yes/yes
Resize/Reorder columns yes/yes Yes/yes Yes/yes Yes/yes Yes/yes Yes/yes
Client side sorting yes yes yes yes no yes
Support JSO as data sourc yes yes yes yes yes yes
Export data to Excel/CSV yes yes no Excel and PDF no No, but see forum
on this topic 
Endless scroll yes no yes yes no yes
Filter by columns yes yes Not very useful but exists in different area, not in header yes no yes
Search(all columns simultaneously). yes yes yes(server) no yes no
Aggregation footer no no no yes no no
Additional information Has a lot extensions and plug-ins Looks nice Handles hundreds of thousands of rows with good speed Easy java integration
www.dhtmlx.com/
blog/?tag=java

Guess which of the above libraries we chose?

If you think we missed a library that deserves a spot in our matrix or if anything looks wrong, please let us know, via comments for example.

Follow

Get every new post delivered to your Inbox.

Join 706 other followers