Sematext Year 2011 in Review

2011 was a good year for Sematext. Here are some highlights.


In 2011, we’ve released several new versions of our popular AutoComplete, Key Phrase Extractor, and DYM ReSearcher products and have witnessed a number of organizations adopting them.


After months of hard work, we’ve opened up our Search Analytics and Performance Monitoring services to public. Anyone can sign up for an account and use either or both of these services for free.  Yes, both services are completely free now and can be used without any restrictions.

Services – Tech Support

In addition to our standard consulting services we’ve successfully started offering commercial tech support for Lucene and Solr.  These annual subscriptions come in several different packages and are made for those running Lucene or Solr in production and wanting immediate access to Lucene and Solr experts when things go awry.

Services – Consulting Packages

For those who need long-term access to Lucene or Solr experts we started offering several levels of consulting support packages.  What makes these packages attractive is that one gets immediate help to Lucene or Solr expert consultants while paying a lower rate in exchange for an annual commitment.


We attended and presented at a number of conferences (slides and videos) in 2011 – Lucene Revolution in San Francisco in May, Berlin Buzzwords in June, Lucene Eurocon in Barcelona in October, and Enterprise Search Summit Fall in Washington, DC in November.

Open Source

During our work on Search Analytics and Performance Monitoring and specifically the parts of them that use HBase, we’ve forked a couple of open-source projects that we put up on Github.  We are looking at open-sourcing a few other things in 2012.  We’ve also contributed patches to Flume, Solr, HBase, and while in Berlin in June we took part in our first HBase Hackathon.

Team Growth

Our team has roughly doubled in size.  Our people are now in 3 different time zones and 6 countries and we are looking to expand this further.  We are actively hiring and have a number of open positions, from mobile development and design to system administration, sales and marketing.  Of course, we are always looking for search and big data experts.  The team didn’t grow only in number, but also in knowledge and expertise – we now have 2 Cloudera Certified Hadoop developers on staff, 2 published book authors, a recent Machine Learning and Artificial Intelligence Stanford online class “graduate”, etc.

Collaboration with Academia

We have partnered with a university lab on the other side of the Atlantic and have been collaborating on some interesting projects results of which we’ll share in 2012.


Overall, 2011 was a good year.  But we are in 2012 now and it’s time to look ahead.  Looking at our crystal ball, I see more fun work and more opportunities.  We’ll do our best to make the most of them.

Hiring Search and Data Analytics Engineers

We are growing and looking for smart people to join us either in an “elastic”, on-demand, per-project, or more permanent role:

Lucene/Solr expert who…

  • Has built non-trivial applications with Lucene or Solr or Elastic Search, knows how to tune them, and can design systems for large volume of data and queries
  • Is familiar with (some of the) internals of Lucene or Solr or Elastic Search, at least on the high level (yeah, a bit of an oxymoron)
  • Has a systems/ops bent or knows how to use performance-related UNIX and JVM tools for analyzing disk IO, CPU, GC, etc.

Data Analytics expert who…

  • Has used or built tools to process and analyze large volumes of data
  • Has experience using HDFS and MapReduce, and have ideally also worked with HBase, or Pig, or Hive, or Cassandra, or Voldemort, or Cascading or…
  • Has experience using Mahout or other similar tools
  • Has interest or background in Statistics, or Machine Learning, or Data Mining, or Text Analytics or…
  • Has interest in growing into a Lead role for the Data Analytics team

We like to dream that we can find a person who gets both Search and Data Analytics, and ideally wants or knows how to marry them.

Ideal candidates also have the ability to:

  • Write articles on interesting technical topics (that may or may not relate to Lucene/Solr) on Sematext Blog or elsewhere
  • Create and give technical talks/presentations (at conferences, local user groups, etc.)

Additional personal and professional traits we really like:

  • Proactive and analytical: takes initiative, doesn’t wait to be asked or told what to do and how to do it
  • Self-improving and motivated: acquires new knowledge and skills, reads books, follows relevant projects, keeps up with changes in the industry…
  • Self-managing and organized: knows how to parcel work into digestible tasks, organizes them into Sprints, updates and closes them, keeps team members in the loop…
  • Realistic: good estimator of time and effort (i.e. knows how to multiply by 2)
  • Active in OSS projects: participates in open source community (e.g. mailing list participation, patch contribution…) or at least keeps up with relevant projects via mailing list or some other means
  • Follows good development practices: from code style to code design to architecture
  • Productive, gets stuff done: minimal philosophizing and over-designing

Here are some of the Search things we do (i.e. that you will do if you join us):

  • Work with external clients on their Lucene/Solr projects.  This may involve anything from performance troubleshooting to development of custom components, to designing highly scalable, high performance, fault-tolerant architectures.  See our services page for common requests.
  • Provide Lucene/Solr technical support to our tech support customers
  • Work on search-related products and services

A few words about us:

We work with search and big data (Lucene, Solr, Nutch, Hadoop, MapReduce, HBase, etc.) on a daily basis.  Our projects with external clients range from 1 week to several months.  Some clients are small startups, some are large international organizations.  Some are top secret.  New customers knock on our door regularly and this keeps us busy at pretty much all times.  When we are not busy with clients we work on our products.  We run and  We participate in open-source projects and publish monthly Digest posts that cover Lucene, Solr, Nutch, Mahout, Hadoop, and HBase.  We don’t write huge spec docs, we work in sprints, we multitask, and try our best to be agile. We send people to conferences, trainings (Hadoop, HBase, Cassandra), and certifications (2 of our team members are Cloudera Certified Hadoop Developers).

We are a small and mostly office-free, highly distributed team that communicates via email, Skype voice/IM, BaseCamp.  Some of our developers are in Eastern Europe, so we are especially open to new team members being in that area, but we are also interested in good people world-wide, from South America to Far East.

Interested? Please send your resume to jobs @

Hiring Lucene, Solr, Nutch, Hadoop, NLP People

Hear, hear!

We are looking for people passionate about search/information retrieval, natural language processing, machine learning, text analytics, recommendation engines, and related topics.  Please see the Sematext Jobs page for a bit more information.  If you enjoy working with Lucene, Solr, Nutch, Hadoop, HBase, or any of the other technologies listed on the Sematext Jobs page or, more generally, you enjoy working in any of the related fields, please get in touch.

We are a small, private company based in New York City, with people on multiple continents and clients from all around the globe.

Sematext Blog Introduction

Here it is – Sematext’s new and shinny blog.

We’ll be writing about topics that are dear and important to us – search (both web search and enterprise search), text analytics, natural language processing (sentiment detection, named entity recognition…), machine learning, information gathering (e.g. web crawling), information extraction, e-discovery, recommendation engines, etc.  There will be a lot of talk about tools we use regularly – Lucene, Solr, Nutch, Mahout and Taste, Hadoop, HBase and friends, and more.

To subscribe, use the orange feed icon or just go to  If you are a Twitter user, you can follow @sematext on Twitter, too.


Get every new post delivered to your Inbox.

Join 181 other followers