Data Engineer Position at Sematext International

If you’ve always wanted to work with Hadoop, HBase, Flume, and friends and build massively scalable, high-throughput distributed systems (like our Search Analytics and SPM), we have a Data Engineer position that is all about that!  If you are interested, please send your resume to jobs@sematext.com.

Responsibilities:

  • Versatile architect and developer – design and build large, high-performance, scalable data processing systems using Hadoop, HBase, and other big data technologies
  • DevOps fan – run and tune large data processing production clusters
  • Tool maker – develop ops and management tools
  • Open source participant – keep up with development in areas of cloud and distributed computing, NoSQL, Big Data, Analytics, etc.

Pluses:

  • solid Math, Statistics, Machine Learning, or Data Mining background is not required but is a big plus
  • experience with Analytics, OLAP, Data Warehouse or related technologies is a big plus
  • ability and desire to expand and lead a data engineering team
  • ability to think both business and engineering
  • ability to build products and services based on observed client needs
  • ability to present in public, at meetups, conferences, etc.
  • ability to contribute to blog.sematext.com
  • active participation in open-source communities
  • desire to share knowledge and teach
  • positive attitude, humor, agility, high integrity, low ego, and attention to detail

Location:

  • New York

We’re small and growing.  Our HQ is in Brooklyn, but our team is spread over 4 continents.  If you follow this blog you know we have deep expertise in search and big data analytics and that our team members are conference speakers, book authors, Apache members, open-source contributors, etc.

Wanted Dead or Alive: Search Engineer with Client-facing Skills

We are on the lookout for a strong Search Engineer with the interest and ability to interact with clients and the potential to build and lead local and/or remote development teams.  By “client-facing” we really mean primarily email, phone, and Skype.

A person in this role needs to be able to:

  • design large scale search systems
  • have solid knowledge of either Solr or ElasticSearch or both
  • efficiently troubleshoot performance, relevance, and other search-related issues
  • speak and interact with clients

Pluses – beyond pure engineering:

  • ability and desire to expand and lead development/consulting teams
  • ability to think both business and engineering
  • ability to build products and services based on observed client needs
  • ability to present in public, at meetups, conferences, etc.
  • ability to contribute to blog.sematext.com
  • active participation in online search communities
  • attention to detail
  • desire to share knowledge and teach
  • positive attitude, humor, agility

We’re small and growing.  Our HQ is in Brooklyn, but our team is spread over 4 continents.  If you follow this blog you know we have deep expertise in search and big data analytics and that our team members are conference speakers, book authors, Apache members, open-source contributors, etc.  While we are truly international, this particular opening is in New York.  Speaking of New York, some of our New York City clients that we are allowed to mention are Etsy, Gilt, Tumblr, Thomson Reuters, Simon & Schuster (more on http://sematext.com/clients/index.html).

If you are interested, please send over some information about yourself, your CV, and let’s talk.

Hiring: Data Mining, Analytics, Machine Learning Hackers

If you want to work with search, big data mining, analytics, and machine learning, and you are a positive, proactive, independent creature, please keep reading.  We are looking for devops to hack on Sematext’s new products and services, as well as provide services to our growing list of clients.  Working knowledge of Mahout or a statistics/machine learning/data mining background would be a major plus.

Skills & experience (the more of these you have under your belt the better):

  • Data mining and/or machine learning (Mahout or …)
  • Big data (HBase or Cassandra or Hive or …)
  • Search (Solr or Lucene or Elastic Search or …)

More about an ideal you:

  • You are well organized, disciplined, and efficient
  • You don’t wait to be told what to do and don’t need hand-holding
  • You are reliable, friendly, have a positive attitude, and don’t act like a prima donna
  • You have an eye for detail, don’t like sloppy code, poor spelelling and typous
  • You are able to communicate complex ideas in a clear fashion in English, clean and well-designed code, or pretty diagrams

Optional bonus points:

  • You like to write or speak publicly about technologies relevant to what we do
  • You are an open-source software contributor

A few words about us:

We work with search and big data (Lucene, Solr, Nutch, Hadoop, MapReduce, HBase, etc.) on a daily basis and we present at conferences.  Our projects with external clients range from 1 week to several months.  Some clients are small startups, some are large international organizations.  Some are top secret.  New customers knock on our door regularly and this keeps us busy at pretty much all times.  When we are not busy with clients we work on our products.  We run search-lucene.com and search-hadoop.com.  We participate in open-source projects and publish monthly Digest posts that cover Lucene, Solr, Nutch, Mahout, Hadoop, Hive, and HBase.  We don’t write huge spec docs, we work in sprints, we multitask, and try our best to be agile. We send people to conferences, trainings (Hadoop, HBase, Cassandra), and certifications (2 of our team members are Cloudera Certified Hadoop Developers).

We are a small and mostly office-free, highly distributed team spanning 3 continents and 6 countries.  We communicate via email, Skype voice/IM, and BaseCamp.  Some of our developers are in Eastern Europe, so we are especially open to new team members being in that area, but we are also interested in good people world-wide, from South America to the Far East.

Interested? Please send your resume to jobs @ sematext.com, and feel free to check out our other positions.

Wanted: Devops to run Search-Hadoop.com and Search-Lucene.com

If you are dreaming about working on search, big data, analytics, data mining, and machine learning, and are a positive, proactive, independent devops creature, inquire within!

We are a small and highly distributed team who likes to eat a little bit of everything: search for breakfast, mapreduce for lunch, and bigtable for dinner.  We are looking for a part-time-to-grow-into-full-time devops to work on the popular search-hadoop.com and search-lucene.com sites and take them to the next level. As such, you’ll need to be on top of Lucene, Solr, and Elastic Search.  Similarly, you must be completely at $HOME on the UNIX command line.  Working knowledge of Mahout or statistics/machine learning/data mining background would be a major plus, but is not required.  Experience with productive web frameworks and slick modern front-end frameworks is another plus, as is familiarity with EC2 and EBS.

More about the ideal you:

  • You are well organized, disciplined, and efficient
  • You don’t wait to be told what to do and don’t need hand-holding
  • You are reliable, friendly, have a positive attitude, and don’t act like a prima donna
  • You have an eye for detail – no sloppy code, no poor spelelling and typous
  • You are able to communicate complex ideas in a clear fashion in English (or pretty diagrams)
  • You have experience with (large scale) search or data analysis
  • You like to write about technologies relevant to what we do
  • You are an open-source software contributor

Not all of the above are required, of course – the closer the match, the higher the relevance score, that’s all.

Interested?  Please get in touch.

Hiring Search and Data Analytics Engineers

We are growing and looking for smart people to join us either in an “elastic”, on-demand, per-project, or more permanent role:

Lucene/Solr expert who…

  • Has built non-trivial applications with Lucene or Solr or Elastic Search, knows how to tune them, and can design systems for large volumes of data and queries
  • Is familiar with (some of the) internals of Lucene or Solr or Elastic Search, at least at a high level (yeah, a bit of an oxymoron)
  • Has a systems/ops bent or knows how to use performance-related UNIX and JVM tools for analyzing disk IO, CPU, GC, etc.

Data Analytics expert who…

  • Has used or built tools to process and analyze large volumes of data
  • Has experience using HDFS and MapReduce, and has ideally also worked with HBase, or Pig, or Hive, or Cassandra, or Voldemort, or Cascading or…
  • Has experience using Mahout or other similar tools
  • Has interest or background in Statistics, or Machine Learning, or Data Mining, or Text Analytics or…
  • Has interest in growing into a Lead role for the Data Analytics team

We like to dream that we can find a person who gets both Search and Data Analytics, and ideally wants or knows how to marry them.

Ideal candidates also have the ability to:

  • Write articles on interesting technical topics (that may or may not relate to Lucene/Solr) on Sematext Blog or elsewhere
  • Create and give technical talks/presentations (at conferences, local user groups, etc.)

Additional personal and professional traits we really like:

  • Proactive and analytical: takes initiative, doesn’t wait to be asked or told what to do and how to do it
  • Self-improving and motivated: acquires new knowledge and skills, reads books, follows relevant projects, keeps up with changes in the industry…
  • Self-managing and organized: knows how to parcel work into digestible tasks, organizes them into Sprints, updates and closes them, keeps team members in the loop…
  • Realistic: good estimator of time and effort (i.e. knows how to multiply by 2)
  • Active in OSS projects: participates in open source community (e.g. mailing list participation, patch contribution…) or at least keeps up with relevant projects via mailing list or some other means
  • Follows good development practices: from code style to code design to architecture
  • Productive, gets stuff done: minimal philosophizing and over-designing

Here are some of the Search things we do (i.e. that you will do if you join us):

  • Work with external clients on their Lucene/Solr projects.  This may involve anything from performance troubleshooting to development of custom components, to designing highly scalable, high performance, fault-tolerant architectures.  See our services page for common requests.
  • Provide Lucene/Solr technical support to our tech support customers
  • Work on search-related products and services

A few words about us:

We work with search and big data (Lucene, Solr, Nutch, Hadoop, MapReduce, HBase, etc.) on a daily basis.  Our projects with external clients range from 1 week to several months.  Some clients are small startups, some are large international organizations.  Some are top secret.  New customers knock on our door regularly and this keeps us busy at pretty much all times.  When we are not busy with clients we work on our products.  We run search-lucene.com and search-hadoop.com.  We participate in open-source projects and publish monthly Digest posts that cover Lucene, Solr, Nutch, Mahout, Hadoop, and HBase.  We don’t write huge spec docs, we work in sprints, we multitask, and try our best to be agile. We send people to conferences, trainings (Hadoop, HBase, Cassandra), and certifications (2 of our team members are Cloudera Certified Hadoop Developers).

We are a small and mostly office-free, highly distributed team that communicates via email, Skype voice/IM, and BaseCamp.  Some of our developers are in Eastern Europe, so we are especially open to new team members being in that area, but we are also interested in good people world-wide, from South America to the Far East.

Interested? Please send your resume to jobs @ sematext.com.

Hiring Lucene, Solr, Nutch, Hadoop, NLP People

Hear, hear!

We are looking for people passionate about search/information retrieval, natural language processing, machine learning, text analytics, recommendation engines, and related topics.  Please see the Sematext Jobs page for a bit more information.  If you enjoy working with Lucene, Solr, Nutch, Hadoop, HBase, or any of the other technologies listed on the Sematext Jobs page or, more generally, you enjoy working in any of the related fields, please get in touch.

We are a small, private company based in New York City, with people on multiple continents and clients from all around the globe.
