Slides: Battle of the Giants – Solr 4.0 vs ElasticSearch 0.20.0

Slides for the Battle of the Giants talk Rafał Kuc (@kucrafal) gave at ApacheCon EU 2012 are now up!

If you like working with Solr, ElasticSearch, HBase, Hadoop, Kafka, Flume, etc., if you use and/or develop highly scalable distributed applications and frameworks, or if you like to work on Analytics and Big Data applications and services, get in touch: we’re looking for good, smart, and fun people!

And if you liked the above presentation, you may also want to read our ElasticSearch vs. Solr series and see Scaling Massive ElasticSearch Clusters.

Presentation: Intro to HBase Internals and Schema Design

Below are the slides (and audio) from the Intro to HBase Internals and Schema Design presentation Alex gave at our inaugural HBase NYC meetup.  See also: Introduction to HBase

We’re hiring people who want to work WITH and ON HBase and other Big Data technologies.  See jobs @ sematext.

Presentation: Intro to HBase

Last week we had our inaugural HBase NYC meetup.  About 30 people turned up – not bad for the first meetup.  Etsy, an old Sematext customer and Brooklyn neighbour, provided the space, AV equipment and help, as well as their fridge with beer – thanks!  Alex gave two talks: first the Introduction to HBase, whose slides (and audio) are below, and then Intro to HBase Internals and Schema Design.

We’re hiring people who want to work WITH and ON HBase and other Big Data technologies.  See jobs @ sematext.

Slides: Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB…

In this presentation from Berlin Buzzwords 2012 we show how SPM, our Performance Monitoring service, is built: how metrics are collected, how they are processed, and how they are presented.  We share a few findings along the way, too.

Note: we are actively looking for strong Java engineers. If that’s you, please get in touch. Separately, if you have interest in and/or experience with HBase, Analytics, OLAP, and related areas, or if you are looking to work with ElasticSearch, Solr, and search in general, please get in touch, too.

 

Slides: Real-time Analytics with HBase

Here are slides from another talk we gave at both Berlin Buzzwords and at HBaseCon in San Francisco last month.  In this presentation  Alex describes one approach to real-time analytics with HBase, which we use at Sematext via HBaseHUT.   If you like these slides you will also like HBase Real-time Analytics Rollbacks via Append-based Updates.

Note: we are actively looking for people with strong interest in and/or experience with HBase, Analytics, OLAP, etc.  If that’s you, please get in touch.

The short version is from Buzzwords, while the version with more slides is from HBaseCon:

Slides: Scaling Massive ElasticSearch Clusters

We are done with the two-day Berlin Buzzwords conference.  The conference was good, a success for both the organizers and for Sematext: we saw a ton of interest in both our Performance Monitoring and Search Analytics services, and our talks were well received and attended by 200+ people each.  Between giving presentations, talking to people interested in our products and/or services, and meeting people interested in joining Sematext (ask us how much fun we had in Berlin!), we had our hands full even with 5 Sematextans around.

Note: we are actively looking for people with strong interest in and/or experience with ElasticSearch, Solr, and search in general.  If that’s you, please get in touch.

Below are the slides from Rafał’s talk about scaling ElasticSearch:

Berlin Buzzwords 2012 – Three Talks from Sematext

Last year was our first time at Berlin Buzzwords.  We gave 1 full talk about Search Analytics (video) and 2 lightning talks (video, video).  We saw a number of good talks, too.  We also took part in an HBase Hackathon organized by Lars George in Groupon’s Berlin offices and even found time to go clubbing.  So in hopes of paying Berlin another visit this year, a few of us at Sematext (@sematext) submitted talk proposals.  Last week we all got acceptance emails, so this year there will be 3 talks from 3 Sematextans at Berlin Buzzwords!  Here is what we’ll be talking about:

Rafał: Scaling Massive ElasticSearch Clusters

This talk describes how we’ve used ElasticSearch to build massive search clusters capable of indexing several thousand documents per second while at the same time serving a few hundred QPS over billions of documents in well under a second.  We’ll talk about building clusters that continuously grow in terms of both indexing and search rates. You will learn about finding cluster nodes that can handle more documents, about managing shard and replica allocation and prevention of unwanted shard rebalancing, about avoiding expensive distributed queries, etc.  We’ll also describe our experience doing performance testing of several ElasticSearch clusters and will share our observations about what settings affect search performance and how much.  In this talk you’ll also learn how to monitor large ElasticSearch clusters, what various metrics mean, and which ones to pay extra attention to.

Alex: Real-time Analytics with HBase

HBase can store massive amounts of data and allow random access to it – great. MapReduce jobs can be used to perform data analytics on a large scale – great. MapReduce jobs are batch jobs – not so great if you are after Real-time Analytics. Meet the append-only writes approach, which allows going real-time where it wasn’t possible before.

In this talk we’ll explain how we implemented “update-less updates” (not a typo!) for HBase using an append-only approach. This approach shines in situations where high data volume and velocity make random updates (aka Get+Put) prohibitively expensive.  Apart from making Real-time Analytics possible, we’ll show how the append-only approach to updates makes it possible to perform rollbacks of data changes, and avoid data inconsistency problems caused by tasks in MapReduce jobs that fail after only partially updating data in HBase.  The talk is based on Sematext’s success story of building a highly scalable, general purpose data aggregation framework which was used to build Search Analytics and Performance Monitoring services. Most of the generic code needed for the append-only approach described in this talk is implemented in our HBaseHUT open-source project.
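
To make the append-only idea a bit more concrete, here is a minimal in-memory sketch in Python. It is not HBase client code and not the HBaseHUT API: the class `AppendOnlyStore`, its methods, and the `batch_id` tag are hypothetical names made up for this illustration. It only shows the general shape of the technique: a writer appends delta records instead of reading and rewriting the current value, a reader merges the records, and a failed batch can be rolled back by discarding the records it wrote.

```python
class AppendOnlyStore:
    """Toy in-memory stand-in for a table (hypothetical; not HBase or HBaseHUT)."""

    def __init__(self):
        # key -> list of (batch_id, delta) records; existing records are never edited
        self._records = {}

    def append(self, key, delta, batch_id=None):
        """Write a new delta record without reading the current value first."""
        self._records.setdefault(key, []).append((batch_id, delta))

    def read(self, key):
        """Merge all delta records for the key at read time."""
        return sum(delta for _, delta in self._records.get(key, []))

    def rollback(self, batch_id):
        """Discard every record written by one (for example, failed) batch."""
        for key, records in self._records.items():
            self._records[key] = [r for r in records if r[0] != batch_id]

    def compact(self, key):
        """Fold many delta records into one. Batch tags are lost afterwards,
        so in this sketch a batch can only be rolled back before compaction."""
        self._records[key] = [(None, self.read(key))]


store = AppendOnlyStore()
store.append("page:/home", 3, batch_id="job-1")
store.append("page:/home", 2, batch_id="job-2")
store.append("page:/about", 5, batch_id="job-2")
total_before = store.read("page:/home")   # merges both job-1 and job-2 records

store.rollback("job-2")                   # pretend job-2 failed halfway through
total_after = store.read("page:/home")    # only the job-1 record remains
```

The trade-off in this sketch is that writes become cheap (no read before write) while reads have to merge several records, which is why a periodic compaction step is included.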

Otis: Large Scale ElasticSearch, Solr & HBase Performance Monitoring

This talk has all the buzzwords covered: big data, search, analytics, realtime, large scale, multi-tenant, SaaS, cloud, performance… and here is why:

In this talk we’ll share the “behind the scenes” details about SPM for HBase, ElasticSearch, and Solr, a large scale, multi-tenant performance monitoring SaaS built on top of Hadoop and HBase running in the cloud.  We will describe all its backend components, from the agent used for performance metrics gathering, to how metrics get sent to SPM in the cloud, how they get aggregated and stored in HBase, how alerting is implemented and how it’s triggered, how we graph performance data, etc.  We’ll also point out the key metrics to watch for each system type.  We’ll go over various pain-points we’ve encountered while building and running SPM, how we’ve dealt with them, and we’ll discuss our plans for SPM in the future.

We hope to see some of you in Berlin.  If these topics are of interest to you, but you won’t be coming to Berlin, feel free to get in touch, leave comments, or ping @sematext.  And if you love working with things our talks are about, we are hiring world-wide!

Sematext Presenting Open Source Search Safari at ESS 2012

We are continuing our “new tradition” of presenting at Enterprise Search Summit (ESS) conferences.  We presented at ESS in 2011 (see http://blog.sematext.com/2011/11/02/search-analytics-at-enterprise-search-summit-fall-2011-presentation/).  This year we’ll be giving a talk titled Open Source Search Safari, in which Otis (@otisg) will present a number of open-source search solutions – Lucene, Solr, ElasticSearch, and Sensei, plus maybe one or two others (suggestions?).  We’ll also be chained to booth 26, where we’ll be showcasing our search-vendor-neutral Search Analytics service (the service is currently still free, so feel free to use it all you want), along with some of our other search-related products.

ESS East will be held May 15-16 in our own New York City.  If you are a past or prospective client or customer of ours, please get in touch if you are interested in attending ESS at a discount.

Sematext Presenting Real-time Analytics at HBaseCon 2012

HBaseCon 2012 is the first-ever HBase-focused conference, happening this May in San Francisco.  I’m happy to say that Sematext will be there as both a sponsor (likely) and a presenter (definitely) alongside Facebook, Cloudera, Adobe, Tumblr (nice to see our clients there!) and others.  Alex will be presenting our work on HBaseHUT, one of Sematext’s open-source projects that were spun out of our work on Scalable Performance Monitoring (for HBase, Solr, ElasticSearch, Sensei, etc.) and Search Analytics.

HBaseHUT makes real-time analytics with HBase possible today — pull requests welcome!

Search Analytics at Enterprise Search Summit Fall 2011 Presentation

Here is another take on Search Analytics, this one presented at Enterprise Search Summit Fall 2011 in Washington DC, to an audience coming mainly from US government agencies, very large enterprises, and large international companies with tens of thousands of employees worldwide.  The audience was good and posed a number of good questions after the talk.  The full slide deck is below as well as in Sematext@Slideshare.
