Sneak Peek: Hadoop Monitoring comes to SPM

When it comes to Hadoop, they say you’ve got to monitor it and then monitor it some more.  Since our own Performance Monitoring and Search Analytics services run on top of Hadoop, we figured it was time to add Hadoop performance monitoring to SPM.  So here is a sneak peek at SPM for Hadoop.  If you’d like to try it on your Hadoop cluster, we’ll be sending invitations soon and you can get on the private beta list starting today!

In the mean time, here is a small sample of pretty self-explanatory reports from SPM for Hadoop, so you can get a sense of what’s available.  There are, of course, a number of other Hadoop-specific reports included, as well as server reports, filtering, alerting, multi-user support, report sharing, etc. etc.

Please don’t forget to tell us what else would you like us to monitor - select your candidates - and if you like what you see and want a good monitoring tool for your Hadoop cluster, please sign up for private beta now.

Click on any graph to see it in its full size and high quality.

Hadoop NameNode Files

Hadoop NameNode Files

.

Hadoop DataNode Read-Write

Hadoop DataNode Read-Write

.

Hadoop JobTracker MapReduce Runtime

Hadoop JobTracker MapReduce Runtime

.

Hadoop TaskTracker Tasks

Hadoop TaskTracker Tasks

.

What else would you like us to monitor with SPM?  Please select your candidates!

For announcements, promotions, discounts, service status, milk, cookies, and other goodies follow @sematext.

Announcing HBase Refcard

We’re happy to announce the very first HBase Refcard proudly authored by two guys from Sematext.  We hope people will find the HBase Refcard useful in their work with HBase, along with the wonderful Apache HBase Reference Guide.  If you think the refcard is missing some important piece of information that deserves to be included or that it contains superfluous content, please do let us know! (e.g., via comments here)

Data Engineer Position at Sematext International

If you’ve always wanted to work with Hadoop, HBase, Flume, and friends and build massively scalable, high-throughput distributed systems (like our Search Analytics and SPM), we have a Data Engineer position that is all about that!  If you are interested, please send your resume to jobs@sematext.com.

Responsibilities:

  • Versatile architect and developer – design and build large, high performance,scalable data processing systems using Hadoop, HBase, and other big data technologies
  • DevOps fan –  run and tune large data processing production clusters
  • Tool maker – develop ops and management tools 
  • Open source participant – keep up with development in areas of cloud and distributed computing, NoSQL, Big Data, Analytics, etc.

Pluses:

  • solid Math, Statistics, Machine Learning, or Data Mining is not required but is a big plus
  • experience with Analytics, OLAP, Data Warehouse or related technologies is a big plus
  • ability and desire to expand and lead a data engineering team
  • ability to think both business and engineering
  • ability to build products and services based on observed client needs
  • ability to present in public, at meetups, conferences, etc.
  • ability to contribute to blog.sematext.com
  • active participation in open-source communities
  • desire to share knowledge and teach
  • positive attitude, humor, agility, high integrity, and low ego, attention to detail

Location:

  • New York

We’re small and growing.  Our HQ is in Brooklyn, but our team is spread over 4 continents.  If you follow this blog you know we have deep expertise in search and big data analytics and that our team members are conference speakers, book authors, Apache members, open-source contributors, etc.

Relevant pointers:

Hadoop 1.0.0 – Extra Notes

The big Hadoop 1.0.0 release has arrived.  The general notes about releases from the dev team include:

  • security
  • Better support for HBase (append/hsynch/hflush, and security)
  • webhdfs (with full support for security)
  • performance enhanced access to local files for HBase
  • other performance enhancements, bug fixes, and features

You can also find the complete release notes here and see all fixes, improvements and new features included in the release. To save you time, please find below additional information about some of the items that attracted our attention from the Hadoop 1.0.0 release.

Cluster Management Optimizations

HADOOP-7728hadoop-setup-conf.sh should be modified to enable task memory manager
Adds additional options to manage memory usage by MR tasks. In particular, this allows to set max memory usage for map and reduce tasks (separately).

Performance Improvements

HDFS-2246hadoop-setup-conf.sh should be modified to enable task memory manager
This is a short-term solution for the issue HDFS-347 “DFS read performance suboptimal when client co-located on nodes with data” which is quite hot in Hadoop dev community nowadays. NOTE: by default this optimization is switched off (or is it? Update: it is not, see the comments) so some config adjustments are required to benefit from it. And you will definitely want to benefit from it: some reported two times I/O performance improvements. Also highly recommended for HBase users.

HDFS-895Allow hflush/sync to occur in parallel with new writes to the file
Previously if a hflush/sync were in progress, an application could not write data to the HDFS client buffer. Again we stress out this improvement for HBase users as this increases the write throughput of the transaction log in HBase.

MAPREDUCE-2494Make the distributed cache delete entires using LRU priority
When certain threshold was reached and distributed cache was being purged, previous implementation deleted all entries that were not currently being used. With new code more hot data can be left in the cache (the percentage is configurable) and thus decrease cache misses.

New Features

HDFS-2316 - [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP
Allows accessing HDFS over HTTP (read & write)

MAPREDUCE-3169Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2
Cleaner MR1 & MR2 compatible API for mini MR cluster to be used in unit-tests.

HADOOP-7710Create a script to setup application in order to create root directories for application such hbase, hcat, hive etc
Similar to hadoop-setup-user script, a hadoop-setup-applications script was added to set up root directories for apps to write to (/hbase, /hive, etc.)

Enjoy Hadoop 1.0.0 and we hope you found this quick summary useful!

@sematext

Search Analytics: Business Value & NoSQL Backend Presentation

Last week involved a few late nights for some of us at Sematext – we were busy readying our Search Analytics and Scalable Performance Monitoring services, as well as putting the final touches on the our Search Analytics: Business Value & NoSQL Backend presentation for Lucene Eurocon in Barcelona.

In the past we’ve given a few other public talks about Search Analytics and you can check them all out via http://blog.sematext.com/tag/analytics/.

Extending Hadoop Metrics

Here at Sematext we really like performance metrics and we like HBase.  We like them so much we’ve created a service for HBase Performance Monitoring (and for Solr, too).  In the process we’ve done some experiments with Hadoop and HBase around performance monitoring and are sharing our experience and some relevant code in this post.

The Hadoop metrics framework is simple to extend and customise. For example, you can very easily write a custom MetricsContext which sends metrics to your own storage solution.

All you need to do is extend the AbstractMetricsContext class and implement

protected void emitRecord(String context, String record, OutputRecord outputrecord)
  throws IOException;

To demonstrate, I wrote HBaseMetricsContext which stores Hadoop metrics in HBase. Since HBase itself uses the Hadoop metrics framework, you can use it to store its own metrics inside itself. Useful? Maybe. This is just an example after all.

If you’d like to try it out, get the source from GitHub. Then build the project using:

mvn package

Put the resulting Jar file in the HBase lib directory.

You will need to create a table with the relevant column families. We assume the column families are a composite of:

columnFamily = contextName + "." + recordName

In the HBase shell create your table:

create 'metrics', 'hbase.master', 'hbase.regionserver'

Edit your hadoop-metrics.properties file to include:

hbase.class=com.sematext.hadoop.metrics.HBaseMetricsContext
hbase.tableName=metrics
hbase.period=10

Restart HBase and it will start inserting to the metrics table every 10 seconds.

The row key of each record is made up of the timestamp and the tags (for disambiguation) like so:

rowKey = bytes(maxlong - timestamp) + bytes(tagName) + bytes(tagValue) + …

Subtracting the timestamp from maxlong ensures the scans get the most recent record first.

Each tag and metric is stored in it’s own column. This gives us a table that looks something like this:

hbase.master hbase.regionserver
cluster_requests hostName hostName flushQueueSize regions
rowKey2 rs1.example.org 0 1
rowKey1 101 master.example.org

For clarity timestamps are not included in the above table, as each cell is timestamped. All cells for a record will have the same timestamp.

Wanted: Devops to run Search-Hadoop.com and Search-Lucene.com

If you are dreaming about working on search, big data, analytics, data mining, and machine learning, and are a positive, proactive, independent devops creature, inquire within!

We are a small and highly distributed team who likes to eat a little bit of everything: search for breakfast, mapreduce for lunch, and bigtable for dinner.  We are looking for a part-time-to-grow-into-full-time devops to work on the popular search-hadoop.com and search-lucene.com sites and take them to the next level. As such, you’ll need to be on top of Lucene, Solr, and Elastic Search.  Similarly, you must be completely at $HOME on the UNIX command line.  Working knowledge of Mahout or statistics/machine learning/data mining background would be a major plus, but is not required.  Experience with productive web frameworks and slick modern front-end frameworks is another plus, as is familiarity with EC2 and EBS.

More about the ideal you:

  • You are well organized, disciplined, and efficient
  • You don’t wait to be told what to do and don’t need hand-holding
  • You are reliable, friendly, have a positive attitude, and don’t act like a prima donna
  • You have an eye for detail – no sloppy code, no poor spelelling and typous
  • You are able to communicate complex ideas in a clear fashion in English (or pretty diagrams)
  • You have experience with (large scale) search or data analysis
  • You like to write about technologies relevant to what we do
  • You are an open-source software contributor

Not all of the above are required, of course – the closer the match, the higher the relevance score, that’s all.

Interested?  Please get in touch.

Google Summer of Code and Intern Sponsoring

Are you a student and looking to do some fun and rewarding coding this summer? Then join us for the 2011 Google Summer of Code!

The application deadline is in less than a month! Lucene has identified initial potential projects, but this doesn’t mean you can also pick your own.  If you need additional ideas, look at our Lucene / Solr for Academia: PhD Thesis Ideas (or just the spreadsheet if you don’t want to read the what and the why),  just be sure to discuss with the community first (send an email to dev@lucene.apache.org).

We should also add that, separately from GSoC, Sematext would be happy to sponsor good students and interns interested in work on projects involving search (Lucene, Solr), machine learning & analytics (Mahout), big data (Hadoop, HBase, Hive, Pig, Cassandra), and related areas. We are a virtual and geographically distributed organization whose members are spread over several countries and continents and we welcome students from all across the globe.  For more information please inquire within.

HBase Backup Options

If you are thinking about using HBase you will likely want to understand HBase backup options.  I know we did, so let us share what we found.  Please let us know what we missed and what you use for HBase backup!

Export

You could export your tables using the Export (org.apache.hadoop.hbase.mapreduce.Export) MapReduce job that will export the table data into a Sequence File on HDFS.  This was implemented in HBASE-1684 if you want to check out the patch or comments there.  This tool works on one table at a time, so if you need to backup multiple tables, run this on each table.  The exported data can then be imported back into HBase by the Import tool.

Copy Table

If you have another HBase cluster that you want to treat as a backup cluster, you can use the handy CopyTable tool to copy a table at a time.

Distcp

You could use Hadoop’s distcp command to copy the whole /hbase directory from one HDFS cluster to the other.  However, this can leave your data in an inconsistent state, so it should be avoided.  See http://search-hadoop.com/m/wkMgSjVLDb

At this point we should point out that all of the above backup methods are per-table.  Moreover, they don’t work or create a snapshot of the table.   Export and CopyTable are atomic only at the row level.  Furthermore, if you have multiple tables whose tables depend on each other, if they are being modified while you are exporting or copying them, you will end up with inconsistent data – the data in those tables will not be in sync.  See http://search-hadoop.com/m/Q4bU81G116p.

Backup from Mozilla

Because of the above mentioned issues with distcp when running it over a cluster whose data is being modified while distcp is running, developers at Mozilla came up with their own Backup tool.  They’ve described the tool and its use in the popular Migrating HBase in the Trenches post.

Cluster Replication

HBase has a relatively new and not yet widely used whole cluster replication mechanism.  The backup cluster does not have to be identical to the master cluster, which means that the backup cluster could be much less powerful and thus cheaper, while still having enough storage to serve as backup.

Table Snapshot

Ah, the infamous HBASE-50!  This issue saw some great work during GSoC 2010, but it looks like it was never integrated into HBase.  It is unclear whether the contributor simply ran out of steam or time or whether it became apparent that table snapshots are too difficult to implement or simply not doable because of highly distributed nature of HBase.  The JIRA issue does contain patches you can look at, and the author has a now inactive hbase-snapshot repository up on Github.

HDFS Replication

You could also simply crank up the replication factor to the level that makes you feel safe and call that a backup.  This may not guard against data corruption, but it does guard against certain partial hardware failures.

Since so many people seem to be asking about HBase backup options, I hope this serves as a good point-in-time snapshot, a summary of all HBase backup options that are currently on the table.  With time, this will be added to the HBase Book.

Are there other HBase backup options we should have included?

What you use for HBase backup?

Deferring Processing Updates to Increase HBase Write Performance

In this post we’ll examine, via an example, how deferring the processing of updates can increase write throughput of a data store. In the end we’ll introduce our recently open-sourced HBaseHUT, a tool for automating of deferred updates in HBase.

Please note, that “deferred updates” here does not mean that updates are not “visible” immediately after a write operation. The system still serves add/update/delete and read operations in real-time.

Plug: if this sort of stuff interests you, we are hiring people who know about Hadoop, HBase, MapReduce…

Constraints & Assumptions

The decisions and ideas in the post are based on facts about HBase, but can be applied to many other data stores with similar characteristics:

  • Fetching a set of records as a range (scan operation) is much cheaper than fetching the same records one-by-one
  • Adding new records (simple write operation) is very fast
  • Storage space is cheap
  • “In-place” updates of records are not generally feasible

Well, the last assumption is a bit tricky: HBase provides incrementColumnValue() operation but it is very limited: one can only increment a specified cell that contains long value by some delta. In future versions HBase will allow incrementing of multiple cells at once (already in HBase trunk). However, not all update operations are that simple, as we’ll see in the example. Moreover, incrementing cells in a record is slower than writing a new record (we’ll use this fact later in the post).

Idea Explained

The idea behind deferred updates processing is to postpone updating of the existing record and store incoming deltas as a new record. Thus, record update operations become a simple write operations with corresponding performance. Deferred updates technique elaborated here fits well when system handles a lot of updates of stored data and write performance is the main concern, while reading speed requirements are not that strict. The following cases (each of them separately or any combination of them) may indicate  that one can benefit from using the technique (the list is not complete):

  • updates are well spread over the whole and large dataset
  • lower rate (among write operations) of “true updates” (i.e. low percentage of writes are for completely new data, not really updates of existing data)
  • good portion of data stored/updated may never be accessed
  • system should be able to handle high write peaks without major performance degradation

Next, let’s look at an example use-case to illustrate the idea better.

Example Use-Case

Imagine a simple presentation slides sharing system that gives one the ability to browse presentations on-line (not as downloadable PDF or whatever format files, since that would not be interesting for us here). Suppose we want to provide the presentation’s author with details on user behaviour when a user navigates through slides so that the author can use this data to infer the quality of the presentation (hey, slide sharing systems maintainers: free hint!). Let’s limit statistics data we want to share to simple metrics for the sake of keeping the example simple. Here’s what we can provide:

  • number of times a particular slide was viewed
  • avg/min/max time spent by user on a particular slide

Using these stats the presentation author can find out e.g.:

  • what are the most difficult to understand slides (they probably need to be divided into multiple ones, or better preparation before showing them should be done by giving more/better info on previous slides)
  • what are the most trivial slides (they can be merged/removed)
  • what are the typical users (e.g., If users spend a lot of time thinking on the slides the author can add more details in the presentation.  On the other hand, if users usually quickly walk through it then some very deep details can be removed)

Such a system could provide more stats to authors so that they can work on other sorts of presentation improvements.  For example, the system could provide information about user navigation actions (e.g., if users often go back and forth between two slides then probably the schemes/diagrams on them are better understandable if they are on the same slide and can be easily compared), but that’s not the focus of this post.

To track previously mentioned stats data we need to capture how much time a user spent on a particular slide and send this data to our data store. Let’s assume our records store the following info: {(key)presentationId_slideId, number of views, total viewing time, max viewing time, min viewing time}. Using this data we can provide useful information to author as needed.

Every time user leaves the slide, tracking logic sends the next data to the data store: {(key)presentationId_slideId, time spent on viewing}.

Direct Solution

The straight forward approach would be to update the presentationId_slideId record (fetch, modify and store updated data) as new data comes in. Here are some drawbacks of this naive approach:

  • for each update operation we have to perform fetch and write operations (Get and Put in case of HBase)
  • Most authors won’t really care about all these stats and majority of them will never look at it, so many processing operations can simply be omitted (until there’s interest in stats)
  • for first ever viewing of a slide a redundant fetch operation is performed (data is not there yet), which is OK if slides are viewed many times but in situation when many presentations are viewed just one time (e.g. by their creators) this may result in a bigger portion of redundant operations (this assumption is sort of made up, yes)

The very first point is a major one and is the obvious write performance killer.  This point is really not made up – we have first hand experience with that particular situation.

Deferring Updates Processing

One alternative solution is to defer updates processing to the time when data is requested by presentation author or to the time when the scheduled/periodic updates processing is performed (ideally when data store write load is minimal). In this deferred updates approach all new deltas (new user action data) are written as new records.

When an author requests the stats, efficient updates processing takes place. This time it is done in much more efficient way (compared to direct approach): by using range scan operations, and not by individual record read/writes. Given that presentation has usually tens of slides and is viewed by hundreds or thousands people such range scans are so fast the user will barely notice them. Also, it should be quite OK to ask a person to wait several seconds during his first view of stats page (processed updates are written back into the data store, so next time user fetches data it is available right away).

To prevent too long updates processing on user request the data can be processed periodically instead of at request time. Given the fact that (updated) statistics data is available to user immediately, the processing can be postponed for a long time without affecting the “real-time” visibility aspect of the system. The processing of updates can be scheduled to take place during off-peak hours, when they won’t compromise write performance. By doing this we add a “cushion” since from this point on the maximum amount of data that needs to be processed “on-the-fly” is one day ‘s worth of updates (assuming daily updates processing) , which is sufficient for the majority of cases (e.g. estimated max presentation views per day is tens of thousands which is a very quick scan operation).

The remainder of this post provides details of our implementation of this approach.

HBaseHUT

HBaseHUT is a tool that automates deferred processing of updates for HBase. It hides many details of updates processing from the client code, thus making it an easy to use solution. There are Put and ResultScanner wrappers that encapsulate all needed logic.

To write new data you need to use HutPut implementation of Put. Since it implements the standard HBase Put only the instance creation code needs to be changed:

public class SlideStatsTracker {
  byte[] SLIDE_CF = Bytes.toBytes("slide");
  byte[] VIEWS_NUM_C = Bytes.toBytes("views");
  byte[] TOTAL_VTIME_C = Bytes.toBytes("total_time");
  byte[] MIN_VTIME_C = Bytes.toBytes("min_time");
  byte[] MAX_VTIME_C = Bytes.toBytes("max_time");
  ...
  void trackSlideView(byte[] presentationId, byte[] slideId, long timeSpent)
                               throws InterruptedException, IOException {
    Put put = new HutPut(Bytes.add(presentationId , slideId));
    put.add(SLIDE_CF, VIEWS_NUM_C, Bytes.toBytes(1));
    put.add(SLIDE_CF, TOTAL_VTIME_C, Bytes.toBytes(timeSpent));
    hTable.put(put);
  }
  …
}

Fetching stats is performed using HutResultScanner which implements ResultScanner and hence can also be very easily integrated into the code that uses normal HBase API:

  Scan scan = new Scan(presentationId);
  ResultScanner resultScanner =
      new HutResultScanner(hTable.getScanner(scan), updateProcessor);
  Result result = resultScanner.next();
  while (result != null) {
    // fetch data from result object
    result = resultScanner.next();
  }

The updateProcessor passed as parameter encapsulates update operation logic:

  class SlideStatsUpdateProcessor implements UpdateProcessor {
    @Override
    public void process(Iterable records, UpdateProcessingResult processingResult) {
      int viewsNumber = 0;
      long totalViewTime = 0;
      long minViewTime = Long.MAX_VALUE;
      long maxViewTime = Long.MIN_VALUE;
      // Processing records
      for (Result record : records) {
        for (int i = 0; i < 5; i++) {
          int views = Bytes.toInt(record.getValue(SLIDE_CF, VIEWS_NUM_C));
          long totalTime = Bytes.toLong(record.getValue(SLIDE_CF, TOTAL_VTIME_C));
          long minTime = Bytes.toLong(record.getValue(SLIDE_CF, MIN_VTIME_C));
          long maxTime = Bytes.toLong(record.getValue(SLIDE_CF, MAX_VTIME_C));

          viewsNumber += views;
          totalViewTime += totalTime;
          minViewTime = minTime  maxViewTime ? maxTime : maxViewTime;
        }
      }

      processingResult.add(SLIDE_CF, VIEWS_NUM_C, Bytes.toBytes(viewsNumber));
      processingResult.add(SLIDE_CF, TOTAL_VTIME_C, Bytes.toBytes(totalViewTime));
      processingResult.add(SLIDE_CF, MIN_VTIME_C, Bytes.toBytes(minViewTime));
      processingResult.add(SLIDE_CF, MAX_VTIME_C, Bytes.toBytes(maxViewTime));
    }
  }

As you one can see, HBaseHUT abstraction doesn’t change HBase API and that makes it very easy to use in new code or integrate it in old code.

You can find more details about HBaseHUT features on the project’s wiki. HBaseHUT is a new, recently emerged project – your feedback/questions/ideas/feature requests/forks/pull requests, etc. are all very welcome. Please start new discussions on HBaseHUT’s mailing list.

If you read this far, we’d like to talk to you – we are hiring smart people who know about Hadoop, HBase, MapReduce, want to do large scale data (stream) processing, and build tools like HBaseHUT.

Follow

Get every new post delivered to your Inbox.

Join 1,121 other followers