Hadoop Digest, June 2010

The Hadoop 0.21 release is getting close: a few blocking issues remain in the Common, HDFS, and MapReduce modules.

Big announcement from Cloudera: CDHv3 and Cloudera Enterprise were released. The following components were added in CDHv3 beta 2:

  • HBase: the popular distributed columnar storage system with fast read-write access to data managed by HDFS.
  • Oozie: Yahoo!’s workflow engine. (Ed. note: How many MapReduce workflow engines are out there? We know of at least 4-5 of them!)
  • Flume: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
  • Hue: a graphical user interface to work with CDH. Hue lets developers build attractive, easy-to-use Hadoop applications by providing a desktop-based user interface SDK.
  • ZooKeeper: a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Cloudera Enterprise combines the open source CDHv3 platform with critical monitoring, management, and administrative tools. It also enables control of access to data and resources by users and groups (it can be integrated with Active Directory and other LDAP implementations). The bad news is that it isn’t going to be free.

Community trends & news:

  • Amazon Elastic MapReduce now supports Hadoop 0.20, Hive 0.5, and Pig 0.6. Please see the announcement.
  • Chukwa is going to move to the Apache Incubator to prepare to become a TLP.
  • Using ‘wget’ to download a file from HDFS is explained here.
  • Yahoo!’s backport of security into Hadoop 0.20 is available, including a sandbox VM.
  • Those of you who missed a great webinar from Cloudera, “Top ten tips & tricks for Hadoop success”, can get the slides from here.
  • Twitter intends to open-source Crane, a MySQL-to-Hadoop tool.
  • Interesting talk from Jeff Hammerbacher about analytical data platforms. Don’t forget to read this nice passage dedicated to it.

Notable efforts:

Follow @sematext on Twitter.

3 Responses to Hadoop Digest, June 2010

  1. Jeff says:

    Hey Alex,

    Thanks as always for the update. To clarify, we released CDH3b2 (the second beta release of the third version of CDH) last week. We also added ZooKeeper to the distribution. Otherwise, you’re spot on!

    Thanks,
    Jeff

  2. Alex Baranau says:

    Hi Jeff,

    Thank you for your comment, I made corrections.

    Alex

  3. sematext says:

    Here are some: Azkaban, pomsets, Oozie, hamake, Cascading….
