Hadoop Digest, June 2010
July 6, 2010
Big announcement from Cloudera: CDHv3 and Cloudera Enterprise were released. The following components were added in CDHv3 beta 2:
- HBase: the popular distributed columnar storage system with fast read-write access to data managed by HDFS.
- Oozie: Yahoo!’s workflow engine. (Editor’s note: how many MapReduce workflow engines are out there? We know of at least 4-5 of them!)
- Flume: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
- Hue: a graphical user interface to work with CDH. Hue lets developers build attractive, easy-to-use Hadoop applications by providing a desktop-based user interface SDK.
- ZooKeeper: a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Cloudera Enterprise combines the open source CDHv3 platform with critical monitoring, management and administrative tools. It also provides user- and group-based access control over data and resources (it can be integrated with Active Directory and other LDAP implementations). The bad news is that it isn’t going to be free.
Community trends & news:
- Amazon Elastic MapReduce now supports Hadoop 0.20, Hive 0.5, and Pig 0.6. See the announcement.
- Chukwa is moving to the Apache Incubator in preparation for becoming a top-level project (TLP).
- Using ‘wget’ to download a file from HDFS is explained here.
- Yahoo!’s backport of security to Hadoop 0.20 is available, including a sandbox VM.
- If you missed Cloudera’s great webinar, “Top ten tips and tricks for Hadoop success”, you can get the slides here.
- Twitter intends to open-source Crane, a MySQL-to-Hadoop tool.
- Interesting talk from Jeff Hammerbacher about analytical data platforms. Don’t forget to read this nice passage dedicated to it.
- Tools and Recipes for Monitoring Apache ZooKeeper.