Hadoop 1.0.0 – Extra Notes
January 4, 2012 2 Comments
The big Hadoop 1.0.0 release has arrived. The general notes about releases from the dev team include:
- Better support for HBase (append/hsynch/hflush, and security)
- webhdfs (with full support for security)
- performance enhanced access to local files for HBase
- other performance enhancements, bug fixes, and features
You can also find the complete release notes here and see all fixes, improvements and new features included in the release. To save you time, please find below additional information about some of the items that attracted our attention from the Hadoop 1.0.0 release.
Cluster Management Optimizations
HADOOP-7728 – hadoop-setup-conf.sh should be modified to enable task memory manager
Adds additional options to manage memory usage by MR tasks. In particular, this allows to set max memory usage for map and reduce tasks (separately).
HDFS-2246 – hadoop-setup-conf.sh should be modified to enable task memory manager
This is a short-term solution for the issue HDFS-347 “DFS read performance suboptimal when client co-located on nodes with data” which is quite hot in Hadoop dev community nowadays. NOTE: by default this optimization is switched off (or is it? Update: it is not, see the comments) so some config adjustments are required to benefit from it. And you will definitely want to benefit from it: some reported two times I/O performance improvements. Also highly recommended for HBase users.
HDFS-895 – Allow hflush/sync to occur in parallel with new writes to the file
Previously if a hflush/sync were in progress, an application could not write data to the HDFS client buffer. Again we stress out this improvement for HBase users as this increases the write throughput of the transaction log in HBase.
MAPREDUCE-2494 – Make the distributed cache delete entires using LRU priority
When certain threshold was reached and distributed cache was being purged, previous implementation deleted all entries that were not currently being used. With new code more hot data can be left in the cache (the percentage is configurable) and thus decrease cache misses.
HDFS-2316 - [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP
Allows accessing HDFS over HTTP (read & write)
MAPREDUCE-3169 – Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2
Cleaner MR1 & MR2 compatible API for mini MR cluster to be used in unit-tests.
HADOOP-7710 – Create a script to setup application in order to create root directories for application such hbase, hcat, hive etc
Similar to hadoop-setup-user script, a hadoop-setup-applications script was added to set up root directories for apps to write to (/hbase, /hive, etc.)
Enjoy Hadoop 1.0.0 and we hope you found this quick summary useful!