HBase Digest, June 2010
July 1, 2010 3 Comments
HBase 0.20.5 is out! It fixes 24 issues since the 0.20.4 release. HBase developers “recommend that all users, particularly those running 0.20.4, upgrade to this release”.
Community trends:
- There’s a clear need for a “sanity check DNS across my cluster” tool, as many of the questions and help requests submitted over time relate to name/address resolution in the cluster. Any volunteers?
- The feature for bulk incremental load into an existing table (HBASE-1923) has been committed to trunk. There is still no multi-family support.
- This thread offers a good deal of advice on increasing write performance, including numbers and techniques shared from a large production cluster.
- A set of ORM tools to consider for HBase is suggested here.
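As a starting point for the DNS sanity check mentioned in the list above, here is a minimal sketch in Python. It is not an existing HBase tool. The function names `check_host` and `check_cluster` are made up for illustration, and the three checks (forward lookup succeeds, address is not loopback, reverse lookup maps back to the same name) are an assumption about what such a tool would want to verify.

```python
import socket


def check_host(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for one hostname (empty list if none found).

    `forward` and `reverse` default to the system resolver; they are
    parameters only so the checks can be exercised without a network.
    """
    problems = []
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"{hostname}: forward lookup failed ({exc})"]

    # Assumption: a cluster node should not resolve its own name to loopback.
    if ip.startswith("127."):
        problems.append(f"{hostname}: resolves to loopback address {ip}")

    try:
        reverse_name = reverse(ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"{hostname}: reverse lookup of {ip} failed ({exc})")
        return problems

    # Compare names case-insensitively, ignoring a trailing dot.
    if reverse_name.rstrip(".").lower() != hostname.rstrip(".").lower():
        problems.append(f"{hostname}: {ip} maps back to {reverse_name}")
    return problems


def check_cluster(hostnames, **resolvers):
    """Run check_host for every name; keep only the hosts with problems."""
    report = {}
    for name in hostnames:
        problems = check_host(name, **resolvers)
        if problems:
            report[name] = problems
    return report
```

Calling `check_cluster([...])` with the hostnames of the master and region servers, from each node in turn, would show where resolution disagrees between machines. The sketch only covers IPv4 and the system resolver.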
Notable efforts:
- HAvroBase: a searchable, evolvable entity store on top of HBase and Solr
- Transactional and indexing extensions for HBase.
FAQ:
- Common issue: tables or data disappear after a system restart. People usually run into this when trying HBase for the first time, even on a single-node setup. The cause is that by default HDFS stores its data under the /tmp dir, which the OS may clean up. Set the “dfs.name.dir” and “dfs.data.dir” properties in hdfs-site.xml to directories outside /tmp to avoid this problem.
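To make the fix above concrete, here is a small Python sketch that embeds an example hdfs-site.xml and checks whether the two properties are set to locations outside /tmp. The paths under /var/hadoop are hypothetical placeholders, and the helper `find_tmp_risks` is an illustration, not part of Hadoop or HBase.

```python
import xml.etree.ElementTree as ET

# Example hdfs-site.xml content. The /var/hadoop/... paths are placeholders;
# substitute directories that survive a reboot on your machine.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
</configuration>
"""

REQUIRED = ("dfs.name.dir", "dfs.data.dir")


def find_tmp_risks(xml_text):
    """Return warnings for required properties that are unset or under /tmp."""
    root = ET.fromstring(xml_text)
    values = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }
    risks = []
    for key in REQUIRED:
        value = values.get(key)
        if not value:
            risks.append(f"{key} is not set, so the default location is used")
            continue
        # Both properties accept a comma-separated list of directories.
        for directory in (d.strip() for d in value.split(",")):
            if directory == "/tmp" or directory.startswith("/tmp/"):
                risks.append(f"{key} points at {directory}")
    return risks
```

Running `find_tmp_risks(open("hdfs-site.xml").read())` against a real configuration file returns an empty list when both properties are set outside /tmp. The check does not expand variables such as `${hadoop.tmp.dir}` inside values.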





May I respectfully plug our HBase RowLog library – a library to build WALs and MQs on top of HBase, which we released this month as well?
Yes, it’s relevant. I have it in one of my browser tabs, waiting to be read (big page that requires lots of reading means I keep postponing reading it).
Thanks. We have great difficulty striking the balance between terseness and being informative, I agree.