Hive Digest, March 2011
March 30, 2011
Welcome to the first Hive digest!
Hive is a data warehouse built on Hadoop. Initially developed by Facebook, it has been under the Apache umbrella for about 2 years and has seen very active development. Last year there were 2 major releases, which introduced loads of features and bug fixes. Now Hive 0.7.0 has just been released and is packed with goodness.
Hive 0.6.0 was released in October last year. Some of its most interesting features included:
- Better skew joins.
- Views were added.
- Database/schema support was added to Hive QL.
- Integration with HBase was added, making it possible to read HBase tables via Hive and to bulk load Hive tables into HBase.
- There were multiple improvements making it easier to work with partitions, including multi partition inserts and archiving of partitions.
Hive 0.7.0 has just been released! Some of the major features include:
- Indexing has been implemented, though index types are currently limited to compact indexes. This feature opens up lots of potential for future improvements, such as HIVE-1694, which aims to use indexes to accelerate query execution for GROUP BY, ORDER BY, JOINs and other miscellaneous cases, and HIVE-1803, which will implement bitmap indexing.
- Security features have been added with authorisation and authentication.
- There is now an optional concurrency model which makes use of ZooKeeper, so tables can be locked during writes. It is disabled by default, but can be enabled by setting hive.support.concurrency=true in the config.
And many other small improvements including:
- Databases are now more useful: you can select across a database.
- The Hive command line interface has gotten some love and now supports auto-complete.
- There’s now support for HAVING clauses, so users no longer have to do nested queries in order to apply a filter on group by expressions.
And much more.
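To make the HAVING change concrete, here is a minimal sketch of the two query styles side by side. Everything in it is an assumption made for illustration: the page_views table, its page and hits columns, and the sample rows are invented, and the queries are run against an in-memory SQLite database (via Python's sqlite3 module) purely as a stand-in so the example works without a Hive cluster. The two query strings use plain SQL that should carry over to Hive QL, but they have not been run against Hive here.

```python
import sqlite3

# Hypothetical sample data: (page, hits). Not taken from any real workload.
ROWS = [("home", 120), ("home", 80), ("search", 40), ("about", 5), ("search", 70)]

# Older style: aggregate in a nested query, then filter in the outer query.
NESTED_QUERY = """
SELECT t.page, t.total_hits
FROM (
  SELECT page, SUM(hits) AS total_hits
  FROM page_views
  GROUP BY page
) t
WHERE t.total_hits > 100
"""

# With HAVING: the filter on the aggregate sits next to the GROUP BY.
HAVING_QUERY = """
SELECT page, SUM(hits) AS total_hits
FROM page_views
GROUP BY page
HAVING SUM(hits) > 100
"""


def run(query):
    """Load the sample rows into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (page TEXT, hits INTEGER)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", ROWS)
        # Sort in Python so the comparison does not depend on row order.
        return sorted(conn.execute(query).fetchall())
    finally:
        conn.close()


nested_result = run(NESTED_QUERY)
having_result = run(HAVING_QUERY)
print(having_result)  # [('home', 200), ('search', 110)]
```

Both queries return the same rows; the HAVING form simply drops the extra level of nesting and the subquery alias.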