HBase versus Hypertable

There was a recent effort on the Hypertable site to compare the performance of Hypertable against HBase, and this post offers some possible takeaways from HBase committers (DISCLAIMER: I am an HBase committer).

NOTE: Luke Lu pointed out in the comments that the benchmark was unfair to Hypertable because it uses the Thrift API, which is much slower than the native C++ API. Please interpret these results with that caveat in mind. I will try to do a better apples-to-apples comparison at some point.

Those of us who eat and breathe HBase at Facebook were curious to see how our internal version of HBase would fare, so we decided to benchmark Hypertable against the Facebook version of HBase. The first thing we did was set up a YCSB workload to benchmark Hypertable and HBase. See this post for all the details, but the brief takeaway was that HBase was 1.4x better than Hypertable on writes, while Hypertable was 2-3x better on reads. To measure this more closely, and to improve HBase reads, we set up a simple one-node read test consisting of 1M rows, 10 columns per row, and 100 bytes per column (1GB of total data). The results are shown below:
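As a rough illustration of the test's shape, here is a minimal Python sketch. It assumes uniformly random point reads and random 100-byte values; the key format, column names, and helper functions (row_key, make_row, random_read_keys) are hypothetical and are not the actual benchmark harness. It also makes the sizing explicit: 1M rows × 10 columns × 100 bytes = 10^9 bytes of raw values, which is where the 1GB figure comes from. Row keys, column qualifiers, timestamps, and storage overhead come on top of that.

```python
# Hypothetical sketch of the read-test data shape: 1M rows, 10 columns per row,
# 100-byte values. Names and key formats are illustrative assumptions only.
from __future__ import annotations

import os
import random

NUM_ROWS = 1_000_000
COLS_PER_ROW = 10
VALUE_BYTES = 100


def row_key(i: int) -> bytes:
    # Zero-padded keys sort lexicographically in numeric order,
    # which matters for stores that keep rows sorted by key.
    return b"row%08d" % i


def column_name(j: int) -> bytes:
    return b"col%02d" % j


def make_row(i: int) -> dict[bytes, bytes]:
    # Random payloads (assumption: the real test's value contents are not described).
    return {column_name(j): os.urandom(VALUE_BYTES) for j in range(COLS_PER_ROW)}


def payload_bytes() -> int:
    # Raw value bytes only: excludes row keys, column qualifiers,
    # timestamps, and any on-disk storage overhead.
    return NUM_ROWS * COLS_PER_ROW * VALUE_BYTES


def random_read_keys(n: int, seed: int = 42) -> list[bytes]:
    # Uniformly random row keys, a simple stand-in for a random-read workload.
    rng = random.Random(seed)
    return [row_key(rng.randrange(NUM_ROWS)) for _ in range(n)]


if __name__ == "__main__":
    print(payload_bytes())  # 1000000000
    print(row_key(7))       # b'row00000007'
```

Note that 10^9 bytes is 1GB in decimal units (about 0.93 GiB), so the on-disk footprint of such a table will be noticeably larger than the raw payload once per-cell key overhead is included.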

Hypertable versus HBase read performance

Obviously, we wanted to improve HBase read performance, so over the summer of 2012 we ran a series of experiments toward that goal (see this post for details). In short, we were able to improve HBase reads quite a bit, and by the end HBase performed better than Hypertable on this test, as shown in the chart below.

Hypertable versus HBase read performance, after the HBase read improvements

Many people at Facebook have been involved in this effort, and overall this is definitely an awesome improvement to HBase performance!
