
Apache Hive: 5 facts

  1. Hive is a SQL-like layer on top of Hadoop.
  2. Use it when you have some sort of structure to your data.
  3. You can use JDBC and ODBC drivers to connect Hive to your traditional systems. However, queries run through these drivers are not high performance.
  4. Hive was originally built by (and is still used by) Facebook to bring traditional database concepts into Hadoop in order to perform analytics. It is also used by Netflix to run daily summaries.
  5. Pig is sometimes compared to Hive, in that they are both “languages” that are layered on top of Hadoop. However, Pig is more analogous to a procedural language for writing applications, while Hive is targeted at traditional DB programmers moving over to Hadoop.


Hive can be used to program MapReduce using a subset of SQL

Hive lets you program MapReduce using something that looks like SQL, instead of a procedural language like Java or Python. This is useful when a team of database programmers, as opposed to application programmers, is called upon to program MapReduce.

Using Hive tables requires defining a schema.
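
To make the schema requirement a little more concrete, here is a minimal Python sketch (not Hive itself) of the general idea: the underlying file is plain delimited text, and declared column names and types are applied when rows are read. The column names, the tab delimiter, and the `apply_schema` helper are all assumptions made up for illustration.

```python
# Hypothetical schema: (column name, Python type) pairs, standing in for
# the column definitions that a table declaration would carry.
SCHEMA = [("user_id", int), ("country", str), ("minutes_watched", float)]


def apply_schema(line, schema=SCHEMA, delimiter="\t"):
    """Parse one delimited text line into a dict using the declared schema."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    # Cast each raw string field to the type declared for its column.
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}


# Illustrative raw data, as it might sit in a text file.
raw_lines = ["1\tUS\t42.5\n", "2\tCA\t13.0\n"]
rows = [apply_schema(line) for line in raw_lines]
```

Without the schema, each line is just a string; with it, every row has named, typed columns that a query can refer to.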

The SQL-like language (called HiveQL) is converted to a MapReduce job.
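
As a rough illustration of what that conversion means, the Python sketch below imitates the map, shuffle, and reduce steps for a simple grouped count. It is a conceptual model only, not the plan Hive actually generates; the query in the comment, the `views` data, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical query being imitated, shown only as a comment:
#   SELECT country, COUNT(*) FROM views GROUP BY country;


def map_phase(rows):
    """Emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["country"], 1


def shuffle_phase(pairs):
    """Collect emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which plays the role of COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


# Illustrative input rows.
views = [{"country": "US"}, {"country": "CA"}, {"country": "US"}]
counts = reduce_phase(shuffle_phase(map_phase(views)))
```

The point of the sketch is that the person writing the query only states the grouping and the aggregate; the map and reduce functions are derived from it rather than written by hand.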

Hue is a browser-based GUI in which you can do Hive work. You type your query and see tabular results. Hue can export the results as a CSV file that opens in Excel. The ODBC drivers belong to Hive rather than to Hue.

The Apache page for Hive calls it “a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets.” I’m not sure how the data warehouse piece applies.