Hortonworks Sandbox Hive tutorial

I much preferred this Hive tutorial to the previous one on Pig. Using the same dataset in both made the comparison clearer.

Pig makes sense for sequential steps, such as an ETL job. Hive seemed better suited to tasks like those for which we’d write stored procedures in a more traditional database server.

Another difference came with debugging.

  • The Pig editor bundled into the Hortonworks sandbox isn’t very sophisticated as IDEs go. It has no breakpoints, no way to inspect data, and so on. Perhaps there’s a way to accomplish this, but (thankfully) it isn’t covered in such an early-stage tutorial. There’s a button to upload a UDF jar, so I’ve got to research how one develops that jar outside of the Pig script editor.
  • The Hive tutorial makes it easier to view progress at each step, since you can think of each step as an independent SQL (actually HiveQL) statement. If the programming task were far more complex, I could see myself structuring the Pig scripts in a way that might be easier to debug than Hive.
  • Hive seemed good for an ad-hoc query and Pig for a complex procedural task.
  • The next tutorial combines Pig and Hive. I’ll see how that shapes my perceptions.

Hive can be used to program MapReduce using a SQL-like language

Hive enables MapReduce to be programmed using something that looks like SQL, instead of a procedural language like Java or Python. This is useful if a team of database programmers, as opposed to application programmers, is called upon to program MapReduce.

Using Hive tables requires defining a schema.
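
To make the schema requirement concrete, here is a minimal Python sketch of the idea. It is not Hive itself, only an illustration of a declared list of column names and types being applied to raw delimited lines as they are read. The column names (player_id, year, runs), the sample lines, and the apply_schema helper are all hypothetical.

```python
# Conceptual sketch only: this is plain Python, not Hive.
# The schema, column names, and sample data below are hypothetical.
SCHEMA = [("player_id", str), ("year", int), ("runs", int)]

def apply_schema(line, schema=SCHEMA, delimiter=","):
    """Split one raw text line and cast each field to its declared type."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}

raw_lines = ["abc01,2001,42", "xyz02,2001,17"]
rows = [apply_schema(line) for line in raw_lines]
print(rows[0])
```

Without the schema, each line is just a string; with it, every field has a name and a type that a query can refer to.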

The SQL-like language (called HiveQL) is converted to a MapReduce job.
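
As a rough picture of what that conversion means, the sketch below expresses a grouped aggregate as separate map, shuffle, and reduce phases in plain Python. It is a conceptual illustration, not the plan Hive actually generates; the query in the comment, the table, the columns, and the rows are hypothetical.

```python
from collections import defaultdict

# Conceptual sketch only: roughly how a query such as
#   SELECT year, MAX(runs) FROM some_table GROUP BY year
# could be expressed as map, shuffle, and reduce phases.
# The table, columns, and rows are hypothetical.
rows = [
    {"year": 2000, "runs": 10},
    {"year": 2000, "runs": 35},
    {"year": 2001, "runs": 22},
]

def map_phase(row):
    """Emit a (key, value) pair: the GROUP BY column and the aggregated column."""
    return row["year"], row["runs"]

def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Apply the aggregate (MAX here) to all values sharing one key."""
    return key, max(values)

grouped = shuffle(map_phase(row) for row in rows)
result = dict(reduce_phase(k, v) for k, v in grouped.items())
print(result)
```

The point of Hive is that the one-line query in the comment stands in for all three functions.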

Hue is a browser-based GUI within which you can do Hive work. You type your query and see tabular results. Hive has ODBC drivers, and Hue can export query results as a CSV file that opens in Excel.

The Apache page for Hive calls it “a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets.” I’m not sure how the data warehouse piece applies.