Category Archives: HCatalog

I spent some time today using the Hortonworks Hadoop sandbox

I downloaded the Hortonworks sandbox today. I’m using the version that runs as a virtual machine under Oracle VirtualBox. The sandbox can run in as little as 2 GB of RAM, but needs 4 GB to enable Ambari and HBase. Good thing I have 8 GB in my laptop.

The “Hello World” tutorial gave me hands-on practice with:

  • Uploading a file into HCatalog
  • Typing queries into Beeswax, which is a GUI for Hive
  • Running a more complex query by writing a short script in Pig

There are a lot more tutorials. I’ll update this blog post after I finish each tutorial.



Apache Ambari: A suite of applications/components to provision, manage, and monitor Hadoop clusters

System Admins:

Provision

  • Wizard for installing/configuring Hadoop services across many hosts

Manage

  • Start, stop, reconfigure Hadoop across many hosts

Monitor

  • Dashboard for health & status
  • Metrics via Ganglia (a scalable distributed monitoring system for high-performance computing systems such as clusters and grids)
  • Alerting via Nagios

Developers:

  • Integrate provisioning, management, and monitoring into their own applications using the Ambari REST APIs
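
To give a feel for what calling the Ambari REST APIs from your own application might look like, here is a minimal Python sketch. It only builds the HTTP requests, so it can be read and tried without a running cluster. Everything specific in it is an assumption to check against the Ambari documentation for your version: the host and port in AMBARI_URL, the cluster name "Sandbox" (hypothetical), the /api/v1/clusters/... paths, the X-Requested-By header on modifying requests, and the "STARTED" / "INSTALLED" state names (with "INSTALLED" meaning stopped). The user name and password are placeholders.

```python
import base64
import json
import urllib.request

# Assumptions for illustration only: adjust to your own Ambari server.
AMBARI_URL = "http://localhost:8080"
CLUSTER = "Sandbox"  # hypothetical cluster name


def _auth_header(user, password):
    """Build an HTTP Basic authentication header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def list_services_request(user, password):
    """Build a GET request that asks Ambari for the services in the cluster."""
    url = f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}/services"
    return urllib.request.Request(url, headers=_auth_header(user, password), method="GET")


def set_service_state_request(service, state, user, password):
    """Build a PUT request that asks Ambari to move a service to a state.

    Assumed state names: "STARTED" to start, "INSTALLED" to stop.
    """
    url = f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}/services/{service}"
    payload = {
        "RequestInfo": {"context": f"Set {service} to {state}"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    headers = _auth_header(user, password)
    headers["X-Requested-By"] = "ambari"  # assumed to be required for modifying calls
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="PUT"
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else {}
```

With a sandbox running, something like send(list_services_request("user", "password")) would fetch the service list, and send(set_service_state_request("HDFS", "INSTALLED", "user", "password")) would ask Ambari to stop HDFS. Keeping request construction separate from sending makes the calls easy to inspect before pointing them at a real cluster.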

These tools are supported by Ambari:

  • HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, Sqoop
