hadoop.apache.org | Cloud (IaaS) & Big Data

Tag Archives: hadoop.apache.org

Apache Hadoop and its components

Posted on October 29, 2013 | Comments Off

Hadoop consists of two core components:

MapReduce –
- programming framework
- Map
  - distributes work to different Hadoop nodes
- Reduce
  - gathers results from multiple nodes and resolves them into a single value
  - the source data comes from HDFS, and the output is typically written back to HDFS
- JobTracker: accepts jobs and assigns their tasks to nodes
- TaskTracker: runs on each worker node and takes orders from the JobTracker
- MapReduce was originally developed by Google.
- In Hadoop 2, MapReduce is built on top of Apache YARN, a framework for job scheduling and cluster resource management that replaces the JobTracker and TaskTracker.
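
The Map and Reduce steps listed above can be pictured with a small word count that runs in memory. This is only a sketch of the programming model, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and everything runs in one process instead of being spread over nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: resolve all values collected for one key into a single value."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce over a list of input lines."""
    pairs = []
    for line in lines:  # on a cluster, each input split would be mapped on a different node
        pairs.extend(map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["hadoop stores data", "hadoop processes data"]))
```

On a real cluster the input lines would come from files in HDFS and the resulting counts would be written back to HDFS.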

HDFS (Hadoop Distributed File System) – file store
- It is not a conventional file system and it is not a database, yet it has traits of both.
- Within HDFS are two components
  - Data Nodes:
    - data repository
  - Name Nodes:
    - knows where to find the data; maps blocks of data to the slave nodes that store them (the same nodes where the TaskTrackers run)
    - handles namespace operations such as opening, closing, and renaming files
- On top of HDFS you can run HBase
  - Super scalable (billions of rows, millions of columns) repository for key-value pairs
  - This is not a relational database: rows are looked up by a single row key, so a table cannot have multiple indices
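
The split between Name Nodes and Data Nodes described above can be sketched with a toy model. The class names (`ToyNameNode`, `ToyDataNode`) and the 4-byte block size are invented for illustration; real HDFS uses blocks of 64 MB or 128 MB by default depending on the version, replicates every block, and has clients send data to the Data Nodes directly, all of which this sketch leaves out.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example splits a short string


class ToyDataNode:
    """Data repository: holds the block contents, knows nothing about files."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


class ToyNameNode:
    """Keeps only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}  # file path -> list of (block id, datanode name)
        self._next_id = 0

    def write(self, path, data):
        """Split data into blocks and place them round-robin on the data nodes."""
        locations = []
        for offset in range(0, len(data), BLOCK_SIZE):
            node = self.datanodes[self._next_id % len(self.datanodes)]
            node.blocks[self._next_id] = data[offset:offset + BLOCK_SIZE]
            locations.append((self._next_id, node.name))
            self._next_id += 1
        self.block_map[path] = locations

    def rename(self, old_path, new_path):
        """A rename touches metadata only; no block is moved."""
        self.block_map[new_path] = self.block_map.pop(old_path)

    def read(self, path):
        """Look up the block locations, then fetch each block from its node."""
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[name].blocks[block_id]
                        for block_id, name in self.block_map[path])


if __name__ == "__main__":
    namenode = ToyNameNode([ToyDataNode("dn1"), ToyDataNode("dn2")])
    namenode.write("/a.txt", b"hello world")
    namenode.rename("/a.txt", "/b.txt")
    print(namenode.block_map["/b.txt"])
    print(namenode.read("/b.txt"))
```

The point of the sketch is that the name node answers "where is the data?" while the data nodes hold the bytes, which is why a rename changes only the block map.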

Posted in apache, HBase, HDFS, JobTracker, MapReduce, NameNode

Tagged hadoop.apache.org, hbase.apache.org, wikipedia.org

Cloudera Distribution of Hadoop

Posted on October 28, 2013 | Comments Off

Hadoop is an open source Apache project, but a lot of the contributions come from Cloudera.

The Cloudera Distribution of Hadoop (CDH) appears to be the de facto standard, although other vendors such as IBM have their own. Cloudera provides a downloadable VM with a fully configured single node of Hadoop. I was able to get this up and running on my own MacBook Pro, using Oracle VirtualBox, in about 15 minutes.

Cloudera claims that they have more customers and more experience than any other Hadoop vendor.

Posted in apache, cloudera, hadoop, sandbox

Tagged cloudera.com, hadoop.apache.org
