Using the R programming language with Hadoop to create graphical views of statistical models

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced.

The update to R from Revolution Analytics is significant because previously data had to be moved into an R environment before it could be processed and plotted. This update enables R to run within the Cloudera Hadoop environment, so data does not need to be moved out of HDFS, across the network, and onto another machine for processing.

I think this matters because R can represent an analysis of data that is potentially petabytes in size as a single-page graphic. It seems to me that R takes as its input the data generated by the Reduce portion of MapReduce.
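
To make that guess concrete, here is a minimal sketch in base R. It assumes the Reduce step has already written small, tab-separated key/value lines; the sample values, the column names `x` and `y`, and the output file name `model.pdf` are all made up for illustration. It does not use or depict the Revolution Analytics integration itself, only the general idea of fitting a model to aggregated output and plotting it on one page.

```r
# Hypothetical reducer output: one "key<TAB>value" line per key.
# These numbers are invented purely for illustration.
reducer_output <- c(
  "1\t2.1",
  "2\t3.9",
  "3\t6.2",
  "4\t7.8",
  "5\t10.1"
)

# Parse the lines into a data frame. In practice the text would come
# from a file or connection rather than an inline vector.
agg <- read.delim(text = reducer_output, header = FALSE,
                  col.names = c("x", "y"))

# Fit a simple linear model to the aggregated values.
fit <- lm(y ~ x, data = agg)

# Draw the points and the fitted line as a single-page graphic.
pdf("model.pdf")
plot(agg$x, agg$y, xlab = "x", ylab = "y",
     main = "Linear fit on reducer output")
abline(fit)
dev.off()
```

The point of the sketch is that the heavy lifting (reducing a very large data set to a handful of rows) happens before R sees anything; the modelling and plotting then work on a small summary.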

