Monthly Archives: February 2014

Marilyn Matz, CEO of Paradigm4, explains why some use cases are NOT a good fit for Hadoop

Hadoop works well when a problem can be broken down into discrete, parallel sub-tasks. Some computations, however, must be applied to an entire dataset at once. She lists some of these: correlation, covariance, principal component analysis, multivariate statistics, generalized linear models.
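
To make the distinction concrete, here is a minimal Python sketch (my own illustration, not taken from Matz's article). The data and the helper names `pearson` and `chunks` are made up for the example. A sum splits cleanly into per-chunk sub-tasks whose results merge exactly, while naively computing a correlation on each chunk and averaging the results gives a very different answer from the correlation over the whole dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def chunks(seq, size):
    """Split a sequence into consecutive pieces of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# Made-up sample data, split into three "nodes" of two rows each.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

# Decomposable: each chunk is summed independently, then the partial results merge.
partial_sums = [sum(c) for c in chunks(xs, 2)]
total = sum(partial_sums)

# Not decomposable this way: every chunk on its own looks perfectly
# anti-correlated, yet the full dataset is strongly positively correlated.
full_r = pearson(xs, ys)
chunk_rs = [pearson(cx, cy) for cx, cy in zip(chunks(xs, 2), chunks(ys, 2))]
naive_r = sum(chunk_rs) / len(chunk_rs)

print(total, round(full_r, 3), naive_r)
```

The per-chunk view loses the relationship between rows that live on different nodes, which is the kind of whole-dataset dependency the list above is about.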


Hard to believe, but here’s how to install Hadoop on Raspberry Pi

I haven’t tried this myself (I don’t have a Raspberry Pi, only an Arduino), and even if it installs, I’m not sure what the runtime could accomplish, but this guy has published a short list of instructions on how to install Hadoop on Raspberry Pi.


Western Union using Hadoop for real-time analytics

Western Union has 70 million customers in 200 countries, and processes 29 payment service transactions per second. They are now using Hadoop for real-time analytics, which seems surprising, as I’d expect batch analytics to be the more likely use case.


Intel distribution of Hadoop optimized for its own hardware

Hadoop is generally assumed to run on clusters of generic commodity hardware. Intel has just released a customized/optimized distribution that it claims is up to 30x faster when run on the Xeon E7 v2 family of processors, which is hardly generic or commodity.


Teradata to integrate Hadoop into its legacy platform

Not sure how well this will work, or if the use cases support it. Rather than optimize Hadoop for the use cases it was designed for, Teradata is merging Hadoop into its legacy core data warehouse. Will Hadoop add value, or just make the platform overly complex?


Hydra is a non-Hadoop database for real-time analysis of dynamic data

Hydra is not built on top of Hadoop, but functions similarly to Summingbird, Storm, and Spark.

Data can stream into it, and analytics can be run in real time, rather than only in batch.
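
As a rough illustration of the streaming-versus-batch difference (a generic Python sketch with made-up data, not Hydra's actual API; the `RunningMean` class is hypothetical), a streaming aggregate is updated as each event arrives and has an answer available at every step, while the batch version only answers once the whole dataset is in:

```python
class RunningMean:
    """Keeps a mean up to date as values arrive one at a time."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        # Fold the new event into the running state and return the current mean.
        self.count += 1
        self.total += value
        return self.total / self.count


# Made-up event values arriving in order.
events = [10.0, 20.0, 30.0]

# Streaming: a result is available after every single event.
stream = RunningMean()
streamed = [stream.update(v) for v in events]

# Batch: one result, computed after all events have been collected.
batch_mean = sum(events) / len(events)

print(streamed, batch_mean)
```

Both approaches agree once all the data is in; the streaming one just does not make you wait for it.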

AddThis is the company that originally developed Hydra, which is now open sourced under the Apache license. AddThis runs six Hydra clusters, one of which comprises 156 servers and processes 3.5 billion transactions per day.