Monthly Archives: December 2013

HaaS Provider Qubole Now Runs on Google Compute Engine (GCE)

I’m starting to see applications ported from AWS to GCE, but I’m not sure about the justifications for running production systems on GCE. Maybe price?


Example use case for Hadoop and Machine to Machine (M2M) data

Telecom OEM WebNMS discusses its use of Hadoop. In one trial, it stored latency data from 7 million cable modems. Using a 20-node Hadoop cluster, they observed a factor of 10 increase in performance compared to a relational database. In addition, the cost to deploy was a small fraction of that of a traditional infrastructure.


Fast Search and Analytics on Hortonworks with Elasticsearch

Elasticsearch enables real-time search and analytics on Hortonworks. YARN is supported, and integration extends into Hive and Pig.


Kiji Project enables development of real-time analytics on Hadoop

An open source framework for collecting and analyzing data in real-time applications such as energy usage and fraud monitoring.


WibiEnterprise bridges between Hadoop and the application layer

The core features of WibiEnterprise 3.0 are frameworks that provide:

  • real-time schema definition
  • a layer on top of MapReduce
  • model lifecycle support (machine learning, batch training, development, scoring)
  • ad hoc queries
  • RESTful interfaces


Interesting use case about migrating away from SQL to Hadoop and NoSQL

Paytronix analyzes data from 8,000 restaurants, adding up to a few tens of terabytes. Not that complex in terms of volume, but there are a lot of data fields and potential reports. They migrated from MS SQL Server and constantly evolving ETL jobs to Hadoop and MongoDB with a lot of success.


Don’t run Hadoop on a SAN

By definition, a SAN is about consolidating data and Hadoop is about distributing data. Can they co-exist? Not according to this article.

If you take data out of a Hadoop node and put it on a SAN, you reduce performance: you want data to move to the CPU at bus speed, not network speed. And a heavy Hadoop load could saturate your network.
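The locality argument comes down to simple arithmetic: nodes reading local disks scale their aggregate throughput with the node count, while nodes reading from a SAN share one network pipe. A back-of-envelope sketch (all figures below are illustrative assumptions, not measurements):

```python
# Back-of-envelope comparison of aggregate read throughput for a
# 20-node Hadoop cluster: local disks vs. a shared SAN link.
# All numbers are assumptions for illustration only.

NODES = 20
LOCAL_DISK_MBPS = 100           # assumed sequential read per node, MB/s
SAN_LINK_MBPS = 10 * 1000 / 8   # assumed shared 10 Gb/s link, in MB/s

# Local reads scale with the cluster: every node reads in parallel.
local_aggregate = NODES * LOCAL_DISK_MBPS

# SAN reads are capped by the shared link, no matter how many nodes.
san_aggregate = SAN_LINK_MBPS

print(f"local: {local_aggregate} MB/s, SAN: {san_aggregate} MB/s")
```

With these assumed numbers the local-disk cluster already outpaces the shared link, and the gap widens as nodes are added, which is the core of the "don't run Hadoop on a SAN" argument.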


Big Data as a Service provider has free developer account

The founders of Qubole built some of the big data technology at Facebook (scaled to 25 petabytes). Their new company offers a hosted Hadoop infrastructure. Its small and free accounts take the IT configuration out of learning Hadoop.


Two-part article on Hello World for Hadoop
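The "Hello World" of Hadoop is word count. Its map, shuffle, and reduce phases can be sketched in plain Python (the function names here are illustrative, not part of any Hadoop API):

```python
# Minimal word count mirroring Hadoop's map -> shuffle -> reduce flow.
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs, like a Hadoop Mapper."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Group values by key, like Hadoop's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts per word, like a Hadoop Reducer."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hello world", "hello Hadoop"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'hello': 2, 'world': 1, 'hadoop': 1}
```

In real Hadoop the mapper and reducer run as distributed tasks over HDFS blocks, but the dataflow is the same.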


Summary of Teradata’s big data approach

  • Teradata Aster 6 platform
  • Includes a graph analysis engine (visualization) in addition to traditional rows/columns
  • Enables execution of SQL across multiple NoSQL repositories
  • Integrates with multiple third parties for solutions such as analytical workflow (Alteryx) and advanced analytics algorithms (Fuzzy Logix)
  • Cloud services at comparable cost to on-premises