Category Archives: Analytics

Hydra is a non-Hadoop database for realtime analysis of dynamic data

Hydra is not built on top of Hadoop, but it functions similarly to Summingbird, Storm, and Spark.

Data can stream into it, and analytics can be run in real time, rather than only in batch.
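To make the streaming-versus-batch distinction concrete, here is a minimal sketch in plain Python. It does not use Hydra's API; the class and event names are hypothetical. It shows an aggregate that is updated as each event arrives, so a query can be answered at any moment without rerunning a batch job over the full data set.

```python
from collections import Counter


class RunningCounts:
    """Hypothetical illustration: per-key counts updated one event at a time."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def ingest(self, key):
        # Update the aggregate incrementally instead of recomputing it in batch.
        self.counts[key] += 1
        self.total += 1

    def top(self, n=1):
        # The current answer is available after every ingested event.
        return self.counts.most_common(n)


# Made-up event stream, used only for illustration.
stream = ["share", "view", "view", "click", "view"]

rc = RunningCounts()
for event in stream:
    rc.ingest(event)
```

Calling `rc.top()` inside the loop would return the leading event type seen so far, which is the property that separates a streaming aggregate from a batch report.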

AddThis, the company that originally developed Hydra, has since open-sourced it under the Apache license. AddThis runs six Hydra clusters, one of which comprises 156 servers and processes 3.5 billion transactions per day.



Splunk Analytics for Hadoop enables detection of patterns across petabytes of data

The advantage is that schemas don’t need to be created before searching for patterns, since Hadoop is leveraged. This makes sense: by creating a schema, the user is already making assumptions about where the patterns exist. A schema-less analysis makes it possible to find unexpected anomalies within known patterns and to find entirely new patterns.
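As a rough illustration of the schema-less idea, the sketch below extracts whatever key=value pairs happen to appear in each raw line at read time and then flags rarely seen values. It is plain Python, not Splunk's search language or API; the function names, the log format, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

# Matches any key=value pair; no field list is declared in advance.
KV = re.compile(r"(\w+)=(\S+)")


def extract_fields(line):
    """Return whatever key=value pairs appear in a raw line."""
    return dict(KV.findall(line))


def rare_values(lines, field, max_count=1):
    """List values of `field` seen at most `max_count` times across the lines."""
    counts = Counter(extract_fields(line).get(field) for line in lines)
    counts.pop(None, None)  # ignore lines that lack the field entirely
    return sorted(v for v, c in counts.items() if c <= max_count)


# Made-up machine-generated records with differing fields per device.
raw_logs = [
    "ts=1 device=thermostat status=ok temp=21",
    "ts=2 device=thermostat status=ok temp=22",
    "ts=3 device=meter status=ok watts=310",
    "ts=4 device=thermostat status=fault temp=85",
]

anomalies = rare_values(raw_logs, "status")
```

Because fields are discovered per line, the meter record with its `watts` field needs no schema change, and any field can be checked for rare values after the data has been collected.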

Splunk includes visualization components.

Splunk’s Director of Big Data Marketing, Brett Sheppard, says that this is well suited to the Internet of Things (IoT), which can leverage visualization tools that report on the results of searching for anomalies in large amounts of machine-generated data.



Fast Search and Analytics on Hortonworks with Elasticsearch

Elasticsearch on Hortonworks enables real-time searching and analytics. YARN is supported, and the integration extends into Hive and Pig.


Kiji Project enables development of real-time analytics on Hadoop

Kiji is an open source framework for the collection and analysis of data for real-time applications such as energy usage and fraud monitoring.