Tag Archives: apache.org

Databricks to commercialize Spark and Shark in-memory processing

Shark utilizes in-memory SQL queries for complex analytics, and is Apache Hive compatible. The name “Shark” is supposed to be short hand for “Hive on Spark”. This seems to be a competitor to Cloudera Impala or the Hortonworks implementation of Hive.

Apache Spark utilizes APIs (Python, Scala, Java) for in-memory processing with very fast reads and writes, claiming to be 100x faster than disk-based MapReduce. Spark is the engine behind Shark. Spark can be considered as an alternative to MapReduce, not an alternative to Hadoop.

Scala is an interesting language being used by companies such as Twitter as both higher performance and easier to write than Java. Some companies that had originally developed using Rails or C++ are migrating to Scala rather than to Java.