Twitter creates Hadoop hybrid system to mitigate tradeoffs between batch and stream processing

Storm is an open source system (from Twitter) that processes streams of big data in real time, though without a 100% accuracy guarantee. That makes it the opposite of Hadoop, which processes a repository of big data in batch.

Twitter needs both streaming and batch processing, so it created an open source hybrid system called Summingbird. It does what Storm does, then uses Hadoop for error correction.
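To make the "stream first, correct in batch later" idea concrete, here is a minimal sketch in plain Python. It is not Summingbird's API: the event log, the `count` and `merged_view` helpers, and the `cutoff` parameter are all hypothetical names chosen for illustration. The assumption is that the streaming layer may miss events, while the batch layer later recounts the complete log up to some cutoff time and its exact figures replace the streaming figures for that period.

```python
from collections import Counter

# Hypothetical event log of (timestamp, hashtag) pairs. This is the complete
# record that a batch job would eventually read from storage.
LOG = [
    (1, "#hadoop"), (2, "#storm"), (3, "#hadoop"),
    (4, "#storm"), (5, "#hadoop"), (6, "#yarn"),
]

# Assumption for illustration: the streaming layer lost the event at
# timestamp 3, so its running counts are slightly low.
DELIVERED = [event for event in LOG if event[0] != 3]


def count(events):
    """Count hashtag occurrences in an iterable of (timestamp, hashtag)."""
    return Counter(tag for _, tag in events)


def merged_view(log, delivered, cutoff):
    """Combine exact batch counts with approximate streaming counts.

    Events before `cutoff` are counted from the complete log (the batch
    pass); events at or after `cutoff` are counted from whatever the
    streaming layer actually received.
    """
    batch = count(e for e in log if e[0] < cutoff)
    recent = count(e for e in delivered if e[0] >= cutoff)
    return batch + recent


stream_only = count(DELIVERED)            # fast, but misses one "#hadoop"
corrected = merged_view(LOG, DELIVERED, cutoff=5)
```

In this toy run, `stream_only` undercounts `#hadoop` because of the dropped event, while `corrected` recovers it once the batch pass has covered timestamps before the cutoff. Counts combine by simple addition here, which is what lets the two layers be merged without extra bookkeeping.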

Twitter’s use cases include updating timelines and trending topics in real time, then making sure afterward that the analytics are accurate.

Yahoo’s contribution to this effort was enabling Storm to be configured using YARN.

