At its core, what the NSA is doing is finding anti-patterns: crunching through huge sets of uninteresting data is the only way to surface the interesting data.
The Department of Defense also sees the success the NSA is having with Hadoop technologies and is considering using them (most likely Accumulo) to store large amounts of unstructured, schema-less data.
The Hadoop mindset glorifies keeping as much raw data as possible: just add more nodes if necessary. However, there is currently a lack of good metadata tools. Where did the data come from? What is the retention policy? Who has access to read it or delete it?
Sqrrl's CTO attributes this to the Hadoop ecosystem being designed for developers, not for business users.
Sqrrl is powered by Apache Accumulo, a low-latency NoSQL database originally developed at the NSA in 2008 that uses Hadoop (HDFS) as its file system. Its security features include:
- Support for both role-based and attribute-based security controls, down to the level of individual cells
- Encryption at rest and in transit
- Support for multiple encryption keys
- Trust boundaries that limit the administrator's access to data
- Encryption costs only about a 10% performance hit
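To make the cell-level security model concrete: in Accumulo, every key/value pair carries a visibility label, which is a boolean expression over authorization tokens (`&` for AND, `|` for OR, parentheses for grouping); a scan only returns cells whose label is satisfied by the scanner's authorizations. The sketch below is a hypothetical, simplified evaluator of that idea in Python, not Accumulo's actual Java API. One simplification is labeled in the comments: real Accumulo requires parentheses when mixing `&` and `|`, whereas this sketch simply gives `&` higher precedence.

```python
import re

def visible(expr: str, auths: set) -> bool:
    """Return True if the visibility expression is satisfied by the
    given authorization tokens. Simplified model of Accumulo-style
    labels: '&' = AND, '|' = OR, parentheses group. Unlike real
    Accumulo, mixing '&' and '|' without parentheses is allowed here,
    with '&' binding tighter than '|'."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():  # lowest precedence: a | b
        nonlocal pos
        val = parse_and()
        while peek() == "|":
            pos += 1
            val = parse_and() or val
        return val

    def parse_and():  # higher precedence: a & b
        nonlocal pos
        val = parse_atom()
        while peek() == "&":
            pos += 1
            val = parse_atom() and val
        return val

    def parse_atom():  # a single token or a parenthesized group
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            val = parse_or()
            pos += 1  # consume the closing ")"
            return val
        return tok in auths

    return parse_or()
```

For example, a cell labeled `admin|(analyst&fouo)` would be returned to a user holding the `analyst` and `fouo` tokens, but a cell labeled `admin&secret` would not. This per-cell filtering at scan time is what lets a single table safely hold data at multiple classification levels.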