- Batch aggregation of data processed in Hadoop, with the results stored in MongoDB for later ad-hoc analysis
- Staging area for batch loads into Hadoop
- Using MapReduce for complex ETL migrations
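To make the MapReduce-for-ETL pattern above concrete, here is a minimal pure-Python sketch of a map and reduce step. The record format (hypothetical `user_id,bytes` log lines) and the function names are illustrative assumptions; in a real cluster these functions would run as Hadoop Streaming mapper and reducer scripts reading stdin.

```python
# Sketch of a MapReduce-style ETL step (pure Python).
# Assumes hypothetical CSV log records of the form "user_id,bytes".
from itertools import groupby
from operator import itemgetter

def map_record(line):
    # Map phase: extract a (key, value) pair from one raw log line.
    user_id, nbytes = line.strip().split(",")
    return user_id, int(nbytes)

def reduce_records(pairs):
    # Shuffle/sort stage: group all values by key, as Hadoop does
    # between the map and reduce phases.
    pairs = sorted(pairs, key=itemgetter(0))
    # Reduce phase: sum byte counts per user.
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

if __name__ == "__main__":
    lines = ["alice,100", "bob,50", "alice,25"]
    totals = reduce_records(map_record(l) for l in lines)
    print(totals)  # {'alice': 125, 'bob': 50}
```

The same two functions scale from this toy input to terabytes once Hadoop handles the partitioning and shuffling.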
Retrieval from Hadoop is not a real-time process: depending on the dataset and the query, a job may finish quickly or take many days to execute. It is therefore important to extract results from Hadoop and store them in a transactional database for fast access. This can be a traditional SQL database such as Microsoft SQL Server, Oracle, or DB2; an open-source database such as MySQL (or its fork, Drizzle); or a NoSQL database such as MongoDB or CouchDB.
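One way to perform that extract-and-store step is sketched below: parse the tab-separated `key\tvalue` lines that a Hadoop Streaming job writes to its output files, and load them into MongoDB with the `pymongo` driver. The database, collection, and field names are illustrative assumptions, and the load function requires a running MongoDB instance.

```python
# Sketch: push Hadoop job output into MongoDB for ad-hoc queries.
# Assumes tab-separated "key\tvalue" lines, the default Streaming format.

def parse_output_line(line):
    """Turn one 'key<TAB>value' result line into a MongoDB document.
    Field names here (user_id, total_bytes) are hypothetical."""
    key, value = line.rstrip("\n").split("\t")
    return {"user_id": key, "total_bytes": int(value)}

def load_into_mongo(lines, uri="mongodb://localhost:27017"):
    # Requires the pymongo driver and a reachable MongoDB server.
    from pymongo import MongoClient
    coll = MongoClient(uri).analytics.daily_totals
    coll.insert_many(parse_output_line(l) for l in lines)
```

Inserting documents this way makes the batch results immediately queryable by applications, which is the role the transactional store plays in this architecture.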
Hadoop is used for exhaustive data analysis, whereas the SQL or NoSQL database is used for retrieval by applications. Hadoop can also feed into a data warehouse (though it is less likely to extract from one). A data warehouse holds data structured for very fast retrieval, based on analysis that has already been performed; Hadoop's data is organized in roughly the opposite manner, as raw records suited to full-scan batch analysis rather than fast lookup.