HortonWorks developed solutions to add into Hive the ability to update multiple records as a single transaction following the ACID model. Part of the complexity of transactional updates is that the data must be written to all applicable nodes before the transaction can be considered complete. The naming convention within HDFS folders includes a transaction ID so that both committed and uncommitted files persist until all portions of the transaction have been completed. Because the transaction ID is included, any read operations that occur before the transaction has completed will access the old data.
Why go through all of this work to add an ACID model to Hive rather than just use HBase, which already supports transactions. The primary reason is that HBase only supports Consistency at the level of a single row update, rather than with a larger set of operations. Without Consistency, there is no ACID. HortonWorks lists a few other reasons, but I’m discounting them because they are general reasons why they prefer Hive over HBase.