For example, if a MR job creates multiple partitions, and the last one fails due to a Hive timeout, then the partitions will be rolled back in the PFS meta data (which is transactional) but they will not be cleaned out from Hive.
We need to address this in some way, despite the fact that Hive does not give us any transactional way to perform DDL.
- delay creation of partitions until the commit time of the transaction, and create them all at once. (TBD: Will Hive guarantee that either all or none of them are created?
- attempt to delete the partitions on transaction rollback. (TBD: what happens if Hive is down etc.?)