CDAP / CDAP-7417

File set partitions are not cleaned up in Hive if the transaction fails


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.5.1
    • Fix Version/s: 4.0.0
    • Component/s: Datasets
    • Release Notes:
      Fixes an issue where partitions of PartitionedFileSet were not cleaned up properly after transaction failure.

      Description

      For example, if a MapReduce job creates multiple partitions and the last one fails due to a Hive timeout, the partitions are rolled back in the PartitionedFileSet (PFS) metadata, which is transactional, but they are not removed from Hive.

      We need to address this in some way, despite the fact that Hive does not give us any transactional way to perform DDL.

      Ideas:

      • Delay creation of partitions until the commit time of the transaction, and create them all at once. (TBD: Will Hive guarantee that either all or none of them are created?)
      • Attempt to delete the partitions on transaction rollback. (TBD: What happens if Hive is down, etc.?)
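
      The two ideas above can be combined into one rough sketch: buffer partition additions during the transaction, apply them at commit, and on failure try to drop whatever was attempted. This is only an illustration under stated assumptions, not the fix that shipped in 4.0.0. `PartitionStore`, `DeferredPartitionWriter`, and `InMemoryStore` are hypothetical names, not CDAP or Hive APIs, and `dropPartition` is assumed to be idempotent (comparable to a drop with IF EXISTS).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a client that issues Hive DDL; not a real CDAP or Hive API.
interface PartitionStore {
  void addPartition(String key) throws Exception;
  // Assumed to be idempotent: dropping a missing partition is a no-op.
  void dropPartition(String key) throws Exception;
}

// Buffers partition additions and only touches the external store at commit time.
class DeferredPartitionWriter {
  private final PartitionStore store;
  private final List<String> pending = new ArrayList<>();
  private final List<String> orphaned = new ArrayList<>();

  DeferredPartitionWriter(PartitionStore store) {
    this.store = store;
  }

  // During the transaction: remember the partition, do not call the store yet.
  void register(String key) {
    pending.add(key);
  }

  // Transaction aborted before commit: nothing was created externally.
  void rollback() {
    pending.clear();
  }

  // At commit: apply all additions. On failure, try to drop every partition that
  // was attempted, including the failing one, since a timeout may still have created it.
  boolean commit() {
    List<String> attempted = new ArrayList<>();
    try {
      for (String key : pending) {
        attempted.add(key);
        store.addPartition(key);
      }
      return true;
    } catch (Exception addFailure) {
      for (String key : attempted) {
        try {
          store.dropPartition(key);
        } catch (Exception dropFailure) {
          // The store may be down; record the key so it can be cleaned up later.
          orphaned.add(key);
        }
      }
      return false;
    } finally {
      pending.clear();
    }
  }

  // Partitions that could not be dropped after a failed commit.
  List<String> orphaned() {
    return orphaned;
  }
}

// In-memory fake used only to exercise the sketch; fails on one configured key.
class InMemoryStore implements PartitionStore {
  final Set<String> partitions = new LinkedHashSet<>();
  private final String failOn;

  InMemoryStore(String failOn) {
    this.failOn = failOn;
  }

  @Override
  public void addPartition(String key) throws Exception {
    if (key.equals(failOn)) {
      throw new Exception("simulated Hive timeout");
    }
    partitions.add(key);
  }

  @Override
  public void dropPartition(String key) {
    partitions.remove(key);
  }
}
```

      The sketch does not make the DDL atomic. If the drop calls also fail, the affected keys are only recorded in `orphaned()`, so some later cleanup step would still be needed, which is the open question raised in the second idea.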


              People

              • Assignee:
                Andreas Neumann
              • Reporter:
                Andreas Neumann
