  1. CDAP
  2. CDAP-3584

Fileset Datasets need to remove files they added upon transaction rollback.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 3.0.0, 2.8.0
    • Fix Version/s: 3.4.0
    • Component/s: Datasets
    • Labels:
      None
    • Release Notes:
      Upon transaction rollback, a PartitionedFileSet will roll back the files for the partitions added and/or removed in that transaction.

      Description

      FileSet and PartitionedFileSet datasets need to be able to remove files created for them, upon transaction rollback.

      For instance, a FileSet can be used as the output of a MapReduce job, with a relative path. If this MapReduce job fails and the transaction is rolled back, then the files under that relative path need to be removed, if the directory at that path is empty.

      Also, a Partition of a PartitionedFileSet dataset removes its files when it is deleted. However, if you add a Partition in a transaction and the transaction is rolled back, the partition does not exist, but its files might. Those files need to be cleaned up.

      The same applies to TimePartitionedFileSet as to PartitionedFileSet.

      We also need to consider files written by a DynamicPartitioner, in case a job using it fails.
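
      The cleanup described above could be sketched roughly as follows. This is only an illustration of the general idea (remember what a transaction created, undo it on rollback), not the actual CDAP fix. The class RollbackFileTracker and its methods are hypothetical names that do not exist in CDAP, and the sketch uses the local filesystem through java.nio.file rather than CDAP's own location and transaction APIs. It also assumes every file is created through the tracker, which would not hold for files written directly by a MapReduce job or a DynamicPartitioner.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.stream.Stream;

/** Hypothetical helper (not a CDAP class): tracks paths created within one transaction. */
class RollbackFileTracker {
  // Paths created since the transaction started, most recently created first.
  private final Deque<Path> created = new ArrayDeque<>();

  /** Creates a file and any missing parent directories, remembering each path it creates. */
  public Path createFile(Path file) throws IOException {
    createMissingDirs(file.getParent());
    Files.createFile(file);
    created.push(file);
    return file;
  }

  // Creates missing directories one level at a time so that each new one is recorded.
  private void createMissingDirs(Path dir) throws IOException {
    if (dir == null || Files.exists(dir)) {
      return;
    }
    createMissingDirs(dir.getParent());
    Files.createDirectory(dir);
    created.push(dir);
  }

  /** On commit the files are kept, so there is nothing left to undo. */
  public void commit() {
    created.clear();
  }

  /** On rollback, deletes what this transaction created, newest first. */
  public void rollback() throws IOException {
    while (!created.isEmpty()) {
      Path path = created.pop();
      if (Files.isDirectory(path) && !isEmpty(path)) {
        // Something else wrote into this directory; leave it in place.
        continue;
      }
      Files.deleteIfExists(path);
    }
  }

  private static boolean isEmpty(Path dir) throws IOException {
    try (Stream<Path> entries = Files.list(dir)) {
      return !entries.findAny().isPresent();
    }
  }
}
```

      In this sketch, directories that existed before the transaction are never recorded and therefore never deleted, and a directory created by the transaction is removed only if it is empty by the time rollback reaches it.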

              People

              • Assignee:
                Ali Anwar (ali.anwar)
              • Reporter:
                Ali Anwar (ali.anwar)
              • Votes:
                0
              • Watchers:
                4
