  1. CDAP
  2. CDAP-1393

Hive DDL statements from datasets should be done transactionally

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: Datasets, Explore

      Description

      For example, when a partitioned dataset adds a partition, it

      1. writes the file
      2. adds the partition to its meta table
      3. makes a call to Hive to register the partition there

      If the transaction fails, the second step is rolled back, but the files remain in the file system and the partition remains registered in Hive.

      This needs to be fixed, but it is not clear how to do so correctly: Hive does not support two-phase commit or a similar mechanism, so the Hive DDL call cannot take part in the dataset's transaction.
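
To make the failure mode concrete, here is a minimal sketch of one possible mitigation: record a compensating ("undo") action after each step and run them in reverse order if the transaction fails. This is only an illustration under stated assumptions, not a proposed fix and not CDAP or Hive API. The class `PartitionAddSketch`, the method `addPartition`, and the three in-memory lists standing in for the file system, the meta table, and Hive are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PartitionAddSketch {
    // Hypothetical in-memory stand-ins for the three systems in the description.
    static final List<String> files = new ArrayList<>();
    static final List<String> metaTable = new ArrayList<>();
    static final List<String> hivePartitions = new ArrayList<>();

    /**
     * Performs the three steps for one partition. If the (simulated)
     * transaction fails, the recorded undo actions run in reverse order.
     * Returns true on success, false if the steps were compensated.
     */
    static boolean addPartition(String partition, boolean failTransaction) {
        Deque<Runnable> undoLog = new ArrayDeque<>();
        try {
            files.add(partition);                              // 1. write the file
            undoLog.push(() -> files.remove(partition));

            metaTable.add(partition);                          // 2. add to the meta table
            // In the real system the transaction itself rolls this step back.
            undoLog.push(() -> metaTable.remove(partition));

            hivePartitions.add(partition);                     // 3. register in Hive
            undoLog.push(() -> hivePartitions.remove(partition));

            if (failTransaction) {
                throw new IllegalStateException("simulated transaction failure");
            }
            return true;
        } catch (RuntimeException e) {
            // Compensate: undo the completed steps, most recent first.
            while (!undoLog.isEmpty()) {
                undoLog.pop().run();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(addPartition("p1", false)); // true
        System.out.println(addPartition("p2", true));  // false
        // Only p1 remains in each store after the failed attempt is compensated.
        System.out.println(files + " " + metaTable + " " + hivePartitions);
    }
}
```

Compensation of this kind is best-effort rather than atomic: if the process dies between the failure and the undo actions, or if an undo action itself fails, the leftover file and Hive partition described above would still remain. The sketch therefore narrows the problem but does not replace a real commit protocol.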


              People

              • Assignee: Ali Anwar (ali.anwar)
              • Reporter: Andreas Neumann (andreas)
