CDAP-13129

Sink should not be included in multiple mapreduce phases

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.4
    • Component/s: None
    • Labels:
      None
    • Release Notes:
      Fixed a planner bug to ensure that sinks are never placed in two different mapreduce phases in the same pipeline.

      Description

      The mapreduce pipeline planner can place the same sink in multiple mapreduce phases. For some sinks this is harmless, but for others it is not. For example, I believe the partitioned file set sinks will fail: whichever job finishes first will successfully add a partition, but the second job will then try to add that same partition and fail.
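      To illustrate the failure mode, here is a minimal Python sketch. `PartitionedFileSetSink`, `add_partition`, and `run_jobs` are hypothetical stand-ins, not the CDAP API. The sketch assumes that both phases commit to the same partition key and that adding a partition that already exists raises an error.

```python
class PartitionedFileSetSink:
    """Hypothetical stand-in for a partitioned sink: each partition key can be added only once."""

    def __init__(self):
        self.partitions = {}  # partition key -> name of the job that added it

    def add_partition(self, key, job):
        # Assumption: a duplicate partition key is rejected with an error.
        if key in self.partitions:
            raise ValueError(f"partition {key!r} already added by {self.partitions[key]}")
        self.partitions[key] = job


def run_jobs(sink, jobs, partition_key):
    """Simulate mapreduce jobs finishing in the given order, each committing the same partition."""
    results = []
    for job in jobs:
        try:
            sink.add_partition(partition_key, job)
            results.append((job, "succeeded"))
        except ValueError:
            results.append((job, "failed"))
    return results


sink = PartitionedFileSetSink()
print(run_jobs(sink, ["phase-agg1", "phase-agg2"], partition_key="run-1"))
# [('phase-agg1', 'succeeded'), ('phase-agg2', 'failed')]
```

      Whichever phase commits first wins; the order of the two jobs only changes which one fails.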

      The planner should instead use connectors so that each sink is written to by only one mapreduce job, similar to how we ensure that a source is only read from once in a single mapreduce job.

      An example pipeline that causes this issue looks like:

               |-- agg1 --|
      source --|          |-- sink
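               |-- agg2 --|

      Since a mapreduce job has a single reduce step, agg1 and agg2 are planned into separate mapreduce phases, and the planner places the sink in both of them.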
      
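      A minimal Python sketch of the proposed planner fix, assuming phases are modeled simply as sets of stage names. `find_shared_sinks`, `isolate_sinks`, and the `<sink>.connector` / `phase.<sink>` naming are hypothetical, not CDAP's actual planner code. The sketch only handles sinks, and it assumes a connector dataset can safely receive output from several phases.

```python
from collections import Counter


def connector_name(sink):
    # Hypothetical naming scheme for the staging dataset that stands in for a sink.
    return f"{sink}.connector"


def find_shared_sinks(phases, sinks):
    """Return the sinks that appear in more than one phase."""
    counts = Counter(node for nodes in phases.values() for node in nodes if node in sinks)
    return {sink for sink, count in counts.items() if count > 1}


def isolate_sinks(phases, sinks):
    """Rewrite phases so that every sink is written by exactly one phase.

    Each shared sink is replaced by a connector in the phases that wrote to it,
    and a new phase is added that reads the connector and writes the real sink.
    """
    shared = find_shared_sinks(phases, sinks)
    rewritten = {
        name: {connector_name(n) if n in shared else n for n in nodes}
        for name, nodes in phases.items()
    }
    for sink in sorted(shared):
        rewritten[f"phase.{sink}"] = {connector_name(sink), sink}
    return rewritten


# The example pipeline: each aggregator gets its own phase, and both contain the sink.
phases = {
    "phase1": {"source", "agg1", "sink"},
    "phase2": {"source", "agg2", "sink"},
}
plan = isolate_sinks(phases, sinks={"sink"})
for name in sorted(plan):
    print(name, sorted(plan[name]))
```

      After the rewrite, agg1 and agg2 both write to the connector, and only the added phase writes to the sink, so a partition would be added once.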


            People

            • Assignee: Albert Shau (ashau)
            • Reporter: Albert Shau (ashau)
            • Votes: 0
            • Watchers: 1
