CDAP / CDAP-2983

Spark program runner should call onFailure() of DatasetOutputCommitter

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Datasets, Spark
    • Labels:
      None

      Description

      The only thing not yet implemented for Spark is calling the onFailure() method of the DatasetOutputCommitter when the Spark job fails. For MapReduce this is straightforward: it happens for exactly one dataset, at the end of the job. Spark, however, can write multiple times, and the semantics are not clear. If a write to one dataset succeeds, we call its onSuccess(); but if a write to another dataset (or the processing in between) subsequently fails, do we also have to call onFailure() on the first dataset, or only on the one that failed?

      Since none of our datasets currently implement onFailure(), it is OK to leave this out for now. But we should complete this soon.
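      The two candidate semantics can be sketched with a toy model. This is a minimal illustration, not CDAP code: the OutputCommitter interface, the SparkCommitSemantics class, and the runJob() method below are hypothetical stand-ins for the DatasetOutputCommitter callbacks discussed above. The "strict" variant also calls onFailure() on datasets whose writes had already succeeded; the "lenient" variant only notifies the dataset whose write failed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the DatasetOutputCommitter callbacks.
interface OutputCommitter {
    void onSuccess();
    void onFailure();
}

public class SparkCommitSemantics {

    // Simulates a Spark job that writes dataset A successfully, then fails
    // while writing dataset B. Returns the sequence of callback invocations.
    static List<String> runJob(boolean strictSemantics) {
        List<String> calls = new ArrayList<>();
        Set<String> succeeded = new LinkedHashSet<>();

        // Write to dataset A succeeds: onSuccess() is called immediately.
        succeeded.add("A");
        calls.add("A.onSuccess");

        // Write to dataset B fails mid-job: onFailure() for B.
        calls.add("B.onFailure");

        // Strict semantics: the job as a whole failed, so every dataset
        // written so far is also notified of the failure.
        if (strictSemantics) {
            for (String ds : succeeded) {
                calls.add(ds + ".onFailure");
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("lenient: " + runJob(false));
        System.out.println("strict:  " + runJob(true));
    }
}
```

      The lenient variant matches what a per-write commit would do; the strict variant treats the job as one transaction-like unit, which is closer to the MapReduce behavior described above.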


              People

               • Assignee:
                 andreas Andreas Neumann
               • Reporter:
                 andreas Andreas Neumann
               • Votes:
                 0
               • Watchers:
                 1
