CDAP / CDAP-16053

Document Spark parallel sinks option

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Docs, Pipelines
    • Labels: None

      Description

      The runtime argument 'pipeline.spark.parallel.sinks.enabled' can be set to 'true' to tell the pipeline to save sink output in parallel instead of sequentially. This can improve performance in pipelines with many sinks, at the cost of possibly re-processing data.

      It needs to be documented, and potentially exposed in the pipeline config.
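
      As a starting point for the documentation, here is a minimal sketch of how the argument might appear in a runtime-argument map. Only the key name and the 'true' value come from this issue; the helper name build_runtime_args and the sample key 'input.path' are hypothetical, and how the map is handed to a pipeline run is left open.

```python
import json

# Key name taken from this issue's description.
PARALLEL_SINKS_KEY = "pipeline.spark.parallel.sinks.enabled"


def build_runtime_args(parallel_sinks, extra=None):
    """Hypothetical helper: return a runtime-argument map as string pairs.

    The key is only added when parallel sinks are requested, because the
    issue only describes the effect of setting it to 'true'.
    """
    args = dict(extra or {})
    if parallel_sinks:
        args[PARALLEL_SINKS_KEY] = "true"
    return args


if __name__ == "__main__":
    # 'input.path' is an illustrative, made-up argument.
    print(json.dumps(build_runtime_args(True, {"input.path": "/tmp/in"}), indent=2))
```

      Leaving the key out when the option is not wanted keeps the default sequential behavior described above, without assuming how other values are interpreted.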

            People

            • Assignee: Albert Shau (ashau)
            • Reporter: Albert Shau (ashau)