CDAP-14208: UI should not set spark.master for streaming pipelines

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1.0
    • Component/s: UI
    • Labels:
      None
    • Sprint:
      5.1 09/09
    • Rank:
      1|i00fd2:

      Description

      The UI has the ability to set the number of executors for a pipeline. In streaming pipelines, it does this by setting engine config:

      system.spark.spark.master = local[{num-executors}]

      In distributed, it sets:

      system.spark.spark.executor.instances = {num-executors}

      We don't need different behavior in different environments. It adds code complexity, can result in blocked pipelines, and means a pipeline exported from sandbox cannot be imported into distributed. It's enough to always set spark.executor.instances.
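
      With the proposed change, the UI would emit the same engine config in both sandbox and distributed. A minimal sketch (the value 2 is an arbitrary illustration, not from the issue):

      system.spark.spark.executor.instances = 2

      This keeps exported pipelines portable between environments.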

      The spark.master property should not be set at all, as it's easy for users to set it to a value that blocks all progress. Internally, Spark uses a thread for each DStream, so if executors is set to 1, the pipeline will block. If there are 2 sources and executors is set to 2, the pipeline will also block. The backend will set the number of threads based on the number of threads each source says it needs, so the UI should not override it.
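
      The thread-accounting argument above can be sketched as a small check (a hypothetical helper for illustration, not CDAP code): with spark.master = local[n], each DStream source permanently occupies one of the n threads, so at least one extra thread must remain free for processing.

      ```python
      def local_master_blocks(num_threads: int, num_sources: int) -> bool:
          """Return True if a local[num_threads] master would block a
          streaming pipeline with num_sources DStream sources.

          Each source occupies one thread for its entire lifetime, so
          making progress requires num_threads > num_sources.
          """
          return num_threads <= num_sources

      # Examples from the description:
      print(local_master_blocks(1, 1))  # True: local[1] with one source blocks
      print(local_master_blocks(2, 2))  # True: two sources consume both threads
      print(local_master_blocks(3, 2))  # False: one thread is left for processing
      ```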


            People

            • Assignee:
              ajai Ajai Narayan
            • Reporter:
              ashau Albert Shau
            • Votes:
              0
            • Watchers:
              2
