Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: Pipelines
    • Labels:
      None
    • Release Notes:
      Refactored the Spark engine in data pipelines to run all non-action pipeline stages in a single Spark program.

      Description

      The current Spark program executes a single phase of a data pipeline, which makes it directly analogous to a MapReduce phase.

      An entire pipeline, however, can be executed in a single Spark program. This will make execution more efficient by removing the artificial storage layers between phases, and will allow substantial code reuse between the data-pipeline and data-streams artifacts.
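
      The difference between the two execution models can be illustrated with a minimal sketch. It is not the pipeline engine's code and does not use the Spark API: plain Python generators stand in for lazily evaluated Spark transformations, and a fully built list stands in for an intermediate storage layer. The stage names (parse, drop_small) and the record fields are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def parse(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical stage: convert the 'bytes' field from text to int."""
    for r in records:
        yield {"user": r["user"], "bytes": int(r["bytes"])}


def drop_small(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical stage: keep only records of at least 100 bytes."""
    for r in records:
        if r["bytes"] >= 100:
            yield r


def run_phase_by_phase(
    source: Iterable[Record], stages: List[Stage]
) -> Tuple[List[Record], int]:
    """One 'program' per phase: every stage output is fully materialized,
    which stands in for writing to storage between phases."""
    data = list(source)
    materializations = 0
    for stage in stages:
        data = list(stage(data))  # intermediate result written out in full
        materializations += 1
    return data, materializations


def run_single_program(
    source: Iterable[Record], stages: List[Stage]
) -> Tuple[List[Record], int]:
    """One 'program' for the whole pipeline: stages are chained lazily and
    records are materialized only once, at the end."""
    pipeline: Iterable[Record] = iter(source)
    for stage in stages:
        pipeline = stage(pipeline)  # nothing is computed yet
    return list(pipeline), 1  # single materialization at the sink
```

      Both functions produce the same records; the single-program variant simply never builds the intermediate results, which is the saving the description refers to.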

            People

            • Assignee:
              ashau Albert Shau
            • Reporter:
              ashau Albert Shau