CDAP / CDAP-13122

Pipeline connectors should use CombineInputFormat

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.4
    • Component/s: None
    • Labels:
      None
    • Release Notes:
      Minor optimization to reduce the number of mappers used to read intermediate data in MapReduce pipelines

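    Description

    Temporary "connector" filesets store intermediate data between the phases of multi-phase MapReduce pipelines. These connectors are currently read using TextInputFormat, which creates at least one split (and therefore one mapper) per file, plus one per additional block-sized chunk of larger files. CombineTextInputFormat instead packs blocks from many files into each split, so reading the same connector data usually takes fewer mappers, and therefore fewer containers, less memory, and less per-task overhead.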
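To illustrate why the mapper count drops, here is a simplified Java model that counts input splits for a set of files. It is an assumption for illustration only, not Hadoop's or CDAP's actual code: `textSplits` mimics FileInputFormat's per-file splitting with its 10% slop, while `combinedSplits` greedily packs block-sized chunks from many files into splits until a maximum split size is reached, ignoring the node and rack locality the real CombineFileInputFormat considers. The class name, method names, and the 200 x 4 MB connector layout are hypothetical. In a real Hadoop job the switch itself would be a call such as `job.setInputFormatClass(CombineTextInputFormat.class)`; the exact change made for this issue is not shown here.

```java
import java.util.Arrays;

/**
 * Simplified, hypothetical model of how many input splits (and therefore map
 * tasks) a per-file reader and a combining reader would create. It is NOT
 * Hadoop's implementation: locality, compression and empty files are ignored.
 */
public class SplitCountSketch {

    // Like FileInputFormat, allow the last split of a file to be up to 10% larger.
    static final double SPLIT_SLOP = 1.1;

    /** TextInputFormat-style: each file is split on its own, so every non-empty file adds at least one split. */
    static int textSplits(long[] fileSizes, long splitSize) {
        int splits = 0;
        for (long size : fileSizes) {
            long remaining = size;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                splits++;
                remaining -= splitSize;
            }
            if (remaining > 0) {
                splits++; // tail of the file, or the whole file if it is small
            }
        }
        return splits;
    }

    /** CombineTextInputFormat-style: block-sized chunks from many files share a split until it reaches maxSplitSize. */
    static int combinedSplits(long[] fileSizes, long blockSize, long maxSplitSize) {
        int splits = 0;
        long current = 0; // bytes accumulated in the split being built
        for (long size : fileSizes) {
            long remaining = size;
            while (remaining > 0) {
                long chunk = Math.min(remaining, blockSize);
                current += chunk;
                remaining -= chunk;
                if (current >= maxSplitSize) {
                    splits++; // split is full; start a new one
                    current = 0;
                }
            }
        }
        if (current > 0) {
            splits++; // leftover chunks form one last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        // Hypothetical connector output: 200 part files of 4 MB each.
        long[] parts = new long[200];
        Arrays.fill(parts, 4 * mb);
        System.out.println("TextInputFormat-style splits: " + textSplits(parts, blockSize));
        System.out.println("CombineTextInputFormat-style splits: " + combinedSplits(parts, blockSize, blockSize));
    }
}
```

With the hypothetical 200 files of 4 MB, a 128 MB block size, and a 128 MB maximum split size, the model prints 200 splits for the per-file reader and 7 for the combining one: roughly one mapper per 128 MB of data instead of one per file.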


    People

    • Assignee:
      Albert Shau (ashau)
    • Reporter:
      Albert Shau (ashau)
