  CDAP / CDAP-12125

MapReduce pipeline timing metrics are wrong

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: Pipelines
    • Labels:
      None
    • Release Notes:
      Fixed a bug in MapReduce pipeline timing metrics where the time reported for a stage could include time spent in other stages.

      Description

      Timing metrics for MapReduce pipelines are not correct. Timing is implemented by wrapping each stage's transform() method and recording the start and end times.

      However, this approach is wrong because transforms are called recursively to ensure records are not buffered in memory: calling emit() in one stage directly invokes the next stage's transform() method. As a result, the time recorded for the first stage includes the time for every stage after it. To time a stage properly, we would have to stop the stopwatch every time emit() is called, then restart it when emit() returns.
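
      The stop-on-emit idea described above could be sketched roughly as follows. This is a minimal illustration, not the actual CDAP fix: Stage, TimedStage, selfTime(), and demo() are hypothetical names invented for this example, and a fake clock stands in for a real stopwatch so the numbers are deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these types are CDAP classes. */
final class StageTimingSketch {

  /** A stage receives a record and emits zero or more records downstream. */
  interface Stage {
    void transform(String record, Consumer<String> emitter);
  }

  /** Wraps a stage and accumulates only the time spent in that stage's own code. */
  static final class TimedStage implements Stage {
    private final Stage delegate;
    private final LongSupplier clock;
    private long selfTime;

    TimedStage(Stage delegate, LongSupplier clock) {
      this.delegate = delegate;
      this.clock = clock;
    }

    @Override
    public void transform(String record, Consumer<String> emitter) {
      long[] start = {clock.getAsLong()};
      try {
        delegate.transform(record, out -> {
          // Pause: bank the time spent so far before handing off downstream.
          selfTime += clock.getAsLong() - start[0];
          try {
            emitter.accept(out);
          } finally {
            // Resume: time spent downstream is excluded from this stage.
            start[0] = clock.getAsLong();
          }
        });
      } finally {
        // Bank whatever elapsed since the last resume (or since the start).
        selfTime += clock.getAsLong() - start[0];
      }
    }

    long selfTime() {
      return selfTime;
    }
  }

  /** Runs a two-stage chain against a fake clock; returns {first, second} self times. */
  static long[] demo() {
    long[] now = {0};
    LongSupplier clock = () -> now[0];
    // The second stage simulates 30 time units of work, then emits.
    TimedStage second = new TimedStage((r, e) -> { now[0] += 30; e.accept(r); }, clock);
    // The first stage simulates 5 units before emitting and 5 units after.
    TimedStage first = new TimedStage(
        (r, e) -> { now[0] += 5; e.accept(r); now[0] += 5; }, clock);
    List<String> sink = new ArrayList<>();
    // Recursive chaining: the first stage's emit calls the second stage's transform.
    first.transform("record", r -> second.transform(r, sink::add));
    return new long[] {first.selfTime(), second.selfTime()};
  }

  public static void main(String[] args) {
    long[] t = demo();
    System.out.println("first=" + t[0] + " second=" + t[1]);
  }
}
```

      In this sketch the first stage is charged 10 units and the second 30. Timing the whole transform() call without pausing on emit would instead charge the first stage all 40 units, which is the behavior this issue reports.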

            People

            • Assignee:
              Albert Shau (ashau)
            • Reporter:
              Albert Shau (ashau)
