CDAP-12646

RunRecordCorrector can take hours to correct run records

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.0
    • Fix Version/s: 4.3.1
    • Component/s: None
    • Labels:
    • Release Notes:
      Fixed a performance issue with the run record corrector

      Description

      The AbstractRunRecordCorrectorService can take many hours to correct run records, depending on how many run records need correcting and on the total number of namespaces, applications, and programs deployed in the system.

      If 5000 programs are deployed in the CDAP system, correcting a single run record can involve more than 5000 HBase operations, each in its own transaction.
      For each run record with RUNNING status, we execute ProgramLifecycleService#retrieveProgramIdForRunRecord, which lists all namespaces and all apps within each namespace, and performs a Get on each program in each of those namespaces.

      So, with 5000 programs deployed and 100 run records in RUNNING status, this can involve 500,000 HBase Gets (5000 × 100), each within its own transaction.
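
      The cost described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not CDAP code: the class and method names (RunRecordScanCost, simulatedGet, findOwner, worstCaseGets) are hypothetical, each simulated Get stands in for one HBase Get in its own transaction, and the worst case is assumed, where the owning program is the last one checked. The extra operations for listing namespaces and apps are not counted.

```java
/** Toy cost model of a scan-every-program lookup; names are hypothetical, not CDAP APIs. */
public class RunRecordScanCost {

  // Number of simulated HBase Gets issued so far.
  private static long gets = 0;

  // Stand-in for one Get (one transaction): does this program own the run?
  static boolean simulatedGet(int programIndex, int ownerIndex) {
    gets++;
    return programIndex == ownerIndex;
  }

  // Checks programs one by one until the owner of the run record is found.
  static int findOwner(int totalPrograms, int ownerIndex) {
    for (int p = 0; p < totalPrograms; p++) {
      if (simulatedGet(p, ownerIndex)) {
        return p;
      }
    }
    return -1; // no program owns this run record
  }

  // Worst case (assumed): the owner of every run record is the last program checked.
  static long worstCaseGets(int totalPrograms, int runningRecords) {
    gets = 0;
    for (int r = 0; r < runningRecords; r++) {
      findOwner(totalPrograms, totalPrograms - 1);
    }
    return gets;
  }

  public static void main(String[] args) {
    System.out.println(worstCaseGets(5000, 1));   // one RUNNING record
    System.out.println(worstCaseGets(5000, 100)); // 100 RUNNING records
  }
}
```

      In this model the number of Gets is the product of the program count and the RUNNING record count, which matches the 5000 and 500,000 figures in the description.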

          Activity

          Ali Anwar added a comment -

          This issue did not surface in earlier CDAP versions (such as 4.1), due to the issue (behavior change) described in CDAP-12648.

          Terence Yim added a comment -

          Fix in https://github.com/caskdata/cdap/pull/9652

            People

            • Assignee: Terence Yim
            • Reporter: Ali Anwar
            • Votes: 0
            • Watchers: 2
