CDAP / CDAP-12646

RunRecordCorrector can take hours to correct run records

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.0
    • Fix Version/s: 4.3.1
    • Component/s: None
    • Release Notes:
      Fixed a performance issue with the run record corrector

      Description

      The AbstractRunRecordCorrectorService can take many hours to correct run records. The time depends on how many run records need correcting and on the total number of namespaces, applications, and programs deployed in the system.

      If 5000 programs are deployed in the CDAP system, correcting a single run record can involve more than 5000 HBase operations, each in its own transaction.
      For each run record in RUNNING status, we execute ProgramLifecycleService#retrieveProgramIdForRunRecord, which lists all namespaces and all apps within each namespace, and performs a GET on each program in each of those namespaces.

      So, with 5000 programs deployed and 100 run records in RUNNING status, this can involve 5000 × 100 = 500,000 HBase Gets, each within its own transaction.
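
      To make the arithmetic concrete, here is a toy cost model of the scan described above. It is only a sketch: RunRecordScanCost, CountingStore, and countGets are hypothetical names that do not exist in CDAP, the in-memory map stands in for HBase, and it assumes the worst case where the owning program is the last one probed. Listing namespaces and apps is not counted, only the per-program Gets.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy cost model of a per-run-record scan over all programs (not CDAP code). */
final class RunRecordScanCost {

  /** In-memory stand-in for the metadata store; each hasRun() call models one HBase Get. */
  static final class CountingStore {
    private final Map<String, Integer> runOwner = new HashMap<>(); // runId -> owning program index
    long gets = 0;

    void put(String runId, int program) {
      runOwner.put(runId, program);
    }

    boolean hasRun(int program, String runId) {
      gets++; // one simulated Get, which in the real system runs in its own transaction
      Integer owner = runOwner.get(runId);
      return owner != null && owner == program;
    }
  }

  /** Counts simulated Gets needed to resolve the owning program of every RUNNING record. */
  static long countGets(int programs, int runningRecords) {
    CountingStore store = new CountingStore();
    // Assumption: worst case, every run belongs to the last program in scan order.
    for (int r = 0; r < runningRecords; r++) {
      store.put("run-" + r, programs - 1);
    }
    for (int r = 0; r < runningRecords; r++) {
      for (int p = 0; p < programs; p++) {
        if (store.hasRun(p, "run-" + r)) {
          break; // owner found; stop scanning for this record
        }
      }
    }
    return store.gets;
  }

  public static void main(String[] args) {
    System.out.println(countGets(5000, 100)); // prints 500000
  }
}
```

      The count grows as (number of programs) × (number of RUNNING records), which is why the correction time scales with total deployment size and not only with the number of records that need correcting.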

              People

              • Assignee: Terence Yim (terence)
              • Reporter: Ali Anwar (ali.anwar)