CDAP-8565

Improve cdap-master stop and kill behavior

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0, 3.5.4
    • Component/s: None
    • Release Notes:
      Improved the master process stop procedure to support fast failover when running with HA. Added a new kill command to force-kill CDAP processes.

      Description

      Currently, on cdap-master stop, the leader always stops the YARN application. This is undesirable in an HA setup because the follower then takes longer to restore CDAP functionality after it becomes the leader.

      To address this, the following changes are proposed:

      New behavior on cdap master stop
      • If the master process is the leader
        1. Withdraw from leader election
          • If there are no other participants it knows of, stop the YARN app and exit the process
            • This is the same as the current behavior in a non-HA setup
          • If there are other participants, just exit the process
            • In an HA setup, the YARN app keeps running; only the master process exits
      • If the master process is the follower
        1. Just withdraw from leader election and exit the process
          • This is the same as the current behavior in both HA and non-HA setups
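
The decision rules above can be summarized in a small sketch. This is not the CDAP implementation; the names `Role`, `StopPlan`, and `plan_stop` are hypothetical and exist only to make the branching explicit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"


@dataclass(frozen=True)
class StopPlan:
    """Actions a master process takes on stop (hypothetical structure)."""
    withdraw_from_election: bool
    stop_yarn_app: bool
    exit_process: bool


def plan_stop(role: Role, other_participants: List[str]) -> StopPlan:
    """Return the stop actions for a master process.

    `other_participants` is the list of other election participants
    this process knows of (empty in a non-HA setup).
    """
    # Only a leader with no known peers stops the YARN app; in every
    # other case the app keeps running so a peer can take over quickly.
    stop_yarn = role is Role.LEADER and not other_participants
    return StopPlan(
        withdraw_from_election=True,
        stop_yarn_app=stop_yarn,
        exit_process=True,
    )
```

In this sketch, a leader with at least one known peer and any follower produce the same plan: withdraw, leave the YARN app running, and exit.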

      With this new stop behavior, a controlled failover completes much faster. It also makes rolling restarts possible with minimal service interruption.

      Introduce a new cdap master kill command
      • As the name suggests, the script sends SIGKILL directly to the master process.
      • This is useful for simulating an uncontrolled shutdown.
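
As a rough illustration of what such a kill command does, the sketch below reads a process ID from a PID file and sends it SIGKILL. It is not the actual CDAP script: the function name `force_kill`, the PID-file layout, and the injectable `kill` parameter are assumptions made for the example.

```python
import os
import signal
from pathlib import Path
from typing import Callable


def force_kill(pid_file: Path, kill: Callable[[int, int], None] = os.kill) -> int:
    """Send SIGKILL to the process whose PID is stored in `pid_file`.

    The PID file is assumed to contain a single integer. `kill` defaults
    to os.kill and is injectable so the function can be exercised
    without terminating a real process.
    """
    pid = int(pid_file.read_text().strip())
    # SIGKILL cannot be caught or ignored, so the target process gets
    # no chance to withdraw from leader election or clean up.
    kill(pid, signal.SIGKILL)
    return pid
```

Because the process gets no chance to run its shutdown logic, the remaining participants only notice the loss through the election mechanism, which is what makes this a reasonable stand-in for a crash.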

      Update the /system/services/appfabric/live-info endpoint
      • Include the hostname of every participant in the HA leader election, ordered from leader to followers, based on information in ZooKeeper (ZK).
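
One way to derive the leader-to-followers order is sketched below. It assumes the election uses ZooKeeper sequential nodes, where each participant's node name ends in a 10-digit sequence number and the lowest number is the leader. That layout, the function name `order_participants`, and the node-name-to-hostname mapping are assumptions for illustration, not details confirmed by this issue.

```python
from typing import Dict, List


def order_participants(nodes: Dict[str, str]) -> List[str]:
    """Return hostnames ordered from leader to followers.

    `nodes` maps each election node name to the hostname stored in it.
    Node names are assumed to end with the 10-digit, zero-padded
    sequence number that ZooKeeper appends to sequential nodes.
    """
    def sequence(node_name: str) -> int:
        return int(node_name[-10:])

    # Lowest sequence number first: that participant is assumed to be
    # the leader, followed by followers in succession order.
    return [nodes[name] for name in sorted(nodes, key=sequence)]
```

For example, nodes ending in `0000000003`, `0000000007`, and `0000000012` would be listed in that order, with the first hostname treated as the leader.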

              People

              • Assignee: Terence Yim (terence)
              • Reporter: Terence Yim (terence)