CDAP-12466

Have better fencing mechanism for HA services to avoid split brain


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Rank:


      We need a unified fencing mechanism to avoid split brain in TMS and the transaction service.

      Split brain can happen when failing over the leader from one process to another. Consider the following scenario (without fencing), which involves two processes, P1 and P2.

      1. P1 and P2 start.
      2. Via leader election, P1 becomes the leader and P2 the follower.
      3. Since P1 is the leader, it starts its services and registers itself with the discovery service.
      4. A write request comes into P1. While P1 is handling that request, it gets disconnected from ZK.

      When that happens, the following sequence of events may occur.

      a. In P2, the leader() method is called from the ZK event thread. It starts the server and registers itself with ZK.
      b. In P1, the follower() method is called from the ZK event thread.
      c. A client that discovers P2 sends a write request to P2. P2 handles it and performs the write.
      d. In P1, the writer thread performs the write that it received in step 4 above <---- Split Brain!!
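
      The interleaving above can be replayed deterministically to show what the store ends up seeing. This is only an illustrative sketch: the class, the method, and the idea that both processes derive the next pointer from the same last committed value are assumptions made for the example, not CDAP code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative replay only; the names here are hypothetical, not CDAP classes.
final class SplitBrainReplay {

    // Replays step 4 and steps a-d in order and returns the writes the store received.
    static List<String> replay(long lastCommittedPointer) {
        List<String> storeLog = new ArrayList<>();

        // Step 4: P1, still leader, accepts a request and picks the next pointer,
        // then loses its ZK connection before the write is issued.
        long p1Pointer = lastCommittedPointer + 1;

        // Steps a and b: P2 becomes leader, P1 is told to become follower.
        // Step c: P2 serves a client request and sees the same last committed pointer.
        long p2Pointer = lastCommittedPointer + 1;
        storeLog.add("P2 writes pointer " + p2Pointer);

        // Step d: P1's writer thread finishes the in-flight request.
        storeLog.add("P1 writes pointer " + p1Pointer);
        return storeLog;
    }

    public static void main(String[] args) {
        replay(10).forEach(System.out::println);
    }
}
```

      With nothing between the writer and the store to reject P1's late write, both processes hand out the same pointer.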

      • Having P1 check whether it is still the leader before issuing the write doesn't help, as P1 can become a follower between the check and the actual write.
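
      The gap between the check and the write can be made explicit with a small sketch. Everything here is hypothetical: the `leader` flag stands in for state that follower() would flip on the ZK event thread, and the `betweenCheckAndWrite` hook stands in for whatever happens in the gap (a GC pause, thread descheduling, a ZK disconnect).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a "check leadership, then write" guard; not CDAP code.
final class CheckThenWrite {
    // In the real scenario this would be flipped by follower() on the ZK event thread.
    final AtomicBoolean leader = new AtomicBoolean(true);
    // Records whether a write was issued while this process was no longer leader.
    boolean wroteAsFollower = false;

    // betweenCheckAndWrite simulates anything that can happen in the gap.
    void guardedWrite(Runnable betweenCheckAndWrite) {
        if (!leader.get()) {
            return; // the check does refuse writes once the demotion is already known
        }
        betweenCheckAndWrite.run();
        // The write is issued here, based on a leadership check that may be stale.
        wroteAsFollower = !leader.get();
    }
}
```

      Calling `p1.guardedWrite(() -> p1.leader.set(false))` passes the check and still leaves `wroteAsFollower` set, because the check and the write are two separate steps performed on the writer's side.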

      The write action above could be starting a new TX or publishing a new message, both of which require a single brain to decide the unique write pointer / message id.

      Fencing is meant to prevent the above scenario.
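
      One common way to fence, shown here only as a sketch and not as the design this ticket proposes, is to give every leadership term a monotonically increasing epoch and have the store reject writes that carry an older one. `FencedStore`, its `write` method, and the way epochs are obtained are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store that enforces fencing with epochs; not a CDAP class.
final class FencedStore {
    private long highestEpoch = 0;
    private final List<String> log = new ArrayList<>();

    // The epoch check and the write share one critical section, so a writer
    // cannot be demoted "between" them from the store's point of view.
    synchronized void write(long epoch, String writer, long pointer) {
        if (epoch < highestEpoch) {
            throw new IllegalStateException(
                writer + " is fenced: epoch " + epoch + " < " + highestEpoch);
        }
        highestEpoch = epoch;
        log.add(writer + ":" + pointer);
    }

    // Returns a copy so callers cannot modify the store's log.
    synchronized List<String> log() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        store.write(2, "P2", 11);      // step c: the new leader writes with epoch 2
        try {
            store.write(1, "P1", 11);  // step d: the old leader's late write
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(store.log());
    }
}
```

      The important property is that the decision is made where the write lands rather than in the writer, so it does not matter when P1 learns that it has been demoted.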




            • Assignee:
              terence Terence Yim


              • Created: