Suppose ZooKeeper goes down for a short time. That will have (at least) two effects:
- HBase may go down as it relies on ZK
- CDAP Master will lose its ZK connection and shut down its services (it becomes a follower)
After ZooKeeper comes back, the CDAP Master will become leader again and attempt to start up its services. However, if HBase is still down at that time, DatasetService will fail to start. That failure terminates the startup sequence in the master, and the master gives up and exits.
This means that in some cases CDAP cannot recover from a ZK failure on its own and has to be restarted manually.
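
The failure mode above comes from giving up on the first failed service start. Below is a minimal sketch of one possible mitigation, assuming the master could retry the start with backoff while HBase recovers. It is written in Python for brevity and is not CDAP code: `start_with_retry`, `ServiceStartError`, and `make_flaky_start` are hypothetical names introduced only for illustration.

```python
import time


class ServiceStartError(Exception):
    """Hypothetical error raised when a dependency (e.g. HBase) is not yet available."""


def start_with_retry(start_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call start_fn until it succeeds or max_attempts is exhausted.

    Returns the number of attempts used. Re-raises the last error if
    every attempt fails, so the caller can still decide to exit.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            start_fn()
            return attempt
        except ServiceStartError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_start(failures_before_success):
    """Hypothetical stand-in for a DatasetService start while HBase is still recovering."""
    state = {"calls": 0}

    def start():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ServiceStartError("HBase not available yet")

    return start
```

For example, `start_with_retry(make_flaky_start(2))` fails twice, waits between attempts, and succeeds on the third call. Whether a bounded or unbounded retry is appropriate for the master is a design choice this sketch does not settle.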