Our current Coopr autobuild templates, building from the develop branch, are spinning up CDAP clusters where everything is running except CDAP master, which will not come up until it is manually restarted.
The CDAP master throws a ZooKeeper exception and proceeds no further, and apparently never retries: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /
Full cdap-master log is attached.
Zookeeper is running:
- zookeeper-client ls /
[cdap, hbase, zookeeper]
No apps are shown running in YARN.
CDAP UI bounces between "cannot access namespace" and "services are OK" (or similar).
After a single restart of cdap-master, CDAP comes up just fine.
The Coopr template may have a bug that gets the cluster into this state, but CDAP master should still be able to recover from it. It appears that it may not be retrying its ZooKeeper connection, or something similar.
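
To illustrate the kind of recovery being asked for, here is a minimal sketch of a retry loop with exponential backoff. It is not CDAP's actual startup code: `NoNodeError`, `retry_with_backoff`, and all parameter names are hypothetical stand-ins, and the real fix would live in the Java master service.

```python
import time


class NoNodeError(Exception):
    """Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException."""


def retry_with_backoff(operation, max_attempts=5, initial_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    The wait between attempts doubles each time, capped at max_delay.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except NoNodeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With something along these lines, a node that is missing at startup but appears a few seconds later would no longer require a manual restart of cdap-master.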