CDAP / CDAP-2569

Master process not resilient to ZooKeeper exception

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: CDAP
    • Labels:
      None

      Description

      Our current Coopr autobuild templates, which build from the develop branch, are spinning up CDAP clusters in which everything else is running, but CDAP master does not come up until it is manually restarted.

      The CDAP master throws a ZooKeeper exception, proceeds no further, and apparently never retries: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /

      Full cdap-master log is attached.

      ZooKeeper is running:

      zookeeper-client ls /
        <snip>
        [cdap, hbase, zookeeper]

      No apps are shown running in yarn.

      CDAP UI bounces between "cannot access namespace" and "services are OK" (or similar).

      After a single restart of cdap-master, CDAP comes up just fine.

      It may be that a bug in the Coopr template gets the cluster into this state, but CDAP master should still be able to recover from it. It appears that the master may not be retrying its ZooKeeper connection, or something similar.
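To illustrate the kind of recovery being asked for, here is a minimal sketch of retrying a startup step with exponential backoff. It is not CDAP's actual code or the fix that was applied. `RetryingStartup`, `callWithRetry`, and `TransientException` are hypothetical names, and `TransientException` only stands in for a recoverable failure such as the NoNodeException above so that the sketch runs without a ZooKeeper dependency.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry a startup step instead of giving up on the first failure.
final class RetryingStartup {

    /** Stand-in (assumption) for a recoverable failure such as ZooKeeper's NoNodeException. */
    public static class TransientException extends Exception {
        public TransientException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action, retrying on TransientException with exponential backoff.
     * Rethrows the last failure once maxAttempts attempts have been made.
     */
    public static <T> T callWithRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure to the caller
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, 30_000L); // double the wait, capped at 30 s
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated startup step that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientException("KeeperErrorCode = NoNode for /");
            }
            return "started";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real master process the action would wrap whatever ZooKeeper call failed at startup, and the set of exceptions worth retrying would need to be decided case by case.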

        Attachments

        1. master.log
          8 kB
        2. zk-issue-3.0.1.txt
          7 kB
        3. zk-no-issue-3.0.0.txt
          25 kB

            People

            • Assignee: Terence Yim (terence)
            • Reporter: Derek Wood (derek)
            • Votes: 0
            • Watchers: 5
