  1. CDAP
  2. CDAP-15778

Runtime Monitor fails with SSLHandshakeException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1.0, 6.0.1
    • Component/s: None
    • Labels:
      None

      Description

      In remote execution mode (when a non-native compute profile is used), a program run occasionally gets stuck in the "Starting" phase, even though the actual program execution has already completed on the target cluster.

      This happens because the RuntimeMonitor is unable to fetch metadata from the RemoteRuntimeServer. The underlying failure is an SSLHandshakeException, which unfortunately was not logged due to CDAP-15505.

      The SSL exception occurs because the remote runtime server occasionally binds to a port that is already in use by the NFS kernel server. Running netstat confirms it:

      $ sudo netstat -npa | grep 34797
      tcp6       0      0 127.0.0.1:34797         :::*                    LISTEN      4210/java           
      tcp6       0      0 :::34797                :::*                    LISTEN      -         
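
      To spot this kind of collision without reading the output by eye, the listing can be scanned for ports that have more than one LISTEN socket. The following is a minimal sketch, not part of CDAP: the function name shared_listen_ports and the SAMPLE constant are hypothetical, and the parser assumes the Linux "netstat -npa" column layout shown above.

```python
from collections import defaultdict
from typing import Dict, List


def shared_listen_ports(netstat_output: str) -> Dict[int, List[str]]:
    """Map each TCP port with more than one LISTEN socket to its local addresses."""
    listeners: Dict[int, List[str]] = defaultdict(list)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # The port follows the last colon, which also handles ":::34797".
        address, _, port = fields[3].rpartition(":")
        if port.isdigit():
            listeners[int(port)].append(address)
    return {port: addrs for port, addrs in listeners.items() if len(addrs) > 1}


# The two lines captured in this issue.
SAMPLE = """\
tcp6       0      0 127.0.0.1:34797         :::*                    LISTEN      4210/java
tcp6       0      0 :::34797                :::*                    LISTEN      -
"""

if __name__ == "__main__":
    print(shared_listen_ports(SAMPLE))
```

      A port reported here is only a candidate for the problem: two listeners on the same port can be legitimate, so the owning processes still need to be checked.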
      

      This can happen because NFS binds with the IPV6_V6ONLY socket option. When the remote runtime monitor server starts up, the OS is allowed to give it the same port for binding to the IPv4 127.0.0.1 address. When this happens, HTTPS calls to "localhost:port" are handled by the NFS server, which closes the connection, resulting in the SSL handshake error.
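
      The port sharing described above can be reproduced outside CDAP. The following is a minimal sketch with a hypothetical function name: one socket stands in for the NFS listener (IPv6 wildcard with IPV6_V6ONLY set), and a second socket stands in for the remote runtime server and tries the same port on 127.0.0.1. The result depends on the operating system; on Linux the second bind is expected to succeed.

```python
import socket
from typing import Optional


def ipv4_bind_succeeds_beside_v6only_listener() -> Optional[bool]:
    """Bind an IPv6-only wildcard listener, then try the same port on 127.0.0.1.

    Returns True if the IPv4 bind succeeds, False if the OS rejects it,
    and None if an IPv6 listener cannot be set up on this host.
    """
    if not socket.has_ipv6:
        return None
    try:
        v6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return None
    v4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        try:
            # Stand-in for the NFS listener: IPv6 only, wildcard address.
            v6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            v6.bind(("::", 0))  # port 0 lets the OS choose a free port
            v6.listen(1)
        except OSError:
            return None
        port = v6.getsockname()[1]
        try:
            # Stand-in for the remote runtime server on the IPv4 loopback.
            v4.bind(("127.0.0.1", port))
            v4.listen(1)
        except OSError:
            return False
        return True
    finally:
        v4.close()
        v6.close()


if __name__ == "__main__":
    print(ipv4_bind_succeeds_beside_v6only_listener())
```

      A True result means both listeners coexist on one port number, which is the state the netstat output in this issue shows.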

      I first searched for why the netstat output shows no process ID and found many references pointing to the NFS kernel server. After I shut down the NFS kernel server, the RuntimeMonitor was able to fetch metadata and proceed.

      Ref: https://unix.stackexchange.com/questions/97752/how-to-identify-a-process-which-has-no-pid
      Ref2: https://unix.stackexchange.com/questions/536634/linux-gives-an-ephemeral-port-that-is-already-used-and-bind-on-any-interface

              People

              • Assignee:
                Terence Yim
              • Reporter:
                Terence Yim