CDAP / CDAP-5844

Failure to update HDFS delegation token for long-running application in HA mode


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.3, 3.3.2, 3.3.1, 3.3.0
    • Fix Version/s: 3.3.4, 3.4.0
    • Component/s: Security
    • Labels:
      None
    • Release Notes:
      HDFS delegation tokens are now updated correctly in HA mode.

      Description

      Setup:
      Kerberos-enabled, fully HA, 6-node Cloudera Manager (CM) cluster.
      Set the following two parameters to 600000 ms (10 minutes):
      dfs.namenode.delegation.token.renew-interval
      dfs.namenode.delegation.token.max-lifetime
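      Why these settings reproduce the failure quickly: as we understand the NameNode's behavior, renewing a delegation token only pushes its expiry to min(issue time + max-lifetime, renewal time + renew-interval). With both values at 10 minutes, renewal cannot keep a token alive past 10 minutes, so a long-running application must obtain and distribute a brand-new token. The sketch below is a simplified model of that rule under this assumption; `TokenLifetimeModel` is a hypothetical class, not a Hadoop API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of HDFS delegation token expiry (an assumption
// for illustration, not Hadoop's implementation). All times are in milliseconds.
final class TokenLifetimeModel {
    private final long maxDate;        // issue time + max-lifetime: hard upper bound
    private final long renewInterval;  // how far a single renewal pushes the expiry
    private long expiryTime;           // requests presenting the token fail after this

    TokenLifetimeModel(long issueTime, long renewIntervalMs, long maxLifetimeMs) {
        this.renewInterval = renewIntervalMs;
        this.maxDate = issueTime + maxLifetimeMs;
        this.expiryTime = Math.min(maxDate, issueTime + renewIntervalMs);
    }

    // A renewal extends the expiry by one interval, but never past maxDate.
    long renew(long now) {
        if (isExpired(now)) {
            throw new IllegalStateException("token is expired");
        }
        expiryTime = Math.min(maxDate, now + renewInterval);
        return expiryTime;
    }

    boolean isExpired(long now) {
        return now > expiryTime;
    }

    public static void main(String[] args) {
        long tenMinutes = TimeUnit.MINUTES.toMillis(10);
        // Reproduction settings: renew-interval = max-lifetime = 600000 ms.
        TokenLifetimeModel token = new TokenLifetimeModel(0L, tenMinutes, tenMinutes);
        long expiry = token.renew(TimeUnit.MINUTES.toMillis(5));
        System.out.println("expiry after renewal at 5 min: " + expiry);                     // 600000
        System.out.println("expired at 10 min + 1 ms: " + token.isExpired(tenMinutes + 1)); // true
    }
}
```

      Renewing at the 5-minute mark does not move the expiry past 10 minutes, which is why the failure shows up without any NameNode failover.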

      The Transaction service (and other containers) fail once the delegation token expires; note that no NameNode failover is required to trigger the failure. The error log is pasted below.

      Relevant Hadoop JIRA:
      https://issues.apache.org/jira/browse/HDFS-9276
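      HDFS-9276 describes the mechanism, as we understand it: in HA mode the HDFS client clones the token issued for the logical nameservice into a private token for each NameNode, and updating the logical token later does not update those private clones, so RPCs keep presenting the old, expired token. The toy model below illustrates this with plain Java maps. `HaTokenModel`, `cloneForNameNodes`, the nameservice and the host names are all hypothetical stand-ins, not Hadoop's `Credentials` API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the HDFS-9276 scenario: a credential store maps a
// token "service" name to a token value. In HA mode the logical token is
// copied ("cloned") to one private entry per NameNode address.
final class HaTokenModel {
    final Map<String, String> credentials = new HashMap<>();

    // Hypothetical helper: copy the current logical token to every NameNode entry.
    void cloneForNameNodes(String logicalService, List<String> nameNodeServices) {
        String token = credentials.get(logicalService);
        for (String nameNode : nameNodeServices) {
            credentials.put(nameNode, token);
        }
    }

    public static void main(String[] args) {
        String logical = "ha-hdfs:nameservice1"; // hypothetical nameservice
        List<String> nameNodes = Arrays.asList("nn1.example.com:8020", "nn2.example.com:8020");

        HaTokenModel model = new HaTokenModel();
        model.credentials.put(logical, "old-token");
        model.cloneForNameNodes(logical, nameNodes); // private clones are created once

        // Secure store update: only the logical token is replaced.
        model.credentials.put(logical, "new-token");
        System.out.println(model.credentials.get("nn1.example.com:8020")); // old-token (stale)

        // Re-cloning after each update keeps the per-NameNode entries current.
        model.cloneForNameNodes(logical, nameNodes);
        System.out.println(model.credentials.get("nn1.example.com:8020")); // new-token
    }
}
```

      In this model, re-cloning after every secure store update is one possible mitigation; it is not necessarily the fix that shipped in 3.3.4.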

      Workaround in Spark (which didn't work for us):
      https://github.com/apache/spark/pull/7069

      2016-04-27 19:02:20,746 - ERROR [message-callback:o.a.t.i.y.AbstractYarnTwillService@96] - Failed to update secure store.
      org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 2710 for cdap) is expired
              at org.apache.hadoop.ipc.Client.call(Client.java:1466) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.ipc.Client.call(Client.java:1403) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source) ~[na:na]
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[na:na]
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_67]
              at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_67]
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source) ~[na:na]
              at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304) ~[hadoop-hdfs-2.6.0-cdh5.5.2.jar:na]
              at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:775) ~[hadoop-common-2.6.0-cdh5.5.2.jar:na]
              at org.apache.twill.filesystem.HDFSLocation.getInputStream(HDFSLocation.java:74) ~[org.apache.twill.twill-yarn-0.7.0-incubating.jar:0.7.0-incubating]
              at org.apache.twill.internal.yarn.AbstractYarnTwillService.handleSecureStoreUpdate(AbstractYarnTwillService.java:86) ~[org.apache.twill.twill-yarn-0.7.0-incubating.jar:0.7.0-incubating]
              at org.apache.twill.internal.container.TwillContainerService.onReceived(TwillContainerService.java:88) [org.apache.twill.twill-yarn-0.7.0-incubating.jar:0.7.0-incubating]
              at org.apache.twill.internal.AbstractTwillService.handleMessage(AbstractTwillService.java:314) [org.apache.twill.twill-core-0.7.0-incubating.jar:na]
              at org.apache.twill.internal.AbstractTwillService.access$900(AbstractTwillService.java:83) [org.apache.twill.twill-core-0.7.0-incubating.jar:na]
              at org.apache.twill.internal.AbstractTwillService$4.onSuccess(AbstractTwillService.java:265) [org.apache.twill.twill-core-0.7.0-incubating.jar:na]
              at org.apache.twill.internal.AbstractTwillService$4.onSuccess(AbstractTwillService.java:245) [org.apache.twill.twill-core-0.7.0-incubating.jar:na]
              at com.google.common.util.concurrent.Futures$6.run(Futures.java:799) [com.google.guava.guava-13.0.1.jar:na]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
      


              People

              • Assignee: Ali Anwar (ali.anwar)
              • Reporter: Ali Anwar (ali.anwar)
              • Votes: 0
              • Watchers: 2
