CDAP / CDAP-4767

Transaction service cannot be discovered on a secure cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.1
    • Component/s: Master
    • Labels:
      None
    • Release Notes:
      Fixed an issue where cancellation of a CDAP program's delegation token was affecting CDAP master services.
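The release note attributes the failure to the cancellation of a CDAP program's delegation token. By default, YARN's ResourceManager cancels the delegation tokens an application was submitted with once that application finishes. If a program had been launched with the same HDFS delegation token the master was using, the master's copy would stop working as well. The Java sketch below is a toy model of that assumption only. `SharedTokenDemo`, `TokenCache` and the "issue each program its own token" variant are hypothetical illustrations, not CDAP or Hadoop APIs. The ticket also does not say this is how 3.3.1 fixed the problem.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model, not Hadoop's DelegationTokenSecretManager: a server-side
 * cache of live delegation tokens, used to show why cancelling a token that a CDAP
 * program shares with the CDAP master also breaks the master.
 */
public class SharedTokenDemo {

    /** Hypothetical stand-in for the NameNode's cache of live delegation tokens. */
    static final class TokenCache {
        private final Set<Integer> live = new HashSet<>();
        private int nextId = 1;

        /** Issues a new token and returns its id. */
        int issue() {
            int id = nextId++;
            live.add(id);
            return id;
        }

        /** Cancels a token, e.g. when the application that owns it completes. */
        void cancel(int id) {
            live.remove(id);
        }

        /** Mimics an HDFS call: fails if the presented token is no longer cached. */
        void verify(int id, String user) {
            if (!live.contains(id)) {
                throw new IllegalStateException("token (HDFS_DELEGATION_TOKEN token " + id
                        + " for " + user + ") can't be found in cache");
            }
        }
    }

    /**
     * One scenario: the master holds a token, and a program is launched either with that
     * same token or with a freshly issued one. The program then completes and its token
     * is cancelled (assumed here, mirroring YARN's default behaviour).
     * Returns true if the master can still use its own token afterwards.
     */
    static boolean masterSurvivesProgramCompletion(boolean shareToken) {
        TokenCache cache = new TokenCache();
        int masterToken = cache.issue();
        int programToken = shareToken ? masterToken : cache.issue();

        cache.cancel(programToken); // program finished; its token is cancelled

        try {
            cache.verify(masterToken, "cdap"); // master's next HDFS access
            return true;
        } catch (IllegalStateException e) {
            System.out.println("master failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("shared token:      master ok = " + masterSurvivesProgramCompletion(true));
        System.out.println("per-program token: master ok = " + masterSurvivesProgramCompletion(false));
    }
}
```

Running `main` reports a failure for the master only in the shared-token scenario. In this model, a token's lifetime is controlled by whichever application can cancel it. Sharing one token across independently completing applications therefore couples their failures.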

      Description

      The CDAP master appears to be failing to discover the transaction service on secure clusters because the HDFS_DELEGATION_TOKEN for the cdap user can no longer be found in the NameNode's token cache. Exception:

      2016-01-26T19:39:30,366Z INFO  o.a.t.i.a.ApplicationMasterService [slave4.sjc1.continuuity.net] [ApplicationMasterService] ApplicationMasterService:handleCompleted(ApplicationMasterService.java:440) - Container container_1446847897094_3208_01_001504 completed with COMPLETE:token (HDFS_DELEGATION_TOKEN token 6728 for cdap) can't be found in cache
      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 6728 for cdap) can't be found in cache
              at org.apache.hadoop.ipc.Client.call(Client.java:1468)
              at org.apache.hadoop.ipc.Client.call(Client.java:1399)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
              at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
              at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
              at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
              at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
              at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
              at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
              at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
              at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
              at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
              at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      
      ata/datasets/app.meta user=<null>:
      java.lang.RuntimeException: java.lang.Exception: Thrift error for co.cask.tephra.distributed.TransactionServiceClient$2@1f176d5: Unable to discover tx service.
              at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
              at co.cask.tephra.distributed.TransactionServiceClient.startShort(TransactionServiceClient.java:269) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startShort(DistributedTransactionSystemClientService.java:99) ~[co.cask.cdap.cdap-data-fabric-3.3.0.jar:na]
              at co.cask.tephra.TransactionContext.start(TransactionContext.java:89) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.DefaultTransactionExecutor.executeOnce(DefaultTransactionExecutor.java:133) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.DefaultTransactionExecutor.executeWithRetry(DefaultTransactionExecutor.java:115) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.DefaultTransactionExecutor.execute(DefaultTransactionExecutor.java:72) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.DefaultTransactionExecutor.execute(DefaultTransactionExecutor.java:90) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.execute(TransactionalDatasetRegistry.java:71) ~[co.cask.cdap.cdap-data-fabric-3.3.0.jar:na]
              at co.cask.cdap.data2.dataset2.tx.TransactionalDatasetRegistry.executeUnchecked(TransactionalDatasetRegistry.java:87) ~[co.cask.cdap.cdap-data-fabric-3.3.0.jar:na]
              at co.cask.cdap.data2.datafabric.dataset.instance.DatasetInstanceManager.get(DatasetInstanceManager.java:62) ~[co.cask.cdap.cdap-data-fabric-3.3.0.jar:na]
              at co.cask.cdap.data2.datafabric.dataset.service.DatasetInstanceService.get(DatasetInstanceService.java:123) ~[co.cask.cdap.cdap-data-fabric-3.3.0.jar:na]
              at co.cask.cdap.data2.datafabric.dataset.service.DatasetInstanceHandler.get(DatasetInstanceHandler.java:83) ~[co.cask.cdap.cdap-data-fabric-3.3.0.jar:na]
              at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) ~[na:na]
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75]
              at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75]
              at co.cask.http.HttpMethodInfo.invoke(HttpMethodInfo.java:80) ~[co.cask.http.netty-http-0.14.0.jar:na]
              at co.cask.http.HttpDispatcher.messageReceived(HttpDispatcher.java:38) [co.cask.http.netty-http-0.14.0.jar:na]
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.messageReceived(SimpleChannelUpstreamHandler.java:124) [io.netty.netty-3.6.6.Final.jar:na]
              at co.cask.cdap.common.http.AuthenticationChannelHandler.messageReceived(AuthenticationChannelHandler.java:63) [co.cask.cdap.cdap-common-3.3.0.jar:na]
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [io.netty.netty-3.6.6.Final.jar:na]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
      Caused by: java.lang.Exception: Thrift error for co.cask.tephra.distributed.TransactionServiceClient$2@1f176d5: Unable to discover tx service.
              at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:228) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:186) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.TransactionServiceClient.startShort(TransactionServiceClient.java:260) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              ... 30 common frames omitted
      Caused by: org.apache.thrift.TException: Unable to discover tx service.
              at co.cask.tephra.distributed.AbstractClientProvider.newClient(AbstractClientProvider.java:104) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.AbstractClientProvider.newClient(AbstractClientProvider.java:83) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.PooledClientProvider$TxClientPool.create(PooledClientProvider.java:46) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.PooledClientProvider$TxClientPool.create(PooledClientProvider.java:39) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.ElasticPool.getOrCreate(ElasticPool.java:136) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.ElasticPool.obtain(ElasticPool.java:123) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.PooledClientProvider.getCloseableClient(PooledClientProvider.java:99) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              at co.cask.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:215) ~[co.cask.tephra.tephra-core-0.6.3.jar:na]
              ... 32 common frames omitted
      

      This seems to be a recurrence of CDAP-562 and is related to TWILL-106.
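Reading the two traces together suggests how a token problem can surface as a discovery error. This is an inference, not something the ticket states. The first trace shows a Twill container exiting while YARN localizes its resources (`FSDownload.copy`). A service whose container never comes up never registers for discovery, so `AbstractClientProvider.newClient` has nothing to connect to and throws "Unable to discover tx service.". The Java toy model below mimics that chain with an in-memory registry. `TxDiscoveryDemo`, `Registry`, `startTxContainer`, the `tx-service` name and the `tx-host:12345` endpoint are all hypothetical, not Tephra or Twill APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of service discovery: a service registers its endpoint only
 * after its container starts, and clients look it up by name and fail if it is absent.
 */
public class TxDiscoveryDemo {

    /** Hypothetical in-memory registry standing in for ZooKeeper-based discovery. */
    static final class Registry {
        private final Map<String, List<String>> endpoints = new HashMap<>();

        void register(String service, String hostPort) {
            endpoints.computeIfAbsent(service, k -> new ArrayList<>()).add(hostPort);
        }

        List<String> discover(String service) {
            return endpoints.getOrDefault(service, Collections.emptyList());
        }
    }

    /** Mirrors the failure mode in the trace: no registered endpoint means no client. */
    static String newClient(Registry registry) throws Exception {
        List<String> found = registry.discover("tx-service");
        if (found.isEmpty()) {
            throw new Exception("Unable to discover tx service.");
        }
        return "client connected to " + found.get(0);
    }

    /**
     * Starts the (hypothetical) tx service container. With an invalid token, the
     * container is assumed to exit during resource localization before registering.
     */
    static void startTxContainer(Registry registry, boolean tokenValid) {
        if (!tokenValid) {
            return; // localization failed: container exits, nothing is registered
        }
        registry.register("tx-service", "tx-host:12345");
    }

    public static void main(String[] args) {
        for (boolean tokenValid : new boolean[] {true, false}) {
            Registry registry = new Registry();
            startTxContainer(registry, tokenValid);
            try {
                System.out.println("token valid=" + tokenValid + ": " + newClient(registry));
            } catch (Exception e) {
                System.out.println("token valid=" + tokenValid + ": startShort failed: " + e.getMessage());
            }
        }
    }
}
```

In this model, the client-side error names only the missing service. The token failure is visible only in the container's own logs, which matches the need in this ticket to correlate the Twill ApplicationMasterService log with the master's trace.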

              People

              • Assignee: Ali Anwar (ali.anwar)
              • Reporter: Bhooshan Mogal (bhooshan)
              • Votes: 0
              • Watchers: 5
