CDAP-8448

Most RESTful endpoints return 500 when the transaction service is unavailable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels:
      None
    • Release Notes:
      Fixed the HTTP RESTful endpoints so that, in general, they return a 503 (Service Unavailable) instead of a 500 (Internal Server Error) when the transaction service is unavailable.

      Description

      Methods in the Admin class are supposed to be retried if dependent services are down. They are retried if the dataset service is unavailable, but they are not retried if the call to the dataset service fails because the tx service is unavailable. In that case the dataset service responds with a 500 rather than a 503.
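
To illustrate why a 500 slips past the retry logic, here is a minimal sketch of a retry loop that retries only on "service unavailable" responses. It is not the CDAP `Retries` implementation: the class `RetrySketch`, the exception `HttpFailure`, and the rule that only a 503 is retryable are assumptions made for illustration.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of a retry loop; not the actual CDAP Retries class. */
final class RetrySketch {

  /** Hypothetical stand-in for a failed HTTP call to the dataset service. */
  static final class HttpFailure extends Exception {
    final int status;

    HttpFailure(int status, String body) {
      super("Response code: " + status + ", body: '" + body + "'");
      this.status = status;
    }
  }

  /** Assumption: only a 503 (Service Unavailable) response is treated as retryable. */
  static boolean isRetryable(Exception e) {
    return e instanceof HttpFailure && ((HttpFailure) e).status == 503;
  }

  /** Calls the task up to maxAttempts times, retrying only retryable failures. */
  static <T> T callWithRetries(Callable<T> task, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        if (!isRetryable(e)) {
          throw e; // a 500 is rethrown immediately, without any retry
        }
        last = e; // a 503 is remembered and the call is attempted again
      }
    }
    throw last; // every attempt failed with a retryable error
  }
}
```

Under these assumptions, a call that fails with a 500 because the tx service cannot be discovered is attempted exactly once, whereas the same failure reported as a 503 would be retried until the attempts run out.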

      The program ends up seeing an error like:

      co.cask.cdap.api.dataset.DatasetManagementException: Failed to truncate instance trunkate, details: Response code: 500, message: 'Internal Server Error', body: 'Unable to discover tx service.'
              at co.cask.cdap.data2.datafabric.dataset.DatasetServiceClient.truncateInstance(DatasetServiceClient.java:204) ~[na:na]
              at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.truncateInstance(RemoteDatasetFramework.java:200) ~[na:na]
              at co.cask.cdap.data2.dataset2.ForwardingDatasetFramework.truncateInstance(ForwardingDatasetFramework.java:129) ~[na:na]
              at co.cask.cdap.data2.datafabric.dataset.DefaultDatasetManager$7.call(DefaultDatasetManager.java:153) ~[na:na]
              at co.cask.cdap.data2.datafabric.dataset.DefaultDatasetManager$7.call(DefaultDatasetManager.java:149) ~[na:na]
              at co.cask.cdap.common.service.Retries.callWithRetries(Retries.java:139) ~[na:na]
              at co.cask.cdap.common.service.Retries.callWithRetries(Retries.java:114) ~[na:na]
              at co.cask.cdap.data2.datafabric.dataset.DefaultDatasetManager.truncateDataset(DefaultDatasetManager.java:149) ~[na:na]
              at co.cask.resiliency.AdminRunner$TruncateRunnable.runCmd(AdminRunner.java:134) ~[1486765981226-0/:na]
              at co.cask.resiliency.AdminRunner$AdminRunnable.run(AdminRunner.java:61) ~[1486765981226-0/:na]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
      

      and in the master log we see something like:

      2017-02-10 22:52:57,460 - ERROR [dataset.service-executor-9:c.c.c.c.HttpExceptionHandler@68] - Unexpected error: request=POST /v3/namespaces/default/data/datasets/trunkate/admin/truncate user=<null>:
      java.lang.RuntimeException: org.apache.thrift.TException: Unable to discover tx service.
              at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
              at org.apache.tephra.distributed.TransactionServiceClient.startShort(TransactionServiceClient.java:270) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at co.cask.cdap.data2.transaction.DistributedTransactionSystemClientService.startShort(DistributedTransactionSystemClientService.java:99) ~[na:na]
              at co.cask.cdap.data2.transaction.AbstractTransactionContext.start(AbstractTransactionContext.java:100) ~[na:na]
              at co.cask.cdap.data2.transaction.DynamicTransactionExecutor.executeOnce(DynamicTransactionExecutor.java:135) ~[na:na]
              at co.cask.cdap.data2.transaction.DynamicTransactionExecutor.executeWithRetry(DynamicTransactionExecutor.java:104) ~[na:na]
              at co.cask.cdap.data2.transaction.DynamicTransactionExecutor.execute(DynamicTransactionExecutor.java:61) ~[na:na]
              at co.cask.cdap.data2.transaction.DynamicTransactionExecutor.execute(DynamicTransactionExecutor.java:79) ~[na:na]
              at org.apache.tephra.AbstractTransactionExecutor.executeUnchecked(AbstractTransactionExecutor.java:67) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at co.cask.cdap.data2.datafabric.dataset.instance.DatasetInstanceManager.get(DatasetInstanceManager.java:85) ~[na:na]
              at co.cask.cdap.data2.datafabric.dataset.service.DatasetInstanceService.executeAdmin(DatasetInstanceService.java:416) ~[na:na]
              at co.cask.cdap.data2.datafabric.dataset.service.DatasetInstanceHandler.executeAdmin(DatasetInstanceHandler.java:191) ~[na:na]
              at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) ~[na:na]
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_75]
              at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_75]
              at co.cask.http.HttpMethodInfo.invoke(HttpMethodInfo.java:80) ~[co.cask.http.netty-http-0.16.0.jar:na]
              at co.cask.http.HttpDispatcher.messageReceived(HttpDispatcher.java:38) [co.cask.http.netty-http-0.16.0.jar:na]
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.messageReceived(SimpleChannelUpstreamHandler.java:124) [io.netty.netty-3.6.6.Final.jar:na]
              at co.cask.cdap.common.http.AuthenticationChannelHandler.messageReceived(AuthenticationChannelHandler.java:64) [na:na]
              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [io.netty.netty-3.6.6.Final.jar:na]
              at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [io.netty.netty-3.6.6.Final.jar:na]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
      Caused by: org.apache.thrift.TException: Unable to discover tx service.
              at org.apache.tephra.distributed.AbstractClientProvider.newClient(AbstractClientProvider.java:106) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.AbstractClientProvider.newClient(AbstractClientProvider.java:85) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.PooledClientProvider$TxClientPool.create(PooledClientProvider.java:48) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.PooledClientProvider$TxClientPool.create(PooledClientProvider.java:41) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.ElasticPool.getOrCreate(ElasticPool.java:138) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.ElasticPool.obtain(ElasticPool.java:125) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.PooledClientProvider.getCloseableClient(PooledClientProvider.java:101) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:217) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:188) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.TransactionServiceClient.startShort(TransactionServiceClient.java:261) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              ... 29 common frames omitted
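
The master log above shows the tx failure wrapped in a `RuntimeException` before it reaches the HTTP exception handler. One way a handler could report such failures as a 503 is to walk the cause chain and look for an "unavailable" marker. The sketch below assumes this approach. It is not the actual fix, and `StatusMappingSketch` and `DependencyUnavailable` are hypothetical names.

```java
/** Hypothetical sketch of mapping exceptions to HTTP status codes. */
final class StatusMappingSketch {

  /** Hypothetical marker for "a dependent service (such as tx) is unavailable". */
  static final class DependencyUnavailable extends RuntimeException {
    DependencyUnavailable(String message) {
      super(message);
    }
  }

  /**
   * Returns 503 if the throwable or any of its causes is a DependencyUnavailable,
   * otherwise 500.
   */
  static int toStatus(Throwable t) {
    // Wrappers such as RuntimeException hide the real cause, so inspect the whole chain.
    for (Throwable cause = t; cause != null; cause = cause.getCause()) {
      if (cause instanceof DependencyUnavailable) {
        return 503; // Service Unavailable: callers may retry
      }
    }
    return 500; // Internal Server Error: anything else
  }
}
```

With this mapping, `toStatus(new RuntimeException(new StatusMappingSketch.DependencyUnavailable("Unable to discover tx service.")))` yields 503, while an unrelated exception still yields 500.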
      

            People

            • Assignee: Albert Shau (ashau)
            • Reporter: Albert Shau (ashau)