CDAP-8446

Transactional does not throw the right exception if tx service is down during tx finish

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels:
      None
    • Release Notes:
      Fixed an issue where the Transactional.run method could throw the wrong exception if the transaction service was unavailable when it was finishing a transaction.

      Description

      I have a program that uses Transactional.run(TxRunnable). If the transaction service is down when CDAP tries to finish the transaction, the method throws a RuntimeException instead of a TransactionFailureException.

      java.lang.RuntimeException: org.apache.thrift.TException: Unable to discover tx service.
              at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-13.0.1.jar:na]
              at org.apache.tephra.distributed.TransactionServiceClient.abort(TransactionServiceClient.java:341) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at co.cask.cdap.data2.transaction.RetryingTransactionSystemClient.abort(RetryingTransactionSystemClient.java:85) ~[na:na]
              at co.cask.cdap.data2.transaction.AbstractTransactionContext.abort(AbstractTransactionContext.java:155) ~[na:na]
              at co.cask.cdap.data2.transaction.AbstractTransactionContext.checkForConflicts(AbstractTransactionContext.java:242) ~[na:na]
              at co.cask.cdap.data2.transaction.AbstractTransactionContext.finish(AbstractTransactionContext.java:115) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:241) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:223) ~[na:na]
              at co.cask.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:445) [na:na]
              at co.cask.cdap.internal.app.runtime.AbstractContext.execute(AbstractContext.java:433) [na:na]
              at co.cask.resiliency.TxWorker.run(TxWorker.java:31) ~[1486765966998-0/:na]
              at co.cask.cdap.internal.app.runtime.worker.WorkerDriver$1.run(WorkerDriver.java:85) [na:na]
              at co.cask.cdap.internal.app.runtime.AbstractContext.executeChecked(AbstractContext.java:492) [na:na]
              at co.cask.cdap.internal.app.runtime.worker.WorkerDriver.run(WorkerDriver.java:82) [na:na]
              at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) [com.google.guava.guava-13.0.1.jar:na]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
      Caused by: org.apache.thrift.TException: Unable to discover tx service.
              at org.apache.tephra.distributed.AbstractClientProvider.newClient(AbstractClientProvider.java:106) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.AbstractClientProvider.newClient(AbstractClientProvider.java:85) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.PooledClientProvider$TxClientPool.create(PooledClientProvider.java:48) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.PooledClientProvider$TxClientPool.create(PooledClientProvider.java:41) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.ElasticPool.getOrCreate(ElasticPool.java:138) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.ElasticPool.obtain(ElasticPool.java:125) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.PooledClientProvider.getCloseableClient(PooledClientProvider.java:101) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:217) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.TransactionServiceClient.execute(TransactionServiceClient.java:188) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              at org.apache.tephra.distributed.TransactionServiceClient.abort(TransactionServiceClient.java:331) ~[org.apache.tephra.tephra-core-0.11.0-incubating-SNAPSHOT.jar:0.11.0-incubating-SNAPSHOT]
              ... 14 common frames omitted
      

      This happens when AbstractTransactionContext checks whether it can commit (checkForConflicts) and gets an exception from the transaction client. It then tries to abort the current transaction, and the abort call fails with a second exception from the transaction client, which is propagated unchecked. That second exception is the one the user program sees.
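
      The failure mode described above can be sketched in isolation. The following is a minimal illustration, not the actual CDAP change: TxFinishSketch, TxClient, TxFailureException, finishLeaky and finishWrapped are all hypothetical names standing in for the real classes, and the "wrapped" variant is only one plausible way to make sure the caller always sees the checked exception.

```java
// Hypothetical stand-ins only: none of these are the real CDAP or Tephra types.
class TxFinishSketch {

  /** Checked exception playing the role of TransactionFailureException. */
  static class TxFailureException extends Exception {
    TxFailureException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Minimal client; both calls throw unchecked exceptions when the service is down. */
  interface TxClient {
    void canCommit();
    void abort();
  }

  /** A client that behaves as if the transaction service cannot be discovered. */
  static TxClient downClient() {
    return new TxClient() {
      @Override public void canCommit() {
        throw new RuntimeException("Unable to discover tx service.");
      }
      @Override public void abort() {
        throw new RuntimeException("Unable to discover tx service.");
      }
    };
  }

  /** Shape of the bug: a failure inside abort() escapes before the checked exception is thrown. */
  static void finishLeaky(TxClient client) throws TxFailureException {
    try {
      client.canCommit();
    } catch (RuntimeException e) {
      client.abort(); // throws again, so the line below is never reached
      throw new TxFailureException("Unable to check for conflicts", e);
    }
  }

  /** One possible remedy: guard abort() and attach its failure as a suppressed exception. */
  static void finishWrapped(TxClient client) throws TxFailureException {
    try {
      client.canCommit();
    } catch (RuntimeException e) {
      TxFailureException failure = new TxFailureException("Unable to check for conflicts", e);
      try {
        client.abort();
      } catch (RuntimeException abortFailure) {
        failure.addSuppressed(abortFailure); // keep the second failure for diagnostics
      }
      throw failure;
    }
  }

  public static void main(String[] args) {
    try {
      finishLeaky(downClient());
    } catch (RuntimeException e) {
      System.out.println("leaky: unchecked " + e.getMessage());
    } catch (TxFailureException e) {
      System.out.println("leaky: checked " + e.getMessage());
    }
    try {
      finishWrapped(downClient());
    } catch (TxFailureException e) {
      System.out.println("wrapped: checked, suppressed=" + e.getSuppressed().length);
    }
  }
}
```

      In this sketch a caller that only catches the checked exception is protected by finishWrapped but not by finishLeaky, which mirrors what the worker in the stack trace experienced.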

            People

            • Assignee:
              Albert Shau (ashau)
            • Reporter:
              Albert Shau (ashau)