CDAP-8007

RegionServer cannot recover after restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.1
    • Component/s: None
    • Labels:
      None
    • Release Notes:
      Fixed a bug in the TMS (Transaction Messaging Service) message and payload table coprocessors. The coprocessors now read the CDAP configuration and TMS metadata tables in a separate thread instead of reading them inline.

      Description

      RegionServer fails to recover after a restart.

      These errors eventually lead to RegionServer failure:

      17/01/03 22:35:02 INFO client.RpcRetryingCaller: Call exception, tries=63, retries=350, started=1133571 ms ago, cancelled=false, msg=row 'DEFAULT' on table 'cdap_system:configuration' at region=cdap_system:configuration,,1483480754148.c5b89744bf61ee0a79cba2a916fab5fe., hostname=<>, seqNum=2

      This is the fatal error:

      17/01/03 22:36:10 FATAL regionserver.HRegionServer: ABORTING region server <>: The coprocessor co.cask.cdap.data2.transaction.messaging.coprocessor.hbase11.MessageTableRegionObserver threw java.io.InterruptedIOException
      java.io.InterruptedIOException
              at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:199)
              at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
              at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
              at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
              at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1255)
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1161)
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1132)
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1116)
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:937)
              at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
              at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
              at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
              at org.apache.hadoop.hbase.client.HTable.get(HTable.java:889)
              at org.apache.hadoop.hbase.client.HTable.get(HTable.java:855)
              at co.cask.cdap.data2.util.hbase.ConfigurationTable.read(ConfigurationTable.java:135)
              at co.cask.cdap.data2.transaction.queue.hbase.coprocessor.CConfigurationReader.read(CConfigurationReader.java:39)
              at co.cask.cdap.data2.transaction.messaging.coprocessor.hbase11.MessageTableRegionObserver.start(MessageTableRegionObserver.java:106)
              at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$Environment.startup(CoprocessorHost.java:411)
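
The trace shows the coprocessor's start() method blocking on a read of the configuration table, and the release notes describe the fix as moving that read to a separate thread. The following is a minimal sketch of that general pattern only, not the actual CDAP change. The class AsyncConfigReader, its methods, and the Supplier standing in for the remote table read are all hypothetical, and no HBase classes are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: loads configuration on a background daemon thread so
 * that a coprocessor-style start() never blocks on a remote table read.
 */
public class AsyncConfigReader {
  // Stands in for the remote read of a configuration table (assumption).
  private final Supplier<Map<String, String>> loader;
  private final long retryMillis;
  private final AtomicReference<Map<String, String>> cached = new AtomicReference<>();
  private Thread worker;

  public AsyncConfigReader(Supplier<Map<String, String>> loader, long retryMillis) {
    this.loader = loader;
    this.retryMillis = retryMillis;
  }

  /** Returns immediately; the read is attempted (and retried) on another thread. */
  public synchronized void start() {
    if (worker != null) {
      return;
    }
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Map<String, String> conf = loader.get();
          if (conf != null) {
            cached.set(Collections.unmodifiableMap(new HashMap<>(conf)));
            return; // loaded once; a real reader might keep refreshing
          }
        } catch (RuntimeException e) {
          // The table may not be available yet (for example, right after a restart).
        }
        try {
          Thread.sleep(retryMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // exit the loop on stop()
        }
      }
    }, "config-reader");
    t.setDaemon(true);
    worker = t;
    t.start();
  }

  /** Latest configuration, or null if it has not been loaded yet. */
  public Map<String, String> get() {
    return cached.get();
  }

  public synchronized void stop() {
    if (worker != null) {
      worker.interrupt();
      worker = null;
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger attempts = new AtomicInteger();
    // Simulated read that fails twice before the "table" becomes available.
    AsyncConfigReader reader = new AsyncConfigReader(() -> {
      if (attempts.incrementAndGet() < 3) {
        throw new IllegalStateException("region not online yet");
      }
      return Collections.singletonMap("key", "value");
    }, 10L);
    reader.start(); // does not block, even though the first reads fail
    while (reader.get() == null) {
      Thread.sleep(5);
    }
    System.out.println(reader.get() + " after " + attempts.get() + " attempts");
    reader.stop();
  }
}
```

With this shape, startup does not wait on the table, but callers must handle get() returning null until the first successful read.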
      
      

            People

            • Assignee:
              Gokul Gunasekaran
            • Reporter:
              Deepak Wadhwani