Uploaded image for project: 'CDAP'
  1. CDAP
  2. CDAP-12360

Chance of deadlock in Logging in CDAP Master

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: Log, Master
    • Labels:
    • Release Notes:
      Fixed an issue where certain excessive logging could cause a deadlock in CDAP master.
    • Rank:
      1|i006hb:

      Description

      The logging system in CDAP master can enter a deadlock in certain cases. For instance, this was observed in a case where 60 scheduled workflows are launched alongside CDAP master startup.
      logback.xml was modified on this cluster to allow logging for 'org.apache.twill' at the INFO level, such that the following log line will be logged: https://github.com/apache/twill/blob/master/twill-yarn/src/main/java/org/apache/twill/internal/yarn/AbstractYarnProcessLauncher.java#L139

      Below are snippets of a thread dump of the CDAP master process.

      "kafka-log-publisher" #29 daemon prio=5 os_prio=0 tid=0x00007f38105e2000 nid=0x25fd5 waiting for monitor entry [0x00007f381f0f2000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:64)
              - waiting to lock <0x0000000715613508> (a co.cask.cdap.logging.appender.kafka.KafkaLogAppender)
              at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
              at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:280)
              at ch.qos.logback.classic.Logger.callAppenders(Logger.java:267)
              at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:449)
              at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:403)
              at ch.qos.logback.classic.Logger.log(Logger.java:803)
              at org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:185)
              at org.apache.hadoop.security.token.Token.getClassForIdentifier(Token.java:124)
              at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:138)
              at org.apache.hadoop.security.token.Token.identifierToString(Token.java:349)
              at org.apache.hadoop.security.token.Token.toString(Token.java:369)
              at co.cask.cdap.logging.serialize.LoggingEventSerializer.toGenericRecord(LoggingEventSerializer.java:151)
              at co.cask.cdap.logging.serialize.LoggingEventSerializer.toBytes(LoggingEventSerializer.java:68)
              at co.cask.cdap.logging.appender.kafka.KafkaLogAppender$KafkaLogPublisher.createKeyedMessage(KafkaLogAppender.java:239)
              at co.cask.cdap.logging.appender.kafka.KafkaLogAppender$KafkaLogPublisher.publishMessages(KafkaLogAppender.java:227)
              at co.cask.cdap.logging.appender.kafka.KafkaLogAppender$KafkaLogPublisher.run(KafkaLogAppender.java:128)
              at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
              at java.lang.Thread.run(Thread.java:745)
      
      " STARTING" #6050 daemon prio=5 os_prio=0 tid=0x00007f35b0fe5800 nid=0x27ba6 waiting for monitor entry [0x00007f31d26c2000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:64)
              - waiting to lock <0x0000000715613508> (a co.cask.cdap.logging.appender.kafka.KafkaLogAppender)
              at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
              at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:280)
              at ch.qos.logback.classic.Logger.callAppenders(Logger.java:267)
              at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:449)
              at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:403)
              at ch.qos.logback.classic.Logger.log(Logger.java:803)
              at org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(SLF4JLocationAwareLog.java:185)
              at org.apache.hadoop.security.token.Token.getClassForIdentifier(Token.java:124)
              at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:138)
              at org.apache.hadoop.security.token.Token.identifierToString(Token.java:349)
              at org.apache.hadoop.security.token.Token.toString(Token.java:369)
              at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:304)
              at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
              at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
              at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:113)
              at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:447)
              at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:421)
              at ch.qos.logback.classic.Logger.info(Logger.java:611)
              at org.apache.twill.internal.yarn.AbstractYarnProcessLauncher$PrepareLaunchContextImpl.launch(AbstractYarnProcessLauncher.java:139)
              at org.apache.twill.yarn.YarnTwillPreparer$2.call(YarnTwillPreparer.java:432)
              at org.apache.twill.yarn.YarnTwillPreparer$2.call(YarnTwillPreparer.java:393)
              at org.apache.twill.yarn.YarnTwillController.doStartUp(YarnTwillController.java:123)
              at org.apache.twill.internal.AbstractZKServiceController.startUp(AbstractZKServiceController.java:75)
              at org.apache.twill.internal.AbstractExecutionServiceController$ServiceDelegate.startUp(AbstractExecutionServiceController.java:187)
              at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
              at java.lang.Thread.run(Thread.java:745)
      

      A short term workaround is to include this line in the logback.xml:

      <logger name="org.apache.twill.internal.yarn.AbstractYarnProcessLauncher" level="WARN"/>
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ali.anwar Ali Anwar
                Reporter:
                ali.anwar Ali Anwar
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: