CDAP-14931

App deployment fails intermittently

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 6.0.0
    • Fix Version/s: 6.0.0
    • Component/s: Applications, Scheduler
    • Labels: None

      Description

      The first application deployment soon after CDAP master startup can fail with the stack trace below. The failure shows up in integration tests, which verify that all system services report OK before attempting to deploy any application. Apparently, the core scheduler service has not finished starting even after every system service returns OK. This behavior was not seen until recently (around 2/14/19), so the startup of the core scheduler service may simply be taking longer or may have been delayed.

      2019-02-15 06:14:58,048 - ERROR [appfabric-executor-16:c.c.c.g.h.AppLifecycleHttpHandler$2@503] - Deploy failure
      co.cask.cdap.common.ServiceUnavailableException: Service 'Core scheduler' is not available. Please wait until it is up and running.
              at co.cask.cdap.scheduler.CoreSchedulerService.checkStarted(CoreSchedulerService.java:213) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService.listSchedules(CoreSchedulerService.java:507) ~[na:na]
              at co.cask.cdap.internal.app.deploy.pipeline.DeleteAndCreateSchedulesStage.process(DeleteAndCreateSchedulesStage.java:56) ~[na:na]
              at co.cask.cdap.internal.app.deploy.pipeline.DeleteAndCreateSchedulesStage.process(DeleteAndCreateSchedulesStage.java:35) ~[na:na]
              at co.cask.cdap.pipeline.AbstractStage.process(AbstractStage.java:53) ~[na:na]
              at co.cask.cdap.internal.pipeline.SynchronousPipeline.execute(SynchronousPipeline.java:57) ~[na:na]
              at co.cask.cdap.internal.app.deploy.LocalApplicationManager.deploy(LocalApplicationManager.java:132) ~[na:na]
              at co.cask.cdap.internal.app.services.ApplicationLifecycleService.deployApp(ApplicationLifecycleService.java:668) ~[na:na]
              at co.cask.cdap.internal.app.services.ApplicationLifecycleService.deployAppAndArtifact(ApplicationLifecycleService.java:358) ~[na:na]
              at co.cask.cdap.gateway.handlers.AppLifecycleHttpHandler$2.onFinish(AppLifecycleHttpHandler.java:478) ~[na:na]
              at co.cask.cdap.common.http.AbstractBodyConsumer.finished(AbstractBodyConsumer.java:65) [na:na]
              at co.cask.http.internal.HttpMethodInfo.bodyConsumerFinish(HttpMethodInfo.java:151) [co.cask.http.netty-http-1.1.0.jar:na]
              at co.cask.http.internal.HttpMethodInfo.chunk(HttpMethodInfo.java:112) [co.cask.http.netty-http-1.1.0.jar:na]
              at co.cask.http.internal.HttpDispatcher.channelRead(HttpDispatcher.java:47) [co.cask.http.netty-http-1.1.0.jar:na]
              at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
              at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
              at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
              at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
              at co.cask.http.internal.NonStickyEventExecutorGroup$NonStickyOrderedEventExecutor.run(NonStickyEventExecutorGroup.java:254) [co.cask.http.netty-http-1.1.0.jar:na]
              at io.netty.util.concurrent.UnorderedThreadPoolEventExecutor$NonNotifyRunnable.run(UnorderedThreadPoolEventExecutor.java:277) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_181]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_181]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_181]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
              at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181] 
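
As a possible workaround on the test side, the first deploy could be retried while the scheduler is still coming up. The following is only a minimal sketch under stated assumptions: `RetryUntilAvailable`, `callWithRetry`, and the nested `ServiceUnavailable` class are hypothetical names used for illustration and are not part of CDAP or its test framework. A real test would match on whatever error its client surfaces for the `ServiceUnavailableException` shown in the trace.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper: retries an action while a dependent service is still starting. */
public final class RetryUntilAvailable {

  /** Illustrative stand-in for a "service not available" failure; not a CDAP class. */
  public static final class ServiceUnavailable extends RuntimeException {
    public ServiceUnavailable(String serviceName) {
      super("Service '" + serviceName + "' is not available.");
    }
  }

  // Upper bound on the delay between attempts (assumed value).
  private static final long MAX_DELAY_MILLIS = 10_000L;

  private RetryUntilAvailable() { }

  /**
   * Calls the action until it succeeds, fails with a non-retryable exception,
   * or maxAttempts is reached. The delay doubles after each failed attempt,
   * capped at MAX_DELAY_MILLIS.
   */
  public static <T> T callWithRetry(Callable<T> action,
                                    Predicate<Exception> retryable,
                                    int maxAttempts,
                                    long initialDelayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long delay = initialDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        // Give up on the last attempt or on errors that are not "service unavailable".
        if (attempt >= maxAttempts || !retryable.test(e)) {
          throw e;
        }
        Thread.sleep(delay);
        delay = Math.min(delay * 2, MAX_DELAY_MILLIS);
      }
    }
  }
}
```

A test could wrap its deploy call in `callWithRetry(...)` with a predicate that matches only the unavailable-service error, so that genuine deployment failures still surface immediately. This would hide the symptom in the tests and would not explain why the scheduler starts later than the status check suggests.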

       

      The failure first happened in https://builds.cask.co/browse/IT-ITN-238, so the suspect PRs (based on merge time) are:

      • https://github.com/cdapio/cdap/pull/11035
      • https://github.com/cdapio/cdap/pull/11067
      • https://github.com/cdapio/cdap/pull/11062
      • https://github.com/cdapio/cdap/pull/11031

       

       

            People

            • Assignee: Ali Anwar (ali.anwar)
            • Reporter: Ali Anwar (ali.anwar)
            • Votes: 0
            • Watchers: 2
