  1. CDAP
  2. CDAP-12523

Scheduler service migration failure can block indefinitely

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.0, 4.2.0
    • Fix Version/s: 4.3.1
    • Component/s: Master, Scheduler

      Description

      Running the 4.3 RC with the log level set to DEBUG, the following exception is logged repeatedly:

      2017-08-29 14:54:47,171 - DEBUG [Endure-Service-:c.c.c.c.s.RetryOnStartFailureService$1@80] - Exception raised when starting service 
      java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Invalid cron entry format
              at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
              at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
              at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
              at co.cask.cdap.common.service.RetryOnStartFailureService$1.run(RetryOnStartFailureService.java:73) ~[na:na]
      java.lang.IllegalArgumentException: Invalid cron entry format
              at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) ~[com.google.guava.guava-13.0.1.jar:na]
              at co.cask.cdap.internal.app.runtime.schedule.store.Schedulers.getQuartzCronExpression(Schedulers.java:189) ~[na:na]
              at co.cask.cdap.internal.app.runtime.schedule.store.Schedulers.validateCronExpression(Schedulers.java:173) ~[na:na]
              at co.cask.cdap.internal.app.runtime.schedule.trigger.TimeTrigger.validate(TimeTrigger.java:35) ~[na:na]
              at co.cask.cdap.proto.ProtoTrigger$TimeTrigger.<init>(ProtoTrigger.java:95) ~[na:na]
              at co.cask.cdap.internal.app.runtime.schedule.trigger.TimeTrigger.<init>(TimeTrigger.java:29) ~[na:na]
              at co.cask.cdap.internal.app.runtime.schedule.store.Schedulers.toProgramSchedule(Schedulers.java:126) ~[na:na]
              at co.cask.cdap.internal.app.runtime.schedule.store.ProgramScheduleStoreDataset.migrateFromAppMetadataStore(ProgramScheduleStoreDataset.java:141) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService$5.run(CoreSchedulerService.java:201) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService$5.run(CoreSchedulerService.java:198) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService$18.call(CoreSchedulerService.java:502) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$4.run(Transactions.java:262) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:235) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:223) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$5.executeInternal(Transactions.java:295) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions$5.execute(Transactions.java:282) ~[na:na]
              at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:259) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService.execute(CoreSchedulerService.java:498) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService.migrateSchedules(CoreSchedulerService.java:198) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService.access$000(CoreSchedulerService.java:75) ~[na:na]
              at co.cask.cdap.scheduler.CoreSchedulerService$1$1.startUp(CoreSchedulerService.java:112) ~[na:na]
              at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) ~[com.google.guava.guava-13.0.1.jar:na]
              at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
      

      The cause is that some application specifications contain schedules whose cronExpression is an empty string.
      The migration fails on such a schedule and is retried indefinitely, so the scheduler service never reaches the RUNNING state and nothing can use it.
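
      One way to keep a single bad entry from blocking startup is to validate each schedule on its own and skip the ones that fail, instead of letting the exception abort the whole migration. The following is a minimal sketch of that idea, not the fix that shipped in 4.3.1. All names in it (ScheduleMigrationSketch, LegacySchedule, MigrationResult, validateCron, migrate) are hypothetical, and the cron check is a simplified stand-in for the validation in Schedulers.getQuartzCronExpression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScheduleMigrationSketch {

  /** Hypothetical stand-in for a schedule read from an old app spec. */
  static final class LegacySchedule {
    final String name;
    final String cronExpression;

    LegacySchedule(String name, String cronExpression) {
      this.name = name;
      this.cronExpression = cronExpression;
    }
  }

  /** Names of the schedules that were migrated and of those that were skipped. */
  static final class MigrationResult {
    final List<String> migrated = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();
  }

  /**
   * Simplified check (an assumption, not CDAP's real rule): a cron entry
   * must have exactly five whitespace-separated fields.
   */
  static void validateCron(String cron) {
    String trimmed = cron == null ? "" : cron.trim();
    if (trimmed.isEmpty() || trimmed.split("\\s+").length != 5) {
      throw new IllegalArgumentException("Invalid cron entry format");
    }
  }

  /** Migrates every valid schedule; invalid ones are recorded and skipped. */
  static MigrationResult migrate(List<LegacySchedule> schedules) {
    MigrationResult result = new MigrationResult();
    for (LegacySchedule schedule : schedules) {
      try {
        validateCron(schedule.cronExpression);
        result.migrated.add(schedule.name);
      } catch (IllegalArgumentException e) {
        // Do not rethrow: one malformed entry must not fail the whole
        // migration, otherwise the service start is retried forever.
        result.skipped.add(schedule.name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    MigrationResult result = migrate(Arrays.asList(
        new LegacySchedule("daily", "0 4 * * *"),
        new LegacySchedule("broken", "")));
    System.out.println("migrated=" + result.migrated + " skipped=" + result.skipped);
  }
}
```

      With the two sample schedules above, "daily" is migrated and "broken" is skipped, so the loop finishes and the caller can go on to start the service. A real fix would also need to decide what to do with the skipped schedules, for example logging them so they can be repaired.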

              People

              • Assignee: Chengfeng Mao (mao)
              • Reporter: Ali Anwar (ali.anwar)