Uploaded image for project: 'CDAP'
  1. CDAP
  2. CDAP-14929

Pipelines fail to run with dataproc profile

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 6.0.0
    • Fix Version/s: 6.0.0
    • Component/s: Cloud Provisioner
    • Labels:
      None
    • Rank:
      1|i00l8v:

      Description

      Running a GCS to GCS pipeline using a Dataproc profile fails.
      Snippet from master logs about the program:

      2019-02-15 13:12:50,679 - INFO  [appfabric-executor-7:c.c.c.i.a.s.ProgramLifecycleService@410] - Attempt to run Workflow program DataPipelineWorkflow as user cdapitn
      2019-02-15 13:15:07,510 - WARN  [program.status:c.c.c.i.a.r.d.DistributedProgramRuntimeService@177] - Twill RunId does not exist for the program program:default.GCSCopyTest.-SNAPSHOT.workflow.DataPipelineWorkflow, runId 657
      0d2e2-3123-11e9-b692-42010ad20017
      2019-02-15 13:15:20,299 - INFO  [program.status:c.c.c.i.a.r.d.DistributedProgramRunner@472] - Starting Workflow Program 'DataPipelineWorkflow' with Arguments [logical.start.time=1550236370678, system.profile.name=dataproc-i
      tn-profile]
      2019-02-15 13:15:20,299 - INFO  [program.status:c.c.c.i.a.r.d.DistributedProgramRunner$1@292] - Starting program:default.GCSCopyTest.-SNAPSHOT.workflow.DataPipelineWorkflow with debugging enabled: false, logback: file:/var/
      tmp/cdap/data/tmp/1550236507754-0/logback.xml
      2019-02-15 13:15:23,772 - INFO  [LogPipeline-cdap:c.c.c.l.a.s.LogFileManager@102] - Created Avro file at TimeStampLocation{location=hdfs://10.210.0.23:8020/cdap/logs/default/2019-02-15/GCSCopyTest/DataPipelineWorkflow/15502
      36523742.avro, timeStamp=1550236523742}
      2019-02-15 13:15:44,444 - INFO  [program.status:c.c.c.i.a.r.d.r.RemoteExecutionTwillPreparer@456] - Starting runnable DataPipelineWorkflow with SSH on 10.210.0.31
      2019-02-15 13:15:44,580 - INFO  [program.status:c.c.c.i.a.r.d.AbstractTwillProgramController$1@69] - Twill program running: program_run:default.GCSCopyTest.-SNAPSHOT.workflow.DataPipelineWorkflow.6570d2e2-3123-11e9-b692-42010ad20017, twill runId: 6570d2e2-3123-11e9-b692-42010ad20017
      2019-02-15 13:16:08,012 - INFO  [LogPipeline-cdap:c.c.c.l.a.s.LogFileManager@102] - Created Avro file at TimeStampLocation{location=hdfs://10.210.0.23:8020/cdap/logs/system/2019-02-15/services/runtime.http/1550236567992.avro, timeStamp=1550236567992}
      2019-02-15 13:16:19,048 - INFO  [runtime-monitor-10:c.c.c.i.a.r.d.r.SSHRemoteProcessController@73] - Received exit code 1 when checking for remote process for program run program_run:default.GCSCopyTest.-SNAPSHOT.workflow.DataPipelineWorkflow.6570d2e2-3123-11e9-b692-42010ad20017.
      2019-02-15 13:16:19,102 - INFO  [runtime-monitor-4:c.c.c.i.a.r.d.AbstractTwillProgramController$2@77] - Twill program terminated: program_run:default.GCSCopyTest.-SNAPSHOT.workflow.DataPipelineWorkflow.6570d2e2-3123-11e9-b692-42010ad20017, twill runId: 6570d2e2-3123-11e9-b692-42010ad20017 

       

      Snippet from program.log on the dataproc cluster (after the process on the dataproc cluster terminates):

      2019-02-15 23:13:32,184 - INFO  [CoreMessagingService STARTING:c.c.c.m.s.CoreMessagingService@232] - Core Messaging Service started
      2019-02-15 23:13:32,184 - INFO  [main:o.a.t.TransactionManager@242] - Starting transaction manager.
      2019-02-15 23:13:32,194 - INFO  [main:o.a.t.TransactionManager@284] - Starting periodic timed-out transaction cleanup every 10 seconds with default timeout of 30 seconds.
      2019-02-15 23:13:32,195 - INFO  [main:o.a.t.TransactionManager@302] - Starting periodic snapshot thread, frequency = 60 seconds, location = no-op
      2019-02-15 23:13:32,196 - INFO  [main:o.a.t.TransactionManager@339] - Starting periodic Metrics Emitter thread, frequency = 10000
      2019-02-15 23:13:32,218 - INFO  [MessagingHttpService STARTING:c.c.h.NettyHttpService@173] - Starting HTTP Service messaging.service at address cdap-pipe-ed5c4561-3176-11e9-a7f5-42010ad20046-m.c.cask-audi-clusters.internal/10.210.0.58:0
      2019-02-15 23:13:32,886 - INFO  [MessagingHttpService STARTING:c.c.c.m.s.MessagingHttpService@101] - Messaging HTTP server started on cdap-pipe-ed5c4561-3176-11e9-a7f5-42010ad20046-m.c.cask-audi-clusters.internal/10.210.0.58:44405
      2019-02-15 23:13:33,314 - INFO  [RuntimeMonitorServer STARTING:c.c.h.NettyHttpService@173] - Starting HTTP Service runtime.http at address localhost/127.0.0.1:0
      2019-02-15 23:13:33,415 - INFO  [RuntimeMonitorServer STARTING:c.c.c.i.a.r.m.RuntimeMonitorServer@128] - Runtime monitor server started on /127.0.0.1:40959
      2019-02-15 23:13:33,426 - INFO  [DatasetService:c.c.c.d.d.d.s.DatasetService@128] - Starting DatasetService... 

       

      Because the last log is from DatasetService, that is the likely culprit of an error.

       

      Suspect PR: https://github.com/cdapio/cdap/pull/11035

        Attachments

          Activity

            People

            • Assignee:
              terence Terence Yim
              Reporter:
              ali.anwar Ali Anwar
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: