  1. CDAP
  2. CDAP-4629

ETLMapReduce needs to allow specifying MR Driver Resources

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: CDAP, ETL, MapReduce
    • Labels:
      None

      Description

      An ETLMapReduce job fails because the driver runs out of memory: YARN kills its container for exceeding the physical memory limit (516.4 MB used of 512 MB; log below).
      PR https://github.com/caskdata/cdap/pull/4488 lets users set memory resources for the driver. ETLMapReduce should expose the same setting.

      2016-01-15T08:31:11,867Z ERROR c.c.c.i.a.r.w.WorkflowDriver [cdap-itn9-dsth23-47-5254-1002.dev.continuuity.net] [WorkflowDriver] WorkflowDriver:executeAction(WorkflowDriver.java:256) - Exception on WorkflowAction.run(), aborting Workflow. DefaultWorkflowActionSpecification{className='co.cask.cdap.internal.workflow.ProgramWorkflowAction', name='ETLMapReduce', description='Workflow action for MAPREDUCE ETLMapReduce', properties={ProgramType=MAPREDUCE, ProgramName=ETLMapReduce}, datasets=[]}
      java.lang.InterruptedException: null
              at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
              at java.util.concurrent.FutureTask.get(FutureTask.java:187)
              at co.cask.cdap.internal.app.runtime.workflow.WorkflowDriver.executeAction(WorkflowDriver.java:254)
              at co.cask.cdap.internal.app.runtime.workflow.WorkflowDriver.executeNode(WorkflowDriver.java:390)
              at co.cask.cdap.internal.app.runtime.workflow.WorkflowDriver.executeAll(WorkflowDriver.java:446)
              at co.cask.cdap.internal.app.runtime.workflow.WorkflowDriver.run(WorkflowDriver.java:434)
              at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
              at java.lang.Thread.run(Thread.java:745)
      2016-01-15T08:31:12,687Z INFO  o.a.t.i.a.ApplicationMasterService [cdap-itn9-dsth23-47-5254-1002.dev.continuuity.net] [ApplicationMasterService] ApplicationMasterService:handleCompleted(ApplicationMasterService.java:440) - Container container_1452844468607_0028_01_000015 completed with COMPLETE:Container [pid=32074,containerID=container_1452844468607_0028_01_000015] is running beyond physical memory limits. Current usage: 516.4 MB of 512 MB physical memory used; 949.0 MB of 2.5 GB virtual memory used. Killing container.
      Dump of the process-tree for container_1452844468607_0028_01_000015 :
              |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
              |- 32074 32072 32074 32074 (bash) 0 0 9461760 278 /bin/bash -c /usr/lib/jvm/java/bin/java -Djava.io.tmpdir=tmp -Dyarn.container=container_1452844468607_0028_01_000015 -Dtwill.runnable=workflow.default.TPFSToTPFSWithProjection.ETLWorkflow.ETLWorkflow -cp launcher.jar:/etc/hadoop/conf -Xmx359m -XX:MaxPermSize=128M -verbose:gc -Xloggc:/data/logs/hadoop-yarn/userlogs/application_1452844468607_0028/container_1452844468607_0028_01_000015/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Dhdp.version=2.3.2.0-2950 -Dspark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.2.0-2950 org.apache.twill.launcher.TwillLauncher container.jar org.apache.twill.internal.container.TwillContainerMain true 1>/data/logs/hadoop-yarn/userlogs/application_1452844468607_0028/container_1452844468607_0028_01_000015/stdout 2>/data/logs/hadoop-yarn/userlogs/application_1452844468607_0028/container_1452844468607_0028_01_000015/stderr 
              |- 32087 32074 32074 32074 (java) 2311 1074 985661440 131932 /usr/lib/jvm/java/bin/java -Djava.io.tmpdir=tmp -Dyarn.container=container_1452844468607_0028_01_000015 -Dtwill.runnable=workflow.default.TPFSToTPFSWithProjection.ETLWorkflow.ETLWorkflow -cp launcher.jar:/etc/hadoop/conf -Xmx359m -XX:MaxPermSize=128M -verbose:gc -Xloggc:/data/logs/hadoop-yarn/userlogs/application_1452844468607_0028/container_1452844468607_0028_01_000015/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Dhdp.version=2.3.2.0-2950 -Dspark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.2.0-2950 org.apache.twill.launcher.TwillLauncher container.jar org.apache.twill.internal.container.TwillContainerMain true 
      
      Container killed on request. Exit code is 143
      Container exited with a non-zero exit code 143
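
      The kill message in the log above states both the container's physical memory usage and its limit. As a rough illustration of how to pull those two numbers out when triaging similar failures, here is a minimal sketch. The class name `ContainerMemoryCheck` and its methods are hypothetical helpers written for this example; they are not part of CDAP, Twill, or YARN, and the regular expression assumes the message wording shown in this log.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a CDAP/Twill/YARN API): extracts the physical
// memory usage and limit from a YARN "running beyond physical memory
// limits" message such as the one in this issue.
final class ContainerMemoryCheck {

    /** Physical memory usage and limit, both normalized to MB. */
    static final class Usage {
        public final double usedMb;
        public final double limitMb;

        Usage(double usedMb, double limitMb) {
            this.usedMb = usedMb;
            this.limitMb = limitMb;
        }

        /** How far the container went over its limit, in MB. */
        public double overageMb() {
            return usedMb - limitMb;
        }
    }

    // Assumes the wording seen in the log above:
    // "Current usage: <n> MB of <n> MB physical memory used"
    private static final Pattern PHYSICAL = Pattern.compile(
        "Current usage: ([0-9.]+) (MB|GB) of ([0-9.]+) (MB|GB) physical memory used");

    /** Returns the parsed usage, or empty if the line has no such message. */
    static Optional<Usage> parse(String logLine) {
        Matcher m = PHYSICAL.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Usage(
            toMb(m.group(1), m.group(2)),
            toMb(m.group(3), m.group(4))));
    }

    private static double toMb(String value, String unit) {
        double v = Double.parseDouble(value);
        return unit.equals("GB") ? v * 1024 : v;
    }

    public static void main(String[] args) {
        // Message text copied from the log in this issue.
        String line = "Container [pid=32074,containerID=container_1452844468607_0028_01_000015] "
            + "is running beyond physical memory limits. Current usage: 516.4 MB of 512 MB "
            + "physical memory used; 949.0 MB of 2.5 GB virtual memory used. Killing container.";
        parse(line).ifPresent(u -> System.out.println(String.format(Locale.ROOT,
            "used=%.1f MB, limit=%.1f MB, over by %.1f MB",
            u.usedMb, u.limitMb, u.overageMb())));
    }
}
```

      For this issue the margin is only a few MB, which fits the description: the container limit for the driver is too small, so the request is a way to configure it rather than a fix for a leak.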
      

              People

              • Assignee:
                ashau Albert Shau
              • Reporter:
                ali.anwar Ali Anwar
