  1. CDAP
  2. CDAP-14115

WikipediaPipelineWorkflow fails on Spark2

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 5.1.0
    • Fix Version/s: 5.1.0
    • Component/s: CDAP Examples
    • Labels:
      None

      Description

      I ran the WikipediaPipelineWorkflow per the documentation: https://docs.cdap.io/cdap/5.0.0/en/examples-manual/examples/wikipedia-data-pipeline.html

      The workflow fails with the following logs:

      2018-08-14 16:25:30,433 - INFO  [Thread-212:c.c.c.i.a.r.b.MainOutputCommitter@94] - Setting up for MapReduce job: namespaceId=default, applicationId=WikipediaPipeline, program=WikiContentValidatorAndNormalizer, runid=554c2405-a019-11e8-aecb-acde48001122
      2018-08-14 16:25:30,887 - INFO  [Thread-212:c.c.c.i.a.r.b.MainOutputCommitter@181] - Invalidating transaction 1534289130434000000
      2018-08-14 16:25:30,890 - WARN  [Thread-212:o.a.h.m.LocalJobRunner@587] - job_local1769443146_0003
      java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489) ~[org.apache.hadoop.hadoop-mapreduce-client-common-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549) ~[org.apache.hadoop.hadoop-mapreduce-client-common-2.8.0.jar:na]
      java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
      	at org.sweble.wikitext.engine.PageTitle.make(PageTitle.java:339) ~[swc-engine-2.0.0.jar:2.0.0]
      	at org.sweble.wikitext.engine.PageTitle.make(PageTitle.java:287) ~[swc-engine-2.0.0.jar:2.0.0]
      	at co.cask.cdap.examples.wikipedia.WikiContentValidatorAndNormalizer$FilterNormalizerMapper.toPlainText(WikiContentValidatorAndNormalizer.java:162) ~[unpacked/:na]
      	at co.cask.cdap.examples.wikipedia.WikiContentValidatorAndNormalizer$FilterNormalizerMapper.map(WikiContentValidatorAndNormalizer.java:131) ~[unpacked/:na]
      	at co.cask.cdap.examples.wikipedia.WikiContentValidatorAndNormalizer$FilterNormalizerMapper.map(WikiContentValidatorAndNormalizer.java:92) ~[unpacked/:na]
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.MapperWrapper.run(MapperWrapper.java:135) ~[na:na]
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[org.apache.hadoop.hadoop-mapreduce-client-common-2.8.0.jar:na]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_151]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_151]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
      	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
      Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_151]
      	at co.cask.cdap.common.lang.InterceptableClassLoader.findClass(InterceptableClassLoader.java:46) ~[na:na]
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_151]
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_151]
      	... 15 common frames omitted 

      On occasion, I've seen it complete successfully (with no errors in the logs), but the number of output records differs: with Spark1 it outputs 10 records, while with Spark2 it outputs only 2.
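
      The trace above shows org.apache.commons.lang.StringUtils (needed by swc-engine-2.0.0.jar) failing to resolve through the program classloader. As a rough diagnostic sketch, the snippet below reports whether a class can be resolved by a given classloader and which jar or directory it comes from. ClassPresenceCheck and locate are hypothetical names used only for illustration and are not part of CDAP. The assumption is that the check is run under the same classloader as the failing mapper; run standalone, it only reflects the local classpath.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of CDAP or the Wikipedia example.
public class ClassPresenceCheck {

    /** Returns where a class is loaded from, or marks it as missing. */
    public static String locate(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            // JDK core classes usually have no code source
            return className + " -> " + (location == null ? "bootstrap/unknown" : location.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> MISSING (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Assumption: the context classloader is the one the failing code would use
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // The class named in the NoClassDefFoundError from the logs above
        System.out.println(locate("org.apache.commons.lang.StringUtils", loader));
        // A class that should always resolve, for comparison
        System.out.println(locate("java.lang.String", loader));
    }
}
```

      If the first line reports MISSING while the second resolves, that would be consistent with the commons-lang 2.x jar not being visible to that classloader. It does not by itself explain the record count discrepancy.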

              People

              • Assignee:
                sree Sreevatsan Raman
              • Reporter:
                ali.anwar Ali Anwar