CDAP / CDAP-10100

Errors trying to use NaiveBayesClassifier

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.3.0, 3.5.0
    • Fix Version/s: 4.3.1
    • Labels: None

      Description

      Seeing this error with a pipeline that worked on CDAP 3.4.3. It is thrown from the EventClassifier pipeline after the model has been trained by the Trainer pipeline.
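
      For context, the failure surfaces when the classifier stage loads the previously trained model (NaiveBayesClassifier.java:138, near the bottom of the trace). A minimal, hedged sketch of that call path, assuming a local Spark context and an illustrative model path; this is not the plugin's actual code:

      // NaiveBayesModel.load reads the saved model's Parquet data through Spark SQL,
      // which sets parquet.read.support.class to CatalystReadSupport, which is exactly
      // the configuration the exception below complains about.
      import org.apache.spark.SparkConf;
      import org.apache.spark.SparkContext;
      import org.apache.spark.mllib.classification.NaiveBayesModel;

      public class NaiveBayesLoadSketch {
        public static void main(String[] args) {
          SparkContext sc = new SparkContext(new SparkConf()
              .setAppName("naive-bayes-load-sketch")
              .setMaster("local[1]"));
          // Fails inside ParquetInputFormat.getReadSupport() when the configured
          // CatalystReadSupport is not assignable to the ReadSupport class visible
          // on the job classpath (e.g. a second, conflicting copy of Parquet).
          NaiveBayesModel model = NaiveBayesModel.load(sc, "/tmp/trained-model"); // hypothetical path
          System.out.println("Loaded model with " + model.labels().length + " labels");
          sc.stop();
        }
      }

      The full stack trace: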

      java.util.concurrent.ExecutionException: co.cask.tephra.TransactionFailureException: Exception raised in transactional execution. Cause: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 5, localhost): org.apache.parquet.hadoop.BadConfigurationException: class org.apache.spark.sql.execution.datasources.parquet.CatalystReadSupport set in job conf at parquet.read.support.class is not a subclass of org.apache.parquet.hadoop.api.ReadSupport
      	at org.apache.parquet.hadoop.util.ConfigurationUtil.getClassFromConfig(ConfigurationUtil.java:35)
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupportClass(ParquetInputFormat.java:177)
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupport(ParquetInputFormat.java:252)
      	at org.apache.parquet.hadoop.ParquetInputFormat.createRecordReader(ParquetInputFormat.java:240)
      	at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:178)
      	at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:126)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      Driver stacktrace:
      	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294) ~[com.google.guava.guava-13.0.1.jar:na]
      	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:281) ~[com.google.guava.guava-13.0.1.jar:na]
      	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService.run(SparkRuntimeService.java:262) ~[co.cask.cdap.cdap-spark-core-3.5.0-SNAPSHOT.jar:na]
      	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService$3$1.run(SparkRuntimeService.java:316) [co.cask.cdap.cdap-spark-core-3.5.0-SNAPSHOT.jar:na]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
      Caused by: co.cask.tephra.TransactionFailureException: Exception raised in transactional execution. Cause: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 5, localhost): org.apache.parquet.hadoop.BadConfigurationException: class org.apache.spark.sql.execution.datasources.parquet.CatalystReadSupport set in job conf at parquet.read.support.class is not a subclass of org.apache.parquet.hadoop.api.ReadSupport
      	at org.apache.parquet.hadoop.util.ConfigurationUtil.getClassFromConfig(ConfigurationUtil.java:35)
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupportClass(ParquetInputFormat.java:177)
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupport(ParquetInputFormat.java:252)
      	at org.apache.parquet.hadoop.ParquetInputFormat.createRecordReader(ParquetInputFormat.java:240)
      	at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:178)
      	at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:126)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      Driver stacktrace:
      	at co.cask.cdap.data2.transaction.Transactions.asTransactionFailure(Transactions.java:87) ~[co.cask.cdap.cdap-data-fabric-3.5.0-SNAPSHOT.jar:na]
      	at co.cask.cdap.data2.transaction.Transactions.asTransactionFailure(Transactions.java:73) ~[co.cask.cdap.cdap-data-fabric-3.5.0-SNAPSHOT.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:215) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:132) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.DefaultSparkExecutionContext.execute(DefaultSparkExecutionContext.scala:150) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.DefaultJavaSparkExecutionContext.execute(DefaultJavaSparkExecutionContext.scala:79) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.BatchSparkPipelineDriver.run(BatchSparkPipelineDriver.java:78) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.SparkMainWrapper$.main(SparkMainWrapper.scala:96) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.SparkMainWrapper.main(SparkMainWrapper.scala) ~[na:na]
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_79]
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_79]
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_79]
      	at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_79]
      	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) ~[na:na]
      	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) ~[na:na]
      	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) ~[na:na]
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) ~[na:na]
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.submit.AbstractSparkSubmitter.submit(AbstractSparkSubmitter.java:172) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.submit.AbstractSparkSubmitter.access$000(AbstractSparkSubmitter.java:56) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.submit.AbstractSparkSubmitter$5.run(AbstractSparkSubmitter.java:114) ~[na:na]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_79]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_79]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_79]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_79]
      	... 1 common frames omitted
      Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 5, localhost): org.apache.parquet.hadoop.BadConfigurationException: class org.apache.spark.sql.execution.datasources.parquet.CatalystReadSupport set in job conf at parquet.read.support.class is not a subclass of org.apache.parquet.hadoop.api.ReadSupport
      	at org.apache.parquet.hadoop.util.ConfigurationUtil.getClassFromConfig(ConfigurationUtil.java:35)
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupportClass(ParquetInputFormat.java:177)
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupport(ParquetInputFormat.java:252)
      	at org.apache.parquet.hadoop.ParquetInputFormat.createRecordReader(ParquetInputFormat.java:240)
      	at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:178)
      	at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:126)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:89)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      Driver stacktrace:
      	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) ~[na:na]
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) ~[na:na]
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) ~[na:na]
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[org.scala-lang.scala-library-2.10.4.jar:na]
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[org.scala-lang.scala-library-2.10.4.jar:na]
      	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) ~[na:na]
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) ~[na:na]
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) ~[na:na]
      	at scala.Option.foreach(Option.scala:236) ~[org.scala-lang.scala-library-2.10.4.jar:na]
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) ~[na:na]
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) ~[na:na]
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) ~[na:na]
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) ~[na:na]
      	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ~[na:na]
      	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) ~[na:na]
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) ~[na:na]
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) ~[na:na]
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) ~[na:na]
      	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212) ~[na:na]
      	at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165) ~[na:na]
      	at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174) ~[na:na]
      	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) ~[na:na]
      	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) ~[na:na]
      	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) ~[na:na]
      	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086) ~[na:na]
      	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498) ~[na:na]
      	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505) ~[na:na]
      	at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375) ~[na:na]
      	at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374) ~[na:na]
      	at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099) ~[na:na]
      	at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374) ~[na:na]
      	at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456) ~[na:na]
      	at org.apache.spark.mllib.classification.NaiveBayesModel$SaveLoadV2_0$.load(NaiveBayes.scala:216) ~[na:na]
      	at org.apache.spark.mllib.classification.NaiveBayesModel$.load(NaiveBayes.scala:283) ~[na:na]
      	at org.apache.spark.mllib.classification.NaiveBayesModel.load(NaiveBayes.scala) ~[na:na]
      	at co.cask.hydrator.plugin.batch.spark.NaiveBayesClassifier.transform(NaiveBayesClassifier.java:138) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.RDDCollection.compute(RDDCollection.java:94) ~[na:na]
      	at co.cask.cdap.etl.spark.SparkPipelineDriver.runPipeline(SparkPipelineDriver.java:122) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.BatchSparkPipelineDriver.run(BatchSparkPipelineDriver.java:93) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.SparkTransactional$2.run(SparkTransactional.java:224) ~[na:na]
      	at co.cask.cdap.app.runtime.spark.SparkTransactional.execute(SparkTransactional.java:197) ~[na:na]
      	... 23 common frames omitted
      Caused by: org.apache.parquet.hadoop.BadConfigurationException: class org.apache.spark.sql.execution.datasources.parquet.CatalystReadSupport set in job conf at parquet.read.support.class is not a subclass of org.apache.parquet.hadoop.api.ReadSupport
      	at org.apache.parquet.hadoop.util.ConfigurationUtil.getClassFromConfig(ConfigurationUtil.java:35) ~[na:na]
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupportClass(ParquetInputFormat.java:177) ~[na:na]
      	at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupport(ParquetInputFormat.java:252) ~[na:na]
      	at org.apache.parquet.hadoop.ParquetInputFormat.createRecordReader(ParquetInputFormat.java:240) ~[na:na]
      	at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:178) ~[na:na]
      	at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:126) ~[na:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) ~[na:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) ~[na:na]
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[na:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) ~[na:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) ~[na:na]
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) ~[na:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) ~[na:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) ~[na:na]
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) ~[na:na]
      	at org.apache.spark.scheduler.Task.run(Task.scala:89) ~[na:na]
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[na:na]
      	... 3 common frames omitted
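
      A quick way to confirm the suspected cause: the "is not a subclass of" message from ConfigurationUtil.getClassFromConfig comes from an isAssignableFrom check, which fails whenever the two classes were loaded by different classloaders or from conflicting Parquet copies, even if the source-level hierarchy is correct. A hedged diagnostic sketch, to be run with the same classpath the pipeline uses (class names are copied verbatim from the trace above):

      // Prints which classloader served each of the two classes named in the error.
      // If the loaders differ, the BadConfigurationException is a class-visibility
      // problem, not a genuine type-hierarchy problem.
      public class ReadSupportLoaderCheck {
        public static void main(String[] args) throws ClassNotFoundException {
          Class<?> readSupport =
              Class.forName("org.apache.parquet.hadoop.api.ReadSupport");
          Class<?> catalyst = Class.forName(
              "org.apache.spark.sql.execution.datasources.parquet.CatalystReadSupport");
          System.out.println("ReadSupport loader:         " + readSupport.getClassLoader());
          System.out.println("CatalystReadSupport loader: " + catalyst.getClassLoader());
          System.out.println("isAssignableFrom: " + readSupport.isAssignableFrom(catalyst));
        }
      }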
      

    People

    • Assignee: Sreevatsan Raman (sree)
    • Reporter: Russ Savage (russellsavage)
    • Votes: 0
    • Watchers: 6
