CDAP / CDAP-11765

Spark programs that use FileSystem fail to run on some clusters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.2.0
    • Fix Version/s: 4.2.0
    • Component/s: None
    • Labels: None

      Description

      The SparkPageRank program fails to run on some CDH versions with the following error:

      2017-06-01 11:11:12,842 - ERROR [Executor task launch worker-0:o.a.s.e.Executor@96] - Exception in task 1.3 in stage 0.0 (TID 7)
      java.io.IOException: No FileSystem for scheme: hdfs
      	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) ~[hadoop-common-2.5.0-cdh5.3.10.jar:na]
      	at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:175) ~[spark-assembly.jar:na]
      	at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:138) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:116) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) ~[spark-assembly.jar:na]
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) ~[spark-assembly.jar:na]
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) ~[spark-assembly.jar:na]
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-assembly.jar:na]
      	at org.apache.spark.scheduler.Task.run(Task.scala:56) ~[spark-assembly.jar:na]
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) ~[spark-assembly.jar:na]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
      

      This happens on CDH 5.3 and 5.4, and on HDP 2.2 and 2.3.
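
      The trace shows FileSystem.getFileSystemClass failing to resolve the hdfs scheme. One general cause of this message, not confirmed as the cause of this issue, is that the ServiceLoader registration file for FileSystem implementations is not visible to the classloader in use. The sketch below is a hypothetical diagnostic, not part of CDAP or Hadoop: the class name ServiceFileCheck and its method are made up for illustration, and it uses only the JDK. It lists every copy of the registration file that a given classloader can see.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of CDAP, Spark, or Hadoop.
class ServiceFileCheck {

    // Registration file that java.util.ServiceLoader reads for FileSystem implementations.
    static final String HADOOP_FS_SERVICE_FILE =
        "META-INF/services/org.apache.hadoop.fs.FileSystem";

    // Returns every location of resourceName visible to the given classloader.
    static List<URL> findResources(ClassLoader loader, String resourceName) throws IOException {
        return new ArrayList<>(Collections.list(loader.getResources(resourceName)));
    }

    public static void main(String[] args) throws IOException {
        // Fall back to the system classloader if no context classloader is set.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }

        List<URL> locations = findResources(loader, HADOOP_FS_SERVICE_FILE);
        if (locations.isEmpty()) {
            System.out.println("No FileSystem service file visible to " + loader);
        } else {
            for (URL location : locations) {
                System.out.println("Found service file: " + location);
            }
        }
    }
}
```

      Running the same check inside a Spark task, against the task thread's context classloader, would show whether the executor sees a service file at all. An empty result there would be consistent with the error above but would not by itself prove the root cause.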


              People

              • Assignee: Terence Yim (terence)
              • Reporter: Albert Shau (ashau)
              • Votes: 0
              • Watchers: 2
