CDAP-8913

Can't use Table's Row as the type of RDD in Spark

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1.0
    • Fix Version/s: 4.1.1
    • Component/s: Datasets, Spark
    • Labels:
      None
    • Release Notes:
      Improved the serializability of Tables and IndexedTables when used in Spark programs.

      Description

      If you read a Table dataset in a Spark program, the value type of the resulting RDD is Row, and the concrete Row implementation handed back is Result.
      Because Result does not implement Serializable, the Rows cannot be serialized across Spark executors (for instance, when performing a join; see the sketch below).
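      A minimal sketch of the failing pattern, using CDAP's JavaSparkMain / JavaSparkExecutionContext Spark API; the program and the dataset names "purchases" and "customers" are hypothetical placeholders:

      import co.cask.cdap.api.common.Bytes;
      import co.cask.cdap.api.dataset.table.Row;
      import co.cask.cdap.api.spark.JavaSparkExecutionContext;
      import co.cask.cdap.api.spark.JavaSparkMain;
      import org.apache.spark.api.java.JavaPairRDD;
      import scala.Tuple2;

      public class JoinOnTableRows implements JavaSparkMain {
        @Override
        public void run(JavaSparkExecutionContext sec) throws Exception {
          // The RDD values are Rows; the concrete class handed back is Result,
          // which does not implement java.io.Serializable.
          JavaPairRDD<byte[], Row> purchases = sec.fromDataset("purchases");
          JavaPairRDD<byte[], Row> customers = sec.fromDataset("customers");

          // Convert the byte[] keys to String so the join compares keys by
          // content rather than by array identity. mapToPair is a narrow
          // transformation, so no serialization happens here.
          JavaPairRDD<String, Row> left =
              purchases.mapToPair(t -> new Tuple2<>(Bytes.toString(t._1()), t._2()));
          JavaPairRDD<String, Row> right =
              customers.mapToPair(t -> new Tuple2<>(Bytes.toString(t._1()), t._2()));

          // The join forces a shuffle; writing the Result values to the shuffle
          // throws the NotSerializableException shown in the log below.
          left.join(right).count();
        }
      }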

      2017-03-08 11:32:18,739 - ERROR [Executor task launch worker-1:o.a.s.Logging$class@95] - Exception in task 3.0 in stage 0.0 (TID 3)
      java.io.NotSerializableException: co.cask.cdap.api.dataset.table.Result
      Serialization stack:
      	- object not serializable (class: co.cask.cdap.api.dataset.table.Result, value: co.cask.cdap.api.dataset.table.Result@76363e6a)
      	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.serializer.SerializationStream.writeValue(Serializer.scala:147) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.scheduler.Task.run(Task.scala:89) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.10-1.6.1.jar:1.6.1]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
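      Until the fix, one workaround sketch (the Rows helper below is hypothetical, not part of the CDAP API) is to copy each Row into plainly serializable types before any wide transformation, so Spark never has to serialize the Result instances themselves:

      import java.util.HashMap;
      import java.util.Map;

      import co.cask.cdap.api.common.Bytes;
      import co.cask.cdap.api.dataset.table.Row;

      public final class Rows {
        private Rows() { }

        // Flatten a Row into column-name -> value bytes. HashMap, String, and
        // byte[] are all Java-serializable, so the result can cross executor
        // boundaries safely.
        public static Map<String, byte[]> toMap(Row row) {
          Map<String, byte[]> columns = new HashMap<>();
          for (Map.Entry<byte[], byte[]> e : row.getColumns().entrySet()) {
            columns.put(Bytes.toString(e.getKey()), e.getValue());
          }
          return columns;
        }
      }

      Applying left.mapValues(Rows::toMap) to each side before the join keeps the shuffled data serializable; mapValues is a narrow transformation, so the non-serializable Rows never leave the executor that read them.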
      


            People

            • Assignee: ali.anwar (Ali Anwar)
            • Reporter: ali.anwar (Ali Anwar)
            • Votes: 0
            • Watchers: 1
