CDAP / CDAP-9883

Spark data pipeline doesn't support argument values longer than 64K

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0, 3.5.2
    • Component/s: Pipelines, Spark
    • Labels:
      None
    • Release Notes:
      Fixed a problem with Spark data pipelines not supporting argument values whose encoded length exceeds 64K bytes.

      Description

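      We should use a serialization format that doesn't limit string length. Serializations currently writes strings with DataOutputStream.writeUTF, which throws UTFDataFormatException once a string's encoded length exceeds 65,535 bytes.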
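      The limit comes from java.io.DataOutputStream.writeUTF itself: it prefixes the string with a 2-byte length, so any string whose modified UTF-8 encoding is longer than 65,535 bytes is rejected. A minimal reproduction, independent of CDAP (the class and method names are illustrative only):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class WriteUtfLimit {
  // Returns true if DataOutputStream.writeUTF accepts a string of `length` ASCII characters.
  // Each ASCII character encodes to one byte, so length equals the encoded byte count here.
  static boolean writeUtfAccepts(int length) {
    char[] chars = new char[length];
    Arrays.fill(chars, 'a');
    DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
    try {
      out.writeUTF(new String(chars));
      return true;
    } catch (UTFDataFormatException e) {
      // writeUTF stores the length in 2 bytes, so encodings over 65535 bytes are rejected.
      return false;
    } catch (IOException e) {
      // An in-memory ByteArrayOutputStream is not expected to fail with other I/O errors.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeUtfAccepts(65535)); // true
    System.out.println(writeUtfAccepts(81332)); // false, same size as the value in the trace
  }
}
```

      The 81,332-byte case fails with the same "encoded string too long: 81332 bytes" message seen in the trace.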

      org.apache.tephra.TransactionFailureException: Failed to execute method co.cask.cdap.etl.spark.batch.ETLSpark.beforeSubmit() inside a transaction
      	at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:177) ~[co.cask.cdap.cdap-data-fabric-3.5.0.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService.beforeSubmit(SparkRuntimeService.java:349) ~[co.cask.cdap.cdap-spark-core-3.5.0.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:159) ~[co.cask.cdap.cdap-spark-core-3.5.0.jar:na]
      	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) ~[com.google.guava.guava-13.0.1.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService$3$1.run(SparkRuntimeService.java:330) [co.cask.cdap.cdap-spark-core-3.5.0.jar:na]
      	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
      Caused by: java.io.UTFDataFormatException: encoded string too long: 81332 bytes
      	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364) ~[na:1.7.0_67]
      	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323) ~[na:1.7.0_67]
      	at co.cask.cdap.etl.spark.batch.Serializations$2.write(Serializations.java:62) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.Serializations$2.write(Serializations.java:59) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.Serializations.serializeMap(Serializations.java:36) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.DatasetInfo.serialize(DatasetInfo.java:71) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.SparkBatchSinkFactory$4.write(SparkBatchSinkFactory.java:131) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.SparkBatchSinkFactory$4.write(SparkBatchSinkFactory.java:128) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.Serializations.serializeMap(Serializations.java:36) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.SparkBatchSinkFactory.serialize(SparkBatchSinkFactory.java:128) ~[na:na]
      	at co.cask.cdap.etl.spark.batch.ETLSpark.initialize(ETLSpark.java:153) ~[na:na]
      	at co.cask.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:141) ~[co.cask.cdap.cdap-api-3.5.0.jar:na]
      	at co.cask.cdap.api.spark.AbstractSpark.initialize(AbstractSpark.java:32) ~[co.cask.cdap.cdap-api-3.5.0.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService$4.call(SparkRuntimeService.java:356) ~[co.cask.cdap.cdap-spark-core-3.5.0.jar:na]
      	at co.cask.cdap.app.runtime.spark.SparkRuntimeService$4.call(SparkRuntimeService.java:349) ~[co.cask.cdap.cdap-spark-core-3.5.0.jar:na]
      	at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:174) ~[co.cask.cdap.cdap-data-fabric-3.5.0.jar:na]
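      One format without the 64K ceiling is a 4-byte length prefix followed by the raw UTF-8 bytes, which allows strings of up to Integer.MAX_VALUE bytes. The sketch below applies it to a Map<String, String>, the shape that Serializations.serializeMap handles in the trace. It is a hypothetical illustration, not the change committed for this issue: the class, method, and argument names are made up, and it assumes non-null keys and values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not the CDAP Serializations class.
public final class LongStringSerialization {

  // Writes a 4-byte length followed by the UTF-8 bytes, instead of writeUTF's 2-byte length.
  static void writeString(DataOutput out, String s) throws IOException {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Reads a string written by writeString.
  static String readString(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Writes the entry count, then each key and value as length-prefixed strings.
  static void writeMap(DataOutput out, Map<String, String> map) throws IOException {
    out.writeInt(map.size());
    for (Map.Entry<String, String> entry : map.entrySet()) {
      writeString(out, entry.getKey());
      writeString(out, entry.getValue());
    }
  }

  // Reads a map written by writeMap.
  static Map<String, String> readMap(DataInput in) throws IOException {
    int size = in.readInt();
    Map<String, String> map = new HashMap<>();
    for (int i = 0; i < size; i++) {
      String key = readString(in);
      String value = readString(in);
      map.put(key, value);
    }
    return map;
  }

  // Serializes a map to bytes and reads it back.
  static Map<String, String> roundTrip(Map<String, String> map) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buffer);
      writeMap(out, map);
      out.flush();
      return readMap(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    } catch (IOException e) {
      // In-memory streams are not expected to fail; surface any surprise as unchecked.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    StringBuilder longValue = new StringBuilder();
    for (int i = 0; i < 81332; i++) {
      longValue.append('x');
    }
    Map<String, String> arguments = new HashMap<>();
    arguments.put("long.argument", longValue.toString());
    Map<String, String> restored = roundTrip(arguments);
    System.out.println(restored.get("long.argument").length()); // 81332
  }
}
```

      Because the reader must mirror the writer exactly, bytes produced by the old writeUTF-based format could not be read by readString; this sketch assumes the serializing and deserializing sides are changed together.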
      

              People

              • Assignee: Vinisha Shah
              • Reporter: Terence Yim
              • Votes: 0
              • Watchers: 2
