CDAP / CDAP-12875

Parquet file source cannot read certain types of parquet files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0.0, 4.3.4
    • Component/s: None
    • Labels:
    • Release Notes:
      Fixed a bug that caused errors in the File source if it read parquet files that were not generated through Hadoop.

      Description

      When the File source is used to read a parquet file that doesn't contain 'avro.read.schema', 'parquet.avro.schema', or 'avro.schema' in its footer, the job will fail with:

      java.lang.Exception: java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Lorg/codehaus/jackson/JsonNode;)V" the class loader (instance of co/cask/cdap/internal/app/runtime/plugin/PluginClassLoader) of the current class, org/apache/parquet/avro/AvroSchemaConverter, and the class loader (instance of co/cask/cdap/internal/app/runtime/ProgramClassLoader) for the method's defining class, org/apache/avro/Schema$Field, have different Class objects for the type org/codehaus/jackson/JsonNode used in the signature
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489) ~[org.apache.hadoop.hadoop-mapreduce-client-common-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549) ~[org.apache.hadoop.hadoop-mapreduce-client-common-2.8.0.jar:na]
      java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Lorg/codehaus/jackson/JsonNode;)V" the class loader (instance of co/cask/cdap/internal/app/runtime/plugin/PluginClassLoader) of the current class, org/apache/parquet/avro/AvroSchemaConverter, and the class loader (instance of co/cask/cdap/internal/app/runtime/ProgramClassLoader) for the method's defining class, org/apache/avro/Schema$Field, have different Class objects for the type org/codehaus/jackson/JsonNode used in the signature
      	at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:222) ~[parquet-avro-1.8.1.jar:1.8.1]
      	at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:209) ~[parquet-avro-1.8.1.jar:1.8.1]
      	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124) ~[parquet-avro-1.8.1.jar:1.8.1]
      	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:179) ~[parquet-hadoop-1.8.1.jar:1.8.1]
      	at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:201) ~[parquet-hadoop-1.8.1.jar:1.8.1]
      	at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145) ~[parquet-hadoop-1.8.1.jar:1.8.1]
      	at co.cask.hydrator.plugin.batch.source.PathTrackingInputFormat$TrackingRecordReader.initialize(PathTrackingInputFormat.java:140) ~[1510704095912-0/:na]
      	at co.cask.hydrator.plugin.batch.source.PathTrackingInputFormat$TrackingParquetRecordReader.initialize(PathTrackingInputFormat.java:232) ~[1510704095912-0/:na]
      	at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReaderWrapper.initialize(CombineFileRecordReaderWrapper.java:69) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initialize(CombineFileRecordReader.java:59) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at co.cask.cdap.internal.app.runtime.batch.dataset.input.DelegatingRecordReader.initialize(DelegatingRecordReader.java:73) ~[na:na]
      	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[org.apache.hadoop.hadoop-mapreduce-client-core-2.8.0.jar:na]
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[org.apache.hadoop.hadoop-mapreduce-client-common-2.8.0.jar:na]
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_77]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_77]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_77]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_77]
      	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
      
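As the description notes, the Avro read path only falls back to AvroSchemaConverter.convert (the frame where the LinkageError is thrown) when none of the three schema keys are present in the footer. A self-contained sketch of that key check (FooterKeyCheck is an illustrative name, and the map stands in for the footer's key/value metadata; reading a real footer would go through parquet-hadoop):

```java
import java.util.HashMap;
import java.util.Map;

public class FooterKeyCheck {
    // The footer keys named in the description above.
    static final String[] AVRO_KEYS = {
        "avro.read.schema", "parquet.avro.schema", "avro.schema"
    };

    // True if the footer carries an embedded Avro schema, in which case the
    // reader never reaches AvroSchemaConverter.convert (the frame where the
    // LinkageError above is thrown).
    static boolean hasEmbeddedAvroSchema(Map<String, String> footerMetadata) {
        for (String key : AVRO_KEYS) {
            if (footerMetadata.containsKey(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical footers: one written through parquet-avro, one not.
        Map<String, String> avroWritten = new HashMap<>();
        avroWritten.put("parquet.avro.schema", "{\"type\":\"record\"}");
        Map<String, String> plainWritten = new HashMap<>();

        System.out.println(hasEmbeddedAvroSchema(avroWritten));  // true
        System.out.println(hasEmbeddedAvroSchema(plainWritten)); // false
    }
}
```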

I believe the root cause is that the app exports Avro classes in order to implement error datasets. This can cause classloader errors like the one above. We may be able to work around this in the plugin until error datasets are removed and the app no longer exposes Avro classes.
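For context on the error itself: a LinkageError of this shape arises because two classloaders (here PluginClassLoader and ProgramClassLoader) each end up with their own copy of org.codehaus.jackson.JsonNode, and the JVM treats same-named classes from different loaders as distinct types. A minimal, self-contained demonstration of that mechanism (LoaderClash, Marker, and IsolatingLoader are illustrative names, not CDAP classes):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class LoaderClash {
    static class Marker {}

    // A loader that defines Marker itself instead of delegating to its
    // parent, mimicking how two sibling classloaders each get their own
    // copy of the same class.
    static class IsolatingLoader extends ClassLoader {
        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (name.equals(Marker.class.getName())) {
                try (InputStream in = LoaderClash.class
                        .getResourceAsStream("LoaderClash$Marker.class")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    byte[] bytes = out.toByteArray();
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (Exception e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = Marker.class;
        Class<?> b = new IsolatingLoader().loadClass(Marker.class.getName());
        // Same name, different Class objects: the JVM considers the two
        // types incompatible, which is exactly what the loader constraint
        // violation in the stack trace reports for JsonNode.
        System.out.println(a.getName().equals(b.getName())); // true
        System.out.println(a == b);                          // false
    }
}
```

This is why shading or not exporting the conflicting Avro/Jackson classes from the app classloader resolves the failure.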

        Attachments

          Activity

            People

            • Assignee:
              ashau Albert Shau
            • Reporter:
              ashau Albert Shau
            • Votes:
              0
            • Watchers:
              1

              Dates

              • Created:
              • Updated:
              • Resolved: