  1. CDAP
  2. CDAP-14562

Classloading issues when creating record writer with MultipleOutputs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1.1
    • Component/s: CDAP
    • Labels:
      None
    • Release Notes:
      Fixed a bug where XML libraries on the system classpath were not visible to output formats.

      Description

      Format plugins currently have to bundle Xerces in their jars to avoid the following failure when CDAP creates a RecordWriter for the S3 filesystem:

      org.jets3t.service.ServiceException: Failed to initialize a SAX XMLReader
      	at org.jets3t.service.utils.ServiceUtils.loadXMLReader(ServiceUtils.java:783) ~[net.java.dev.jets3t.jets3t-0.9.0.jar:na]
      	at org.jets3t.service.impl.rest.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:97) ~[net.java.dev.jets3t.jets3t-0.9.0.jar:na]
      	at org.jets3t.service.impl.rest.httpclient.RestS3Service.getXmlResponseSaxParser(RestS3Service.java:180) ~[net.java.dev.jets3t.jets3t-0.9.0.jar:na]
      	at org.jets3t.service.impl.rest.httpclient.RestStorageService.listObjectsInternal(RestStorageService.java:1459) ~[net.java.dev.jets3t.jets3t-0.9.0.jar:na]
      	at org.jets3t.service.impl.rest.httpclient.RestStorageService.listObjectsChunkedImpl(RestStorageService.java:1422) ~[net.java.dev.jets3t.jets3t-0.9.0.jar:na]
      	at org.jets3t.service.StorageService.listObjectsChunked(StorageService.java:662) ~[net.java.dev.jets3t.jets3t-0.9.0.jar:na]
      	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.list(Jets3tNativeFileSystemStore.java:268) ~[org.apache.hadoop.hadoop-aws-2.8.0.jar:na]
      	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.list(Jets3tNativeFileSystemStore.java:249) ~[org.apache.hadoop.hadoop-aws-2.8.0.jar:na]
      	at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.list(Jets3tNativeFileSystemStore.java:242) ~[org.apache.hadoop.hadoop-aws-2.8.0.jar:na]
      	at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) ~[na:na]
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_161-google-v7]
      	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_161-google-v7]
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.fs.s3native.$Proxy74.list(Unknown Source) ~[na:na]
      	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:511) ~[org.apache.hadoop.hadoop-aws-2.8.0.jar:na]
      	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1436) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.create(NativeS3FileSystem.java:420) ~[org.apache.hadoop.hadoop-aws-2.8.0.jar:na]
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:928) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:795) ~[org.apache.hadoop.hadoop-common-2.8.0.jar:na]
      	at org.apache.avro.mapreduce.AvroOutputFormatBase.getAvroFileOutputStream(AvroOutputFormatBase.java:91) ~[org.apache.hive.hive-exec-1.2.1.jar:1.2.1]
      	at org.apache.avro.mapreduce.AvroKeyOutputFormat.getRecordWriter(AvroKeyOutputFormat.java:105) ~[org.apache.hive.hive-exec-1.2.1.jar:1.2.1]
      	at co.cask.hydrator.format.output.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:44) ~[na:na]
      	at co.cask.cdap.internal.app.runtime.batch.dataset.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:205) ~[na:na]
      

      The failure occurs because MultipleOutputs sets the context classloader to the output format's classloader before calling getRecordWriter(). XMLReaderFactory uses the context classloader to instantiate the XML parser class, so the parser is not found. The XML parser class is available from the CDAP system classloader, since it is included in the CDAP classpath, but it is not visible from the output format's classloader (the plugin classloader in pipelines) unless the jar is bundled into the plugin.
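
      The visibility problem described above can be reproduced in isolation. The following is a minimal sketch, not CDAP code: the class name ContextLoaderVisibilityDemo and its helper methods are hypothetical, and a URLClassLoader with no URLs and no parent stands in for a plugin classloader that cannot see the system classpath. The lookup through Class.forName is only an approximation of how a factory resolves a class through the context classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; none of these names come from CDAP.
final class ContextLoaderVisibilityDemo {

  // Stand-in for a factory that resolves an implementation class by name
  // through the thread context classloader.
  static boolean visibleFromContext(String className) {
    ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
    try {
      Class.forName(className, false, contextLoader);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  // Temporarily installs the given loader as the context classloader,
  // runs the lookup, then restores the previous loader.
  static boolean visibleWithContext(ClassLoader loader, String className) {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(loader);
    try {
      return visibleFromContext(className);
    } finally {
      current.setContextClassLoader(previous);
    }
  }

  public static void main(String[] args) {
    // A class that lives on the application classpath (this demo class itself).
    String name = ContextLoaderVisibilityDemo.class.getName();
    ClassLoader appLoader = ContextLoaderVisibilityDemo.class.getClassLoader();
    // Assumption: an empty loader with no parent plays the role of the plugin classloader.
    ClassLoader isolated = new URLClassLoader(new URL[0], null);

    System.out.println("application loader sees class: " + visibleWithContext(appLoader, name));
    System.out.println("isolated loader sees class: " + visibleWithContext(isolated, name));
  }
}
```

      The same class is found or not found depending only on which loader is installed as the context classloader at the time of the lookup, which is the situation the stack trace reflects.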

      Plugins should not have to bundle this jar just to make a system-level dependency available. The cleanest fix is not yet clear, but one possibility is to set the context classloader to a CombineClassLoader of the output format's classloader and the system classloader.
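
      The combined-classloader idea can be sketched as follows. This is an illustration under stated assumptions, not the fix that was applied and not CDAP's CombineClassLoader: FallbackClassLoader, canLoad, and callWithContext are hypothetical names, and the sketch only covers class lookup and single-resource lookup (it does not override findResources).

```java
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a combining classloader: tries each delegate in order.
final class FallbackClassLoader extends ClassLoader {
  private final List<ClassLoader> delegates;

  FallbackClassLoader(ClassLoader... delegates) {
    super(null); // bootstrap parent only; everything else goes through the delegates
    this.delegates = Arrays.asList(delegates);
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    for (ClassLoader delegate : delegates) {
      try {
        return delegate.loadClass(name);
      } catch (ClassNotFoundException e) {
        // Not visible from this delegate; fall through to the next one.
      }
    }
    throw new ClassNotFoundException(name);
  }

  @Override
  protected URL findResource(String name) {
    for (ClassLoader delegate : delegates) {
      URL url = delegate.getResource(name);
      if (url != null) {
        return url;
      }
    }
    return null;
  }

  // Convenience check that avoids checked exceptions at the call site.
  static boolean canLoad(ClassLoader loader, String className) {
    try {
      loader.loadClass(className);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  // Runs the task with the given context classloader and restores the previous one afterwards.
  static <T> T callWithContext(ClassLoader loader, Supplier<T> task) {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(loader);
    try {
      return task.get();
    } finally {
      current.setContextClassLoader(previous);
    }
  }
}
```

      In this sketch the output format's loader would be passed first and the system loader second, so plugin classes still take precedence and system classes act only as a fallback. Whether that ordering is the right one for CDAP is an open design choice.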

            People

            • Assignee:
              ashau Albert Shau
              Reporter:
              ashau Albert Shau
