CDAP-1040: Common Dependencies with Hive cause issues


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Explore
    • Labels: None

      Description

      Common dependencies between our code and Hive cause problems if the versions of those dependencies are incompatible. To reproduce:

      1. Create a cluster with a Hive version that uses Guava 11 (CDH 5.1, for example).
      2. Revert the fix at https://github.com/caskdata/cdap/pull/837.
      3. Install the reverted cdap-master.
      4. Deploy the Purchase example application.
      5. Run the query "SELECT purchases FROM cdap_user_history".

      The query fails, with the YARN container logs showing something like:

      2014-12-15 23:05:52,887 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IllegalAccessError: tried to access class com.google.common.hash.HashCodes from class co.cask.cdap.data2.datafabric.dataset.type.DistributedDatasetTypeClassLoaderFactory
      at co.cask.cdap.data2.datafabric.dataset.type.DistributedDatasetTypeClassLoaderFactory.create(DistributedDatasetTypeClassLoaderFactory.java:112)
      at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.getDatasetType(RemoteDatasetFramework.java:274)
      at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.getDataset(RemoteDatasetFramework.java:181)
      at co.cask.cdap.hive.datasets.DatasetAccessor.firstLoad(DatasetAccessor.java:207)
      at co.cask.cdap.hive.datasets.DatasetAccessor.instantiate(DatasetAccessor.java:186)
      at co.cask.cdap.hive.datasets.DatasetAccessor.instantiate(DatasetAccessor.java:157)
      at co.cask.cdap.hive.datasets.DatasetAccessor.getRecordScannable(DatasetAccessor.java:56)
      at co.cask.cdap.hive.datasets.DatasetInputFormat.getRecordReader(DatasetInputFormat.java:76)
      at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:237)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:542)
      at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
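
      The IllegalAccessError is the telltale symptom of the version clash: com.google.common.hash.HashCodes is package-private in Guava 11 (which is why the JVM refuses access at runtime) but public in newer Guava releases. A minimal sketch of the failure mode follows; this is not actual CDAP code, and it assumes compilation against Guava 15, where HashCodes is public and provides fromInt:

      import com.google.common.hash.HashCode;
      import com.google.common.hash.HashCodes;

      public class GuavaConflictDemo {
        public static void main(String[] args) {
          // Compiles cleanly against Guava 15. At runtime, if the Guava 11
          // classes bundled in hive-exec.jar are loaded first, the JVM
          // resolves their package-private HashCodes instead and throws:
          //   java.lang.IllegalAccessError: tried to access class
          //   com.google.common.hash.HashCodes
          HashCode hash = HashCodes.fromInt(42);
          System.out.println(hash);
        }
      }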
      

      If you look at the classpath of the YARN container for the MR job that Hive runs, job.jar is the first item, and that jar bundles Guava 11. This is because Hive creates the MR job conf by doing:

      job = new JobConf(conf, ExecDriver.class);
      

      in ExecDriver.initialize(). Constructing the job conf this way locates the jar that contains ExecDriver.class and uses that jar as job.jar. Because of this, we have no control over which Guava version is used in the job: it always picks up whatever is bundled in hive-exec.jar, which is a fat jar.
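
      A minimal sketch of that mechanism (JobJarDemo is a hypothetical standalone class, and the ExecDriver import assumes the Hive 0.13 package layout): the JobConf(Configuration, Class) constructor calls setJarByClass, which finds the jar containing the given class and registers it as the job jar.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hive.ql.exec.mr.ExecDriver;
      import org.apache.hadoop.mapred.JobConf;

      public class JobJarDemo {
        public static void main(String[] args) {
          // JobConf(Configuration, Class) invokes setJarByClass(ExecDriver.class),
          // which walks the classpath to find the jar that contains ExecDriver
          // and registers it as the job jar.
          JobConf job = new JobConf(new Configuration(), ExecDriver.class);
          // Prints the path to hive-exec.jar: everything bundled in that fat
          // jar, including its Guava 11 classes, is shipped as job.jar and
          // lands first on the task classpath.
          System.out.println(job.getJar());
        }
      }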

              People

              • Assignee: Albert Shau (ashau)
              • Reporter: Albert Shau (ashau)
              • Votes: 0
              • Watchers: 1
