CDAP-3030 (project: CDAP)

Loading of custom datasets broken after upgrade


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Datasets
    • Labels:
      None
    • Rank:
      1|hzyw53:

      Description

      The scenario is as follows: an application includes a custom dataset that embeds a system dataset (for example, a TPFS, i.e. TimePartitionedFileSet), and that system dataset changed between 3.0 and 3.1. For example, before 3.1 the system dataset PFS (PartitionedFileSet) embeds a Table and a FileSet. From 3.1 on it indexes the partition table, so it additionally depends on indexedTable.

      When the app is deployed (in 3.0.1), the dataset framework creates a dataset module for the custom dataset and deploys it. That module records its dependencies. When the dataset is instantiated, the dataset framework creates a temporary dataset registry that contains only the dependencies recorded for the type being loaded (in the example: PFS, Table, and FileSet).
      After the upgrade to 3.1, the custom dataset (through PFS) also depends on indexedTable. This dependency is not among its module's recorded dependencies, so loading fails with:

      2015-07-16 18:28:36,075 - ERROR [executor-23:c.c.c.d.d.d.RemoteDatasetFramework@369] - Was not able to load dataset module class co.cask.cdap.data2.dataset2.lib.partitioned.TimePartitionedFileSetModule while trying to load type DatasetTypeMeta{name=co.cask.cdap.test.TestApp$MyDataset, modules=DatasetModuleMeta{name=fileSet, className=co.cask.cdap.data2.dataset2.lib.file.FileSetModule, jarLocation=null, usesModules=, usedByModules=timePartitionedFileSet,partitionedFileSet},DatasetModuleMeta{name=orderedTable-hbase, className=co.cask.cdap.data2.dataset2.module.lib.hbase.HBaseTableModule, jarLocation=null, usesModules=, usedByModules=core,objectMappedTable,cube,usage,queueDataset},DatasetModuleMeta{name=timePartitionedFileSet, className=co.cask.cdap.data2.dataset2.lib.partitioned.TimePartitionedFileSetModule, jarLocation=null, usesModules=fileSet,orderedTable-hbase,core, usedByModules=},DatasetModuleMeta{name=co.cask.cdap.test.TestApp$MyDataset, className=co.cask.cdap.test.TestApp$MyDataset, jarLocation=hdfs://unew2015-1000.dev.continuuity.net/cdap/namespaces/dummy/datasets/co.cask.cdap.test.TestApp$MyDataset/archive/co.cask.cdap.test.TestApp$MyDataset.jar, usesModules=fileSet,orderedTable-hbase,timePartitionedFileSet, usedByModules=}}
      java.lang.IllegalArgumentException: Requested dataset type does NOT exist: indexedTable
      at co.cask.cdap.data2.dataset2.InMemoryDatasetDefinitionRegistry.get(InMemoryDatasetDefinitionRegistry.java:41) ~[co.cask.cdap.cdap-data-fabric-3.1.0-SNAPSHOT.jar:na]
      at co.cask.cdap.data2.dataset2.lib.partitioned.TimePartitionedFileSetModule.register(TimePartitionedFileSetModule.java:37) ~[co.cask.cdap.cdap-data-fabric-3.1.0-SNAPSHOT.jar:na]
      at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.getDatasetType(RemoteDatasetFramework.java:367) [co.cask.cdap.cdap-data-fabric-3.1.0-SNAPSHOT.jar:na]
      at co.cask.cdap.data2.datafabric.dataset.RemoteDatasetFramework.getDataset(RemoteDatasetFramework.java:231) [co.cask.cdap.cdap-data-fabric-3.1.0-SNAPSHOT.jar:na]
      at co.cask.cdap.data.dataset.SystemDatasetInstantiator.getDataset(SystemDatasetInstantiator.java:79) [co.cask.cdap.cdap-data-fabric-3.1.0-SNAPSHOT.jar:na]
      ...
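
      The failure mode above can be reproduced with a small, self-contained model. This is a minimal sketch under stated assumptions. TypeRegistry, Module, TABLE, FILE_SET, INDEXED_TABLE, PFS_31 and loadWithRecordedModules are hypothetical names, not CDAP's real API. The sketch only mimics the behavior described in this issue: a registry is built from the module list recorded at deploy time, but the upgraded module code runs and looks up a type that was never recorded.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the loading path described in this issue.
// None of these types are CDAP's real API; they only mimic its behavior.
public class DatasetLoadingSketch {

    /** Stand-in for the temporary registry built when a dataset is instantiated. */
    static class TypeRegistry {
        private final Set<String> types = new HashSet<>();

        void add(String typeName) {
            types.add(typeName);
        }

        /** Looks up a type and fails like the log above if it is absent. */
        String get(String typeName) {
            if (!types.contains(typeName)) {
                throw new IllegalArgumentException(
                    "Requested dataset type does NOT exist: " + typeName);
            }
            return typeName;
        }
    }

    /** A module looks up the types it depends on, then registers its own type. */
    interface Module {
        void register(TypeRegistry registry);
    }

    static final Module TABLE = r -> r.add("table");
    static final Module FILE_SET = r -> r.add("fileSet");
    static final Module INDEXED_TABLE = r -> {
        r.get("table");
        r.add("indexedTable");
    };

    // Upgraded (3.1-style) partitioned file set module: it now also needs indexedTable.
    static final Module PFS_31 = r -> {
        r.get("table");
        r.get("fileSet");
        r.get("indexedTable");
        r.add("partitionedFileSet");
    };

    /** Registers only the modules that were recorded at deploy time, in order. */
    static TypeRegistry loadWithRecordedModules(List<Module> recorded) {
        TypeRegistry registry = new TypeRegistry();
        for (Module module : recorded) {
            module.register(registry);
        }
        return registry;
    }

    public static void main(String[] args) {
        // Module list recorded on 3.0.x, but the code that runs is the upgraded module.
        List<Module> recordedOn30 = List.of(TABLE, FILE_SET, PFS_31);
        try {
            loadWithRecordedModules(recordedOn30);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Running main prints "Requested dataset type does NOT exist: indexedTable". If INDEXED_TABLE is added to the recorded list before PFS_31, loading succeeds.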
      

      This affects any custom dataset that embeds a PFS or TPFS (and possibly other system dataset types whose dependencies changed in 3.1).
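
      You can decide whether a given custom dataset is affected by comparing two things: the dependencies recorded at deploy time, and the transitive dependencies its direct dependencies have under the current system module graph. The sketch below assumes a simplified, acyclic graph with hypothetical module names (table, fileSet, indexedTable, partitionedFileSet). It is not CDAP's real module naming or API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: which modules does a custom dataset need under the current
// (upgraded) system module graph that were not recorded when it was deployed?
public class MissingDependencyCheck {

    /** Collects every module reachable from `module` in `graph` (graph assumed acyclic). */
    static Set<String> transitiveDependencies(String module, Map<String, List<String>> graph) {
        Set<String> seen = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(graph.getOrDefault(module, List.of()));
        while (!pending.isEmpty()) {
            String next = pending.pop();
            if (seen.add(next)) {
                pending.addAll(graph.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }

    /** Returns the required modules that are absent from the recorded dependency list. */
    static Set<String> missingDependencies(
            List<String> directDependencies,
            Set<String> recorded,
            Map<String, List<String>> currentGraph) {
        Set<String> required = new TreeSet<>();
        for (String dependency : directDependencies) {
            required.add(dependency);
            required.addAll(transitiveDependencies(dependency, currentGraph));
        }
        required.removeAll(recorded);
        return required;
    }

    public static void main(String[] args) {
        // Simplified 3.1-style graph: partitionedFileSet now also depends on indexedTable.
        Map<String, List<String>> graph31 = Map.of(
            "partitionedFileSet", List.of("table", "fileSet", "indexedTable"),
            "indexedTable", List.of("table"));
        // Dependencies recorded when the custom dataset was deployed on 3.0.x.
        Set<String> recordedOn30 = Set.of("table", "fileSet", "partitionedFileSet");
        System.out.println(missingDependencies(List.of("partitionedFileSet"), recordedOn30, graph31));
    }
}
```

      With this graph the program prints "[indexedTable]". A non-empty result flags a dataset that would hit the error above.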

      Redeploying the app does not help, because redeploying the app does not redeploy the dataset (unless forced dataset upgrades are configured at the CDAP level, which is not the default and is not even documented).

      A possible fix is to always include all system datasets in the temporary registry used for loading a custom dataset. The risk of this should be low, because including the system datasets cannot conflict with any user datasets.
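
      The proposed fix can be sketched in the same simplified model. The assumption is that the registry is seeded with every system module known to the running version, in dependency order, before the user modules recorded at deploy time are registered. SYSTEM_MODULES, MY_DATASET and buildRegistry are hypothetical names. This illustrates the idea in the issue; it is not the actual change that resolved it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed fix: always seed the temporary registry with all
// system modules, then register the recorded user modules. Not CDAP's real API.
public class SystemModulesFirstSketch {

    static class TypeRegistry {
        private final Set<String> types = new HashSet<>();

        void add(String typeName) {
            types.add(typeName);
        }

        String get(String typeName) {
            if (!types.contains(typeName)) {
                throw new IllegalArgumentException(
                    "Requested dataset type does NOT exist: " + typeName);
            }
            return typeName;
        }
    }

    interface Module {
        void register(TypeRegistry registry);
    }

    // All system modules of the running (upgraded) version, in dependency order.
    static final List<Module> SYSTEM_MODULES = List.of(
        r -> r.add("table"),
        r -> r.add("fileSet"),
        r -> { r.get("table"); r.add("indexedTable"); },
        r -> { r.get("table"); r.get("fileSet"); r.get("indexedTable"); r.add("partitionedFileSet"); });

    // The custom dataset's own module: it embeds a partitioned file set.
    static final Module MY_DATASET = r -> {
        r.get("partitionedFileSet");
        r.add("myDataset");
    };

    /** Registers every system module first, then the user modules recorded at deploy time. */
    static TypeRegistry buildRegistry(List<Module> recordedUserModules) {
        TypeRegistry registry = new TypeRegistry();
        for (Module module : SYSTEM_MODULES) {
            module.register(registry);
        }
        for (Module module : recordedUserModules) {
            module.register(registry);
        }
        return registry;
    }

    public static void main(String[] args) {
        TypeRegistry registry = buildRegistry(List.of(MY_DATASET));
        System.out.println(registry.get("myDataset"));
    }
}
```

      Because every system type is present whatever was recorded at deploy time, the upgraded partitioned file set module finds indexedTable, and main prints "myDataset". In this model the recorded system dependencies are effectively ignored.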


            People

            • Assignee:
              ali.anwar Ali Anwar
              Reporter:
              andreas Andreas Neumann
            • Votes:
              0
              Watchers:
              2
