CDAP / CDAP-6121

Dataset with UseDataset annotation in AbstractSpark doesn't have startTx() called on it

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.1
    • Fix Version/s: 3.5.0
    • Component/s: Datasets, Spark
    • Labels:
    • Release Notes:
      Fixed a bug in Spark where using @UseDataSet caused a NullPointerException

      Description

      To reproduce, have an AbstractSpark program use a PartitionedFileSet in its beforeSubmit method, injected via the @UseDataSet annotation (as opposed to context.getDataset):

      @UseDataSet("abc")
      private PartitionedFileSet abcPFS;
      

      Its internal `tx` field will be null, because startTx was not called on this dataset.
      By contrast, sparkClientContext.getDataset("abc") returns the PartitionedFileSet with a non-null tx field (startTx is called as expected), so the issue occurs only with the @UseDataSet annotation.

      I have attached a diff with which WikipediaPipelineAppTest will fail.

      A workaround is to use the getDataset method:

      // this pfs will have a non-null tx
      PartitionedFileSet pfs = context.getDataset(name);
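
      The difference between the two paths can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not CDAP code: ToyDataset, ToyProgram, ToyContext, injectWithoutTx and the locally declared UseDataSet annotation are all hypothetical stand-ins, and the real injection logic in CDAP may look quite different. It only shows how a field populated by annotation-driven injection can end up without a transaction when the injector skips startTx, while the getDataset path does not.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class UseDataSetSketch {

  // Hypothetical stand-in for the annotation; NOT the real CDAP class.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface UseDataSet {
    String value();
  }

  // Toy transactional dataset: tx stays null until startTx is called.
  static class ToyDataset {
    private Object tx;

    void startTx(Object tx) {
      this.tx = tx;
    }

    boolean hasTx() {
      return tx != null;
    }
  }

  // Toy program with an annotated field, similar to abcPFS in the report.
  static class ToyProgram {
    @UseDataSet("abc")
    ToyDataset abc;
  }

  // Toy context modelling the two ways of obtaining a dataset.
  static class ToyContext {
    private final Object currentTx = new Object();

    // Working path: the dataset joins the transaction before it is returned.
    ToyDataset getDataset(String name) {
      ToyDataset ds = new ToyDataset();
      ds.startTx(currentTx);
      return ds;
    }

    // Assumed buggy path: the annotated field is set, but startTx is never called.
    void injectWithoutTx(Object program) throws IllegalAccessException {
      for (Field f : program.getClass().getDeclaredFields()) {
        if (f.isAnnotationPresent(UseDataSet.class)) {
          f.setAccessible(true);
          f.set(program, new ToyDataset());
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ToyContext context = new ToyContext();
    ToyProgram program = new ToyProgram();
    context.injectWithoutTx(program);

    System.out.println("injected field has tx: " + program.abc.hasTx());
    System.out.println("getDataset result has tx: " + context.getDataset("abc").hasTx());
  }
}
```

      In this model the injected field reports no transaction while the getDataset result does, which mirrors the behavior described above and explains why the workaround avoids the null tx.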

            People

            • Assignee: Terence Yim
            • Reporter: Ali Anwar