  CDAP / CDAP-4806

DynamicPartitioner does not work with AvroKeyOutputFormat and AvroKeyValueOutputFormat

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.1
    • Component/s: Datasets
    • Labels:
    • Release Notes:
      Fixes PartitionedFileSet's DynamicPartitioner to work with Avro OutputFormats.

      Description

      When DynamicPartitioner is used with a PartitionedFileSet (PFS) that is configured with AvroKeyOutputFormat or AvroKeyValueOutputFormat, the output records are written to a file in the root directory of the PFS's base location instead of to file paths derived from the partitioning.

      In the implementation of DynamicPartitioner, the partition path is passed to the OutputFormat through the FileOutputFormat#setOutputName method, which stores it in the configuration under the key "mapreduce.output.basename".
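
      As a rough illustration of that mechanism, the sketch below models the configuration as a plain Map so that it runs without Hadoop on the classpath. The class OutputBasenameSketch, its methods, the "part" default, the "-m-00000" style suffix, and the "/pfs/base" directory are assumptions made for the example; this is not the CDAP or Hadoop source.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free model: a Map stands in for Hadoop's Configuration.
class OutputBasenameSketch {
    static final String BASENAME_KEY = "mapreduce.output.basename";

    // Models the effect the description attributes to FileOutputFormat#setOutputName.
    static void setOutputName(Map<String, String> conf, String name) {
        conf.put(BASENAME_KEY, name);
    }

    // Models an output format that honors the key: the basename becomes the
    // leading part of the file path below the output directory.
    // The "part" default and the task suffix format are assumptions.
    static String outputFile(Map<String, String> conf, String outputDir, int taskId) {
        String base = conf.getOrDefault(BASENAME_KEY, "part");
        return String.format("%s/%s-m-%05d", outputDir, base, taskId);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No basename set: the file lands directly in the output directory.
        System.out.println(outputFile(conf, "/pfs/base", 0));
        // Basename carries the partition path: the file lands in a subdirectory.
        setOutputName(conf, "2015/12/part");
        System.out.println(outputFile(conf, "/pfs/base", 0));
    }
}
```

      In this model, a basename that contains path separators is what moves the output file into a partition-specific subdirectory.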

      However, AvroOutputFormatBase does not read that key when it creates its output path. It uses its own key for this purpose, "avro.mo.config.namedOutput", so the path derived from the partitioning is lost.
      The same behavior is reported in the Avro JIRA (baseOutputPath not respected/used): https://issues.apache.org/jira/browse/AVRO-1215.
      The resolution there was to set the key "avro.mo.config.namedOutput" in the code that populates the configuration, rather than to modify AvroOutputFormatBase to read the same key as FileOutputFormat.
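
      A minimal sketch of that kind of workaround follows, again with a plain Map standing in for the Hadoop configuration. The class DualKeyOutputNameSketch and the helper setPartitionOutputName are hypothetical names, and the "part" default is an assumption; the sketch only illustrates writing the partition path under both keys and is not the actual CDAP fix.

```java
import java.util.HashMap;
import java.util.Map;

// Dependency-free model: a Map stands in for Hadoop's Configuration.
class DualKeyOutputNameSketch {
    static final String FILE_OUTPUT_KEY = "mapreduce.output.basename";
    static final String AVRO_KEY = "avro.mo.config.namedOutput";

    // Models an output format that reads only the Avro-specific key,
    // as the description says AvroOutputFormatBase does ("part" default assumed).
    static String avroBasename(Map<String, String> conf) {
        return conf.getOrDefault(AVRO_KEY, "part");
    }

    // Hypothetical helper: store the partition path under both keys so that
    // either kind of output format picks it up.
    static void setPartitionOutputName(Map<String, String> conf, String relativePath) {
        conf.put(FILE_OUTPUT_KEY, relativePath);
        conf.put(AVRO_KEY, relativePath);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Only the FileOutputFormat key is set: the Avro-style lookup misses it.
        conf.put(FILE_OUTPUT_KEY, "2015/12/part");
        System.out.println(avroBasename(conf));
        // Both keys are set: the partition path is visible to the Avro-style lookup.
        setPartitionOutputName(conf, "2015/12/part");
        System.out.println(avroBasename(conf));
    }
}
```

      Writing both keys leaves output formats that honor "mapreduce.output.basename" unaffected while making the path visible to a format that reads only the Avro key.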

      Most other OutputFormats that extend FileOutputFormat honor the "mapreduce.output.basename" parameter, so AvroOutputFormatBase is the exception in this respect.
      For instance, the following subclasses of FileOutputFormat use "mapreduce.output.basename":
      TextOutputFormat, ParquetOutputFormat, SequenceFileOutputFormat, MapFileOutputFormat, OrcNewOutputFormat, AvroSequenceFileOutputFormat.

            People

            • Assignee:
              ali.anwar Ali Anwar
            • Reporter:
              ali.anwar Ali Anwar
            • Votes:
              1
            • Watchers:
              3
