Uploaded image for project: 'CDAP'
  1. CDAP
  2. CDAP-7557

DynamicPartitioner should have a way to close a writer when it is known to be done

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6.0, 3.5.1
    • Fix Version/s: 4.0.0, 3.6.1, 3.5.2
    • Component/s: Datasets, MapReduce
    • Labels:
    • Release Notes:
      DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped.
    • Rank:
      1|hzzny7:

      Description

      DynamicPartitioner will open multiple record writers, one for each dynamic partition it is creating. Some output formats (for example, ORC) require a lot of memory when writing, and having multiple writers open can cause out-of-memory issues. To avoid that, there should be an option to keep at most one writer open at any given time. However, that will only work if the reducer (or mapper for a map-only job) emits all records for a partition consecutively. This cannot be generally assumed. Thus:

      • add an option to specify that only one writer should be open at once
      • close the current partition as soon as a key for a different partition is seen
      • if a partition is closed and we see another key for that partition, throw an error

        Attachments

          Activity

            People

            • Assignee:
              ali.anwar Ali Anwar
              Reporter:
              andreas Andreas Neumann
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: