CDAP / CDAP-12541

Improve StructuredRecordWritable to cache Schema objects

    Details

    • Release Notes:
      Improved memory usage of data pipelines that use a joiner on the MapReduce execution engine.

      Description

      Improve StructuredRecordWritable to cache Schema objects. The current implementation creates a new Schema object for every input record: https://github.com/caskdata/cdap/blob/v4.2.0/cdap-app-templates/cdap-etl/cdap-etl-batch/src/main/java/co/cask/cdap/etl/batch/StructuredRecordWritable.java#L72

      This can cause an out-of-memory (OOM) error, because every record gets its own Schema object, and each of those holds a nested schema for every field. Profiler screenshots are attached.
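
      A minimal sketch of the kind of caching proposed here, not the actual CDAP change. It assumes the schema arrives in serialized string form with each record. The class name SchemaCache, the parser argument, and the maxEntries bound are all hypothetical; the parser stands in for whatever parsing call StructuredRecordWritable makes today.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of CDAP): caches parsed schema objects keyed by
 * their serialized form, so records that share a schema share one parsed instance.
 */
final class SchemaCache<S> {
  private final Function<String, S> parser;
  private final Map<String, S> cache;
  private int parseCount;

  SchemaCache(Function<String, S> parser, final int maxEntries) {
    this.parser = parser;
    // Access-ordered LinkedHashMap gives simple LRU eviction, which keeps the
    // cache itself bounded when many distinct schemas pass through.
    this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached schema for this serialized form, parsing only on a miss. */
  synchronized S get(String serializedSchema) {
    S schema = cache.get(serializedSchema);
    if (schema == null) {
      schema = parser.apply(serializedSchema);
      parseCount++;
      cache.put(serializedSchema, schema);
    }
    return schema;
  }

  /** Number of times the parser was actually invoked (cache misses). */
  synchronized int getParseCount() {
    return parseCount;
  }
}
```

      With a cache like this, the number of live Schema objects is bounded by the number of distinct schemas (at most maxEntries) rather than by the number of records read.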

        Attachments

        1. heapdump1.png (413 kB)
        2. heapdump2.png (457 kB)
        3. pipeline.png (144 kB)

            People

            • Assignee:
              Vinisha Shah
            • Reporter:
              Vinisha Shah