  1. CDAP
  2. CDAP-16066

Null columns get dropped by wrangler

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 6.3.0
    • Component/s: Data Prep

      Description

      If a column contains only null values, wrangler drops that column from the output schema.

      To reproduce:

      1. Create a workspace on the attached input file

      2. Use the following directives:
      parse-as-csv :body ',' false
      drop body
      set-column body_3 "null".equals(body_3) ? null : body_3

      3. Create a pipeline

      Note that the body_3 column gets removed from the output schema. Also note that the type for body_3 shows up as 'unknown' in the wrangler UI.

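      This is likely because wrangler derives the schema from the data. That strategy breaks down for nulls: a column that contains only null values offers no value from which to infer a type, so it is left out of the schema.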
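
The suspected cause can be illustrated with a minimal sketch. This is not wrangler's actual implementation: the class `NullColumnSchemaSketch`, its methods, and the two sample rows are hypothetical stand-ins (the attached input file is not reproduced here). It assumes a schema builder that takes each column's type from the first non-null value it sees, which is enough to show why an all-null column never receives a type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of data-driven schema inference; not wrangler's real code.
final class NullColumnSchemaSketch {

  // Same expression as the set-column directive in the reproduction steps.
  static String nullIfLiteral(String value) {
    return "null".equals(value) ? null : value;
  }

  // Stand-in for parse-as-csv without a header row: columns become body_1, body_2, ...
  static Map<String, Object> parseRow(String line) {
    Map<String, Object> row = new LinkedHashMap<>();
    String[] cells = line.split(",");
    for (int i = 0; i < cells.length; i++) {
      row.put("body_" + (i + 1), cells[i]);
    }
    return row;
  }

  // Made-up sample data in which body_3 holds the literal string "null" in every row.
  static List<Map<String, Object>> sampleRows() {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (String line : new String[] {"a,1,null", "b,2,null"}) {
      Map<String, Object> row = parseRow(line);
      row.put("body_3", nullIfLiteral((String) row.get("body_3")));
      rows.add(row);
    }
    return rows;
  }

  // Assumed inference rule: a column's type is the type of its first non-null value.
  // A column whose values are all null is never added to the schema.
  static Map<String, String> inferSchema(List<Map<String, Object>> rows) {
    Map<String, String> schema = new LinkedHashMap<>();
    for (Map<String, Object> row : rows) {
      for (Map.Entry<String, Object> cell : row.entrySet()) {
        Object value = cell.getValue();
        if (value != null) {
          schema.putIfAbsent(cell.getKey(), value.getClass().getSimpleName());
        }
      }
    }
    return schema;
  }

  public static void main(String[] args) {
    // body_3 is present in every row but absent from the inferred schema.
    System.out.println(inferSchema(sampleRows()).keySet());
  }
}
```

Under this assumption, the rows still carry a `body_3` key, but the inferred schema has no entry for it, which matches both symptoms in the report: the column missing from the output schema and the 'unknown' type in the UI.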

            People

            • Assignee:
              Mikkin Patel (mikkin)
            • Reporter:
              Albert Shau (ashau)
