CDAP-12852

parse-as-csv broken on large (multi-split) files


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.3.1
    • Fix Version/s: 5.1.0
    • Component/s: Data Prep

      Description

      In Wrangler, the parse-as-csv directive can use the first row of the file as the schema. This works for small files that are read as a single split.

      However, large files are divided into multiple splits, and only the first mapper's split contains the header line. Every other mapper treats the first data row of its own split as the header and creates fields named after the values in that row. Those field names do not match the output schema, so every split except the first produces only null values.
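
      The failure mode described above can be reproduced outside CDAP. The following is a minimal Python sketch, not Wrangler's actual code: the helper names `parse_split_with_header` and `project`, the sample data, and the two-split layout are all hypothetical and chosen only to illustrate the behavior.

```python
import csv
import io


def parse_split_with_header(lines):
    """Parse one split, treating its first line as the header.

    This mirrors the faulty assumption: every split is handled as if
    it began with the header row. (Hypothetical helper.)
    """
    rows = list(csv.reader(io.StringIO("\n".join(lines))))
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def project(record, schema):
    """Keep only the schema's fields; fields missing from the record become None."""
    return {field: record.get(field) for field in schema}


# Hypothetical file: one header line followed by four data rows.
file_lines = ["id,name", "1,alice", "2,bob", "3,carol", "4,dave"]
schema = ["id", "name"]  # output schema taken from the real header

# Pretend the file was divided into two splits; only the first has the header.
splits = [file_lines[:3], file_lines[3:]]

output = [
    [project(record, schema) for record in parse_split_with_header(split)]
    for split in splits
]

for index, records in enumerate(output):
    print(index, records)
```

      In this sketch the second split uses the row `3,carol` as its header, so its only record has the fields `3` and `carol`, and projecting it onto the output schema leaves `id` and `name` as null. The row used as the header is also lost as data.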

              People

              • Assignee:
                Albert Shau (ashau)
              • Reporter:
                Andreas Neumann (andreas)