Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.0
    • Component/s: Pipelines
    • Labels:
    • Sprint:
      Sprint 1, App Eng Sprint 2, App Eng Sprint 3
    • Release Notes:
      Updated the Tokenizer plugin to be more forgiving when parsing tokens.
    • Rank:
      1|hzzpmn:

      Description

      Current implementation of the Tokenizer plugin requires the following improvements:

      • Change the name of the configuration from "pattern" to "pattern separator"
      • "Pattern separator" should also work for regex of white space example:
        • if user has given "Pattern separator": " "
          • then it should tokenize for each space that it encounters
        • if user has given "Pattern separator": "/s+"
          • it should treat all continuos spaces as a single separator.
      • Do not drop fields. The output schema should contain all the fields that were in the input schema and not only the column that is being tokenized

        Attachments

          Activity

            People

            • Assignee:
              ananya Ananya Bhattacharya
              Reporter:
              abhinav Abhinav Bansal
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: