  1. CDAP
  2. CDAP-13002

Explore does not work properly for tables that store non-ASCII characters in String columns

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.3.2
    • Fix Version/s: 5.0.0, 4.3.3
    • Component/s: Datasets, Explore
    • Labels:
    • Release Notes:
      Fixed an issue with the retrieval of non-ASCII strings from Table datasets.

      Description

      It appears that Table datasets, when queried, decode byte[] values by converting non-printable and non-ASCII characters into escape sequences. For example, inserting a value that contains the NUL character (0x00) and then querying it returns the 4-character string "\x00" in its place. The same happens for non-ASCII Latin characters and for any other non-printable or non-ASCII character.

      On further investigation, it turns out that this is not only an issue in Explore, but in the Table API generally: Row.getString() uses Bytes.toStringBinary(byte[]) to convert the byte[] value to a String, whereas a Put uses Bytes.toBytes(String) to encode a String into bytes. That is, this simple test fails if val contains a non-printable or non-ASCII character:

          t.put(new Put(key, col, val));
          Row row = t.get(new Get(key));
          Assert.assertEquals(val, row.getString(col));
      

      Because DatasetSerDe uses row.getString() to deserialize a String, this manifests itself in Explore, too.
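
      The asymmetry can be reproduced without CDAP. The following is a minimal sketch, not the CDAP code: the class RoundTripSketch and its methods are hypothetical, encode() assumes that Bytes.toBytes(String) produces UTF-8 bytes, and toStringBinary() is only an approximation of the escaping behavior described above (printable ASCII is kept, every other byte becomes \xNN).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone illustration; none of these methods are CDAP APIs.
class RoundTripSketch {

  // Write path: assumed to mirror Bytes.toBytes(String), i.e. UTF-8 encoding.
  static byte[] encode(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Read path as described in this issue: an approximation of
  // Bytes.toStringBinary(byte[]). Printable ASCII bytes are copied,
  // everything else is rendered as a 4-character \xNN escape.
  static String toStringBinary(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int ch = b & 0xFF;
      if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  // Symmetric read path: decode with the same charset used for encoding.
  static String decodeUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String val = "caf\u00e9\u0000";  // contains a non-ASCII char and NUL
    byte[] stored = encode(val);

    System.out.println(toStringBinary(stored));              // caf\xC3\xA9\x00
    System.out.println(val.equals(toStringBinary(stored)));  // false
    System.out.println(val.equals(decodeUtf8(stored)));      // true
  }
}
```

      Under these assumptions, a value only survives the round trip when the read path decodes with the same charset that the write path used to encode. Pure printable-ASCII values happen to pass either way, which would explain why the problem only shows up for other characters.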

            People

            • Assignee:
              andreas Andreas Neumann
            • Reporter:
              andreas Andreas Neumann