CDAP / CDAP-17024

AutoJoin should be case sensitive when selecting fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2.1, 6.3.0, 6.1.3
    • Component/s: Pipelines
    • Labels:
      None

      Description

      If one of the stages going into a join has two columns whose names differ only by case, the pipeline fails with an error of the form:

      Caused by: org.apache.spark.sql.AnalysisException: Reference 'region' is ambiguous, could be: region#14, region#15.;
      	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:264) ~[spark-catalyst_2.11-2.1.3.jar:2.1.3]
      	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:168) ~[spark-catalyst_2.11-2.1.3.jar:2.1.3]
      

      This particular example happened when I had a stage with both 'region' and 'REGION' as field names. This is a backwards-incompatible change: previously, the two were treated as different fields without a problem.

      I think we just need to set 'spark.sql.caseSensitive' to true.
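
      To illustrate why the flag matters, here is a minimal sketch of column lookup under both modes. It is a toy model, not Spark's actual resolver: the `resolve` function and the `columns` list are hypothetical names introduced for this example. Spark SQL is case insensitive by default, which is what makes 'region' match both fields.

```python
from typing import List


def resolve(name: str, columns: List[str], case_sensitive: bool) -> str:
    """Toy model of column lookup (hypothetical helper, not Spark code)."""
    if case_sensitive:
        # Exact match only: 'region' and 'REGION' are distinct fields.
        matches = [c for c in columns if c == name]
    else:
        # Case-insensitive match: both spellings collapse to one name.
        matches = [c for c in columns if c.lower() == name.lower()]

    if not matches:
        raise KeyError(f"cannot resolve '{name}'")
    if len(matches) > 1:
        raise ValueError(
            f"Reference '{name}' is ambiguous, could be: {', '.join(matches)}"
        )
    return matches[0]


# A stage schema like the one described above (assumed for illustration).
columns = ["region", "REGION", "sales"]

# Case-sensitive lookup picks exactly one field.
print(resolve("region", columns, case_sensitive=True))

# Case-insensitive lookup finds two candidates and fails.
try:
    resolve("region", columns, case_sensitive=False)
except ValueError as err:
    print(err)
```

      In a real Spark job the equivalent switch would be the 'spark.sql.caseSensitive=true' configuration property; where CDAP should set it for join stages is not covered by this sketch.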

            People

            • Assignee:
              ashau Albert Shau
              Reporter:
              ashau Albert Shau