  1. CDAP
  2. CDAP-9878

Teradata plugin is poorly named

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: Pipeline Plugins, Pipelines
    • Labels:
      None
    • Release Notes:
      Renamed the Hydrator 'Teradata' batch source to 'Database'. The old 'Database' source is no longer supported.

      Description

      The "Teradata" plugin has nothing to do with Teradata. It simply delegates most operations to org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat instead of org.apache.hadoop.mapreduce.lib.db.DBInputFormat.

      The difference between those two input formats is that the data driven one creates splits by splitting on ranges of a certain key (for example: select * from table where split_key > 0 and split_key < 100000), whereas DBInputFormat creates splits using limit offset queries (for example: select * from table limit 100000 offset 0).
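To make the contrast concrete, here is a minimal Python sketch of the two split strategies. It is an illustration only, not Hadoop's implementation: the function names, the even integer ranges, and the exact SQL text are assumptions, and the real input formats handle more key types and boundary cases.

```python
def range_split_queries(table, split_key, lo, hi, num_splits):
    """Key-range splits (DataDrivenDBInputFormat style, simplified).

    lo and hi stand for the min and max of the split key, as a
    bounding query would return them. Integer keys are assumed.
    """
    if hi < lo or num_splits < 1:
        return []
    span = hi - lo + 1
    step = -(-span // num_splits)  # ceiling division
    queries = []
    start = lo
    while start <= hi:
        end = min(start + step, hi + 1)  # exclusive upper bound
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {split_key} >= {start} AND {split_key} < {end}"
        )
        start = end
    return queries


def limit_offset_queries(table, row_count, num_splits):
    """LIMIT/OFFSET splits (DBInputFormat style, simplified).

    row_count stands for the result of a count query.
    """
    if row_count <= 0 or num_splits < 1:
        return []
    chunk = -(-row_count // num_splits)  # ceiling division
    return [
        f"SELECT * FROM {table} "
        f"LIMIT {min(chunk, row_count - offset)} OFFSET {offset}"
        for offset in range(0, row_count, chunk)
    ]
```

The first function needs only the key bounds, while the second needs a total row count, which is why the two sources ask for different queries.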

      So really, the only difference between our "Database" batch source and "Teradata" batch source is that the first requires you to specify a count query to figure out the limit offsets, whereas the second requires you to specify a field to split on and a bounding query for that key to get the min and max values for the split key.

      These shouldn't be separate plugins. They should just be separate config settings.
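One way to picture a single source with separate config settings is sketched below in Python. Every name here (DatabaseSourceConfig, import_query, count_query, split_by, bounding_query, split_mode) is hypothetical and chosen for illustration; this is not the actual CDAP plugin configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseSourceConfig:
    """Hypothetical merged config: one source, two ways to split."""

    import_query: str
    # Needed only for LIMIT/OFFSET splitting.
    count_query: Optional[str] = None
    # Needed only for key-range splitting.
    split_by: Optional[str] = None
    bounding_query: Optional[str] = None

    def split_mode(self) -> str:
        """Return which split strategy the settings select."""
        has_count = self.count_query is not None
        has_range = self.split_by is not None and self.bounding_query is not None
        any_range = self.split_by is not None or self.bounding_query is not None
        if has_range and not has_count:
            return "key-range"
        if has_count and not any_range:
            return "limit-offset"
        raise ValueError(
            "set either count_query, or both split_by and bounding_query"
        )
```

For example, a config with split_by="id" and bounding_query="SELECT MIN(id), MAX(id) FROM t" would select key-range splitting, while one with only count_query set would select LIMIT/OFFSET splitting.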

              People

              • Assignee:
                ashau Albert Shau
              • Reporter:
                ashau Albert Shau