CDAP / CDAP-15005

Omit/Add fields when getting schema


    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0.0
    • Component/s: Pipeline Plugins, UI


      In 6.0.0, schema propagation is handled in one place in the plugin: the configurePipeline() method. The output schema is exposed through the stage validation endpoint of the pipeline system service, which calls configurePipeline().

      In general, this removes a lot of code duplication within the plugins, since most of them implemented the 'Get Schema' button by taking their normal configuration plus an input schema, then returning the output schema. This is exactly what they had to do in the configurePipeline() method anyway.
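      A minimal sketch of this single-path design, written in Python purely for illustration. Real CDAP plugins are Java classes, and every name below (`KeepFieldsPlugin`, `StageConfigurer`, `configure_pipeline`, `validate_stage`) is a hypothetical stand-in, not CDAP's API. The point it shows is that the 'Get Schema' path contains no schema logic of its own: it builds a configurer holding the input schema, calls the plugin's configure method, and returns whatever output schema the plugin set.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for CDAP concepts; the real CDAP plugin API is Java.
Schema = List[Tuple[str, str]]  # list of (field name, field type)


@dataclass
class StageConfigurer:
    """Carries the stage's input schema and records the output schema the plugin sets."""
    input_schema: Optional[Schema]
    output_schema: Optional[Schema] = None


@dataclass
class KeepFieldsPlugin:
    """Hypothetical transform that keeps only the fields listed in its 'fields' property."""
    properties: Dict[str, str]

    def configure_pipeline(self, configurer: StageConfigurer) -> None:
        # The one place where schema propagation happens.
        keep = {name.strip() for name in self.properties["fields"].split(",")}
        input_schema = configurer.input_schema or []
        configurer.output_schema = [(n, t) for n, t in input_schema if n in keep]


def validate_stage(plugin: KeepFieldsPlugin,
                   input_schema: Optional[Schema]) -> Optional[Schema]:
    """Hypothetical validation endpoint: 'Get Schema' simply runs configure_pipeline()."""
    configurer = StageConfigurer(input_schema=input_schema)
    plugin.configure_pipeline(configurer)
    return configurer.output_schema


if __name__ == "__main__":
    plugin = KeepFieldsPlugin({"fields": "id, name"})
    print(validate_stage(plugin, [("id", "long"), ("name", "string"), ("ts", "long")]))
    # prints [('id', 'long'), ('name', 'string')]
```

      Because deployment and 'Get Schema' both go through configure_pipeline(), the two can no longer drift apart, which is the duplication the paragraph above refers to.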

      However, there are some special cases (most commonly with sources) where the Get Schema button needs to behave a bit differently. The DBSource is one example.

      The DBSource has 'importQuery' and 'schema' as config properties. When used during pipeline execution, the 'importQuery' must contain a '$CONDITIONS' literal that is used in conjunction with a bounding query to split up the work across different executors. But when the output schema needs to be fetched, we don't want the user to have to specify a bounding query and a $CONDITIONS clause, so we have a separate 'query' parameter that pops up and gets sent to the backend. This means the plugin needs an extra 'query' parameter just to get the output schema.
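      To make the role of '$CONDITIONS' concrete, here is a simplified sketch of how a bounding query's result can be turned into one query per split. It is an illustration of the idea, not the DBSource's actual splitting code: the function name `build_split_queries`, the integer split column, and the even division of the range are all assumptions, and `lower`/`upper` stand in for the MIN/MAX values the bounding query would return.

```python
from typing import List

CONDITIONS = "$CONDITIONS"


def build_split_queries(import_query: str, split_by: str,
                        lower: int, upper: int, num_splits: int) -> List[str]:
    """Hypothetical helper: replace $CONDITIONS with one range predicate per split.

    lower/upper stand in for the MIN/MAX returned by the bounding query.
    """
    if CONDITIONS not in import_query:
        raise ValueError("importQuery must contain $CONDITIONS")
    if num_splits < 1 or upper < lower:
        raise ValueError("invalid split parameters")
    total = upper - lower + 1
    size = -(-total // num_splits)  # ceiling division so every row is covered
    queries = []
    start = lower
    while start <= upper:
        end = min(start + size - 1, upper)
        predicate = f"{split_by} >= {start} AND {split_by} <= {end}"
        queries.append(import_query.replace(CONDITIONS, predicate))
        start = end + 1
    return queries


if __name__ == "__main__":
    for q in build_split_queries("SELECT * FROM t WHERE $CONDITIONS", "id", 1, 10, 3):
        print(q)
    # prints:
    # SELECT * FROM t WHERE id >= 1 AND id <= 4
    # SELECT * FROM t WHERE id >= 5 AND id <= 8
    # SELECT * FROM t WHERE id >= 9 AND id <= 10
```

      None of this machinery is needed just to learn the column names and types, which is why asking for a plain 'query' without '$CONDITIONS' or a bounding query is simpler for the user.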

      The DBSource also allows users to set their own schema. This is primarily used when the JDBC driver lies about the table schema, for example when it reports that a column is not nullable when in reality it is. When configurePipeline() is called to get the output schema, we actually want any user-provided schema to be omitted, so that the plugin knows to get the schema from the JDBC driver. However, when configurePipeline() is called during pipeline deployment, it should trust that the user-provided schema is correct. So in this scenario, we want to omit 'schema' when making the 'Get Schema' call.
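      The decision described above can be sketched as a single function. This is a hypothetical illustration, not the DBSource's code: `resolve_output_schema` and the `fetch_driver_schema` hook (standing in for asking the JDBC driver for result-set metadata) are invented names, and the assumption that the 'schema' property holds an Avro-style JSON record with a "fields" list is also labeled as such in the comments.

```python
import json
from typing import Callable, Dict, List

Field = Dict[str, object]  # one Avro-style field, e.g. {"name": "id", "type": "long"}


def resolve_output_schema(properties: Dict[str, str],
                          fetch_driver_schema: Callable[[str], List[Field]]) -> List[Field]:
    """Hypothetical: choose between the user's schema and the driver's schema.

    Assumes 'schema', when present, is an Avro-style JSON record with "fields".
    """
    user_schema = properties.get("schema")
    if user_schema:
        # Deployment path: trust the user-provided schema, e.g. one that marks
        # a column nullable even though the driver reports it as not nullable.
        return json.loads(user_schema)["fields"]
    # 'Get Schema' path: 'schema' was omitted from the request, so derive the
    # schema from the driver, preferring the extra 'query' property if present.
    query = properties.get("query") or properties["importQuery"]
    return fetch_driver_schema(query)


def fake_driver(query: str) -> List[Field]:
    # Pretend driver that (wrongly) reports the column as NOT NULL.
    return [{"name": "name", "type": "string"}]


if __name__ == "__main__":
    user = json.dumps({"type": "record", "name": "output",
                       "fields": [{"name": "name", "type": ["string", "null"]}]})
    print(resolve_output_schema(
        {"importQuery": "SELECT name FROM t WHERE $CONDITIONS", "schema": user}, fake_driver))
    # prints [{'name': 'name', 'type': ['string', 'null']}]
    print(resolve_output_schema({"query": "SELECT name FROM t"}, fake_driver))
    # prints [{'name': 'name', 'type': 'string'}]
```

      The same method gives different answers depending only on which properties arrive, which is why the request sent by the Get Schema button has to leave 'schema' out.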

      These special cases indicate that the configurePipeline() method is currently overloaded, and that there should probably be a separate abstraction for getting the schema. Until that exists, however, we can produce the right behavior if the Get Schema widget can specify that additional properties are required or that certain properties should be omitted.

      For example, the "importQuery" widget for the DBSource could be changed to something like:

                {
                  "widget-type": "textarea",
                  "label": "Import Query",
                  "name": "importQuery",
                  "widget-attributes": {
                    "rows": "4"
                  },
                  "plugin-function": {
                    "widget": "getSchema",
                    "omitProperties": [
                      {
                        "name": "schema"
                      }
                    ],
                    "addProperties": [
                      {
                        "name": "query",
                        "widget-type": "textarea",
                        "label": "Query",
                        "widget-attributes": {
                          "rows": "4"
                        }
                      }
                    ]
                  }
                }

      This would tell the UI to ask for an additional property called "query" and to omit the "schema" property when making the 'Get Schema' call.
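      The UI-side behavior can be sketched as a small function that turns the stage's current properties plus the "plugin-function" object into the properties sent with the Get Schema call. The UI itself is not written in Python; `build_get_schema_properties` is a hypothetical name, and the assumption that the request carries a flat map of property name to string value is ours. Only the "omitProperties" and "addProperties" keys come from the proposal above.

```python
from typing import Dict


def build_get_schema_properties(properties: Dict[str, str],
                                plugin_function: dict,
                                added_values: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: apply omitProperties/addProperties to a Get Schema request.

    `added_values` holds what the user typed into the extra widgets
    (e.g. the 'query' textarea) that the UI pops up.
    """
    request = dict(properties)  # copy so the stage's own config is untouched
    for prop in plugin_function.get("omitProperties", []):
        request.pop(prop["name"], None)
    for prop in plugin_function.get("addProperties", []):
        name = prop["name"]
        if name not in added_values:
            raise ValueError(f"missing value for added property '{name}'")
        request[name] = added_values[name]
    return request


if __name__ == "__main__":
    spec = {
        "widget": "getSchema",
        "omitProperties": [{"name": "schema"}],
        "addProperties": [{"name": "query", "widget-type": "textarea",
                           "label": "Query", "widget-attributes": {"rows": "4"}}],
    }
    props = {"importQuery": "SELECT * FROM t WHERE $CONDITIONS", "schema": "{}"}
    print(build_get_schema_properties(props, spec, {"query": "SELECT * FROM t"}))
    # prints {'importQuery': 'SELECT * FROM t WHERE $CONDITIONS', 'query': 'SELECT * FROM t'}
```

      Keeping this logic in the widget definition means the plugin's configurePipeline() stays unchanged; the special cases are expressed entirely as data in the widget JSON.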



              • Assignee:
                Edwin Elia (edwin)
                Albert Shau (ashau)
              • Votes: 0
              • Watchers: 2

