CDAP / CDAP-14306

Access to more capabilities to generate output schema

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Pipeline Plugins, Pipelines

      Description

      Today, several plugins support a 'Get Schema' button that UI users can click to get the plugin's output schema. This is implemented through a plugin endpoint, where the plugin developer has access to the input schema, the plugin properties, and an EndpointPluginContext, whose only capability is loading other plugins.
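      A minimal sketch of the endpoint contract described above, assuming simplified stand-in types. `Field`, `PluginClassLoader`, `getSchema`, and the `dropField` property are hypothetical names for illustration, not the CDAP API. The point is what the method can see: the input schema, the plugin properties, and a context that can only load other plugin classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GetSchemaSketch {
    // Hypothetical schema field: a column name and a type label.
    record Field(String name, String type) {}

    // Hypothetical stand-in for EndpointPluginContext: its only capability is
    // loading another plugin class (e.g. a JDBC driver) by plugin type and name.
    interface PluginClassLoader {
        Class<?> loadPluginClass(String pluginType, String pluginName);
    }

    // Hypothetical 'Get Schema' endpoint body. It sees only the input schema,
    // the plugin properties, and the context: no datasets and no Spark.
    // This example drops the column named by a hypothetical "dropField" property.
    static List<Field> getSchema(List<Field> input, Map<String, String> properties,
                                 PluginClassLoader context) {
        String dropField = properties.getOrDefault("dropField", "");
        List<Field> output = new ArrayList<>();
        for (Field field : input) {
            if (!field.name().equals(dropField)) {
                output.add(field);
            }
        }
        // The context is unused here; a database source would use it to load its driver.
        return output;
    }
}
```

      Any logic that needs more than these three inputs, such as reading a dataset or running Spark, has nowhere to come from in this signature.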

      As far as I know, the only use of the EndpointPluginContext is to load a JDBC driver in the database source. For technical reasons, the context does not expose any other capabilities: for example, datasets are unavailable and Spark cannot be used.

      One use case we have seen is a user who wants to load an ML pipeline (a Spark concept, not to be confused with a CDAP pipeline) and call PipelineModel.transformSchema() with the plugin's input schema, so that the output schema can be generated generically for any ML pipeline.
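      Spark's PipelineModel.transformSchema() derives an output schema by applying each stage's transformSchema() in sequence, without reading any data. The sketch below models that chaining with hypothetical stand-in types (`Field`, `Stage`, `appendColumn`) rather than the real Spark classes, which would need a Spark runtime that plugin endpoints cannot use today.

```java
import java.util.ArrayList;
import java.util.List;

public class TransformSchemaSketch {
    // Hypothetical schema field: a column name and a type label.
    record Field(String name, String type) {}

    // Hypothetical stage, modeled on Spark's PipelineStage.transformSchema:
    // maps an input schema to an output schema without reading any data.
    interface Stage {
        List<Field> transformSchema(List<Field> input);
    }

    // Hypothetical stage that checks for a required input column and appends one
    // output column, roughly what a fitted model does when it adds a prediction column.
    static Stage appendColumn(String requiredInput, String outputName, String outputType) {
        return input -> {
            boolean present = input.stream().anyMatch(f -> f.name().equals(requiredInput));
            if (!present) {
                throw new IllegalArgumentException("missing input column: " + requiredInput);
            }
            List<Field> output = new ArrayList<>(input);
            output.add(new Field(outputName, outputType));
            return output;
        };
    }

    // Thread the schema through every stage in order: the shape of calling
    // transformSchema on a whole pipeline model.
    static List<Field> transformSchema(List<Stage> stages, List<Field> input) {
        List<Field> schema = input;
        for (Stage stage : stages) {
            schema = stage.transformSchema(schema);
        }
        return schema;
    }
}
```

      Because each stage only declares how it changes the schema, the same chaining works for any ML pipeline, which is what makes this approach generic.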

      We have talked about running a system service, similar to the one that exists for dataprep, that would handle schema propagation and other pipeline-specific logic. That service seems like a good place to provide richer schema support, since plugin endpoints are quite limited.
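      The ticket does not specify how such a service would propagate schemas. One possible sketch, assuming hypothetical stage functions and a pipeline DAG given as an adjacency map (`Stage` and `propagate` are illustrative names, not CDAP APIs): stages are visited in topological order (Kahn's algorithm), so each stage computes its output schema only after all of its upstream schemas are known.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SchemaPropagationSketch {
    // Hypothetical stage: computes its output schema (column names) from the
    // schemas of its upstream stages, keyed by upstream stage name.
    interface Stage extends Function<Map<String, List<String>>, List<String>> {}

    // Propagate schemas through the DAG in topological order (Kahn's algorithm).
    // stages: stage name -> schema function; edges: stage name -> downstream names.
    static Map<String, List<String>> propagate(Map<String, Stage> stages,
                                               Map<String, List<String>> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, Map<String, List<String>>> inputs = new HashMap<>();
        for (String name : stages.keySet()) {
            inDegree.put(name, 0);
            inputs.put(name, new LinkedHashMap<>());
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        // Sources (no upstream stages) are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, degree) -> {
            if (degree == 0) {
                ready.add(name);
            }
        });

        Map<String, List<String>> outputs = new LinkedHashMap<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            List<String> out = stages.get(name).apply(inputs.get(name));
            outputs.put(name, out);
            // Hand this schema to each downstream stage; a stage becomes ready
            // once every one of its upstream schemas has arrived.
            for (String next : edges.getOrDefault(name, List.of())) {
                inputs.get(next).put(name, out);
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (outputs.size() != stages.size()) {
            throw new IllegalArgumentException("pipeline graph has a cycle");
        }
        return outputs;
    }
}
```

      Running this centrally, rather than inside each plugin endpoint, is what would let a service offer capabilities such as Spark or dataset access that the EndpointPluginContext cannot.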


            People

            • Assignee:
              Priyanka Nambiar
              Reporter:
              Albert Shau