Dataproc supports specifying a set of initialization actions: executables that run on the cluster's nodes when the cluster is created. They can be used for additional setup, such as installing other services to run on the cluster. The Dataproc provisioner in CDAP should also support configuring initialization actions on the clusters it creates.
For example, the Python plugin supports a 'native' mode that uses the Python installation and libraries already present on the cluster node instead of interpreted Python. To run native mode with the Dataproc provisioner, the user would need to configure the provisioner to create Dataproc clusters with Python and any required Python libraries installed.
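One possible shape for the provisioner-side configuration is a single property listing the initialization action scripts. The sketch below is a hypothetical helper, not an existing CDAP API: the property name `initActions`, the comma-separated format, and the class `InitActionsConfig` are all assumptions for illustration. It parses the property into an ordered list of Cloud Storage URIs that the provisioner could then pass to Dataproc when creating the cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not an existing CDAP class): turns a comma-separated
 * provisioner property such as
 * "gs://my-bucket/install-python.sh, gs://my-bucket/pip-install.sh"
 * into an ordered list of initialization-action URIs.
 */
public class InitActionsConfig {

  /** Hypothetical property key the provisioner could read; an assumption. */
  public static final String INIT_ACTIONS_PROPERTY = "initActions";

  /** Parses the property value, preserving the order the user gave. */
  public static List<String> parseInitActions(String value) {
    if (value == null || value.trim().isEmpty()) {
      return Collections.emptyList(); // no initialization actions configured
    }
    List<String> actions = new ArrayList<>();
    for (String part : value.split(",")) {
      String uri = part.trim();
      if (uri.isEmpty()) {
        continue; // tolerate stray or trailing commas
      }
      // This sketch only accepts Cloud Storage locations for the scripts.
      if (!uri.startsWith("gs://")) {
        throw new IllegalArgumentException(
            "Initialization action must be a gs:// URI: " + uri);
      }
      actions.add(uri);
    }
    return Collections.unmodifiableList(actions);
  }

  public static void main(String[] args) {
    List<String> actions = parseInitActions(
        "gs://my-bucket/install-python.sh, gs://my-bucket/pip-install.sh");
    for (String uri : actions) {
      System.out.println("init action: " + uri);
    }
  }
}
```

For the Python example above, a user might list one script that installs Python and another that installs the required libraries; keeping the list ordered matters because later scripts may depend on earlier ones.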