I've been working with CDAP via Data Fusion for 3 months now. The lack of internal logging, and the inability to rerun a pipeline with higher log verbosity, makes troubleshooting feel like a black box: either things work or they don't. Turning on debug logs doesn't help because the primary plugins don't perform any log operations.
I'm working almost exclusively with the multi-table variants of plugins. Here are some of the scenarios I've run up against:
The Column Adder node doesn't work on multi-table nodes. The column does get added, but the sink ignores it because of how multi-table schemas are handled. There's no way to view the incoming or outgoing schema of that node on a deployed job, so I have to use Preview to troubleshoot everything visually.
An incompatible-schemas error from the JDBC source doesn't log which table it occurred on or the schema involved.
When the Dataproc cluster fails to provision, the job fails, but loading the logs shows no errors even though the job screen says errors were logged. The log contains only one entry: 'provisioning cluster'.
The running job's UI is very often out of sync: a job will have been failed for 30+ minutes and still show as running until you refresh the page.
In general, I think there should be some sort of forced logging around failures: if a node fails, it should log the incoming record's schema and contents.
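To illustrate the idea, here is a minimal sketch of what such forced failure logging could look like. This is not CDAP's actual plugin API; `FailureLoggingTransform`, `RecordTransform`, and the record-as-`Map` representation are hypothetical stand-ins, and it uses plain `java.util.logging` rather than a real pipeline logger:

```java
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical wrapper: runs a per-record transform and, on failure,
// logs the record's schema (field names) and contents before rethrowing,
// so a failed node always leaves a trace of what it was processing.
public class FailureLoggingTransform {
    private static final Logger LOG =
        Logger.getLogger(FailureLoggingTransform.class.getName());

    // Stand-in for a node's per-record transform logic.
    public interface RecordTransform {
        Map<String, Object> apply(Map<String, Object> record) throws Exception;
    }

    public static Map<String, Object> transformWithLogging(
            RecordTransform transform, Map<String, Object> record) throws Exception {
        try {
            return transform.apply(record);
        } catch (Exception e) {
            // Forced logging around the failure: schema (field names) and contents.
            LOG.severe("Transform failed. Incoming schema: " + record.keySet());
            LOG.severe("Incoming record: " + record);
            throw e; // still fail the node; the log just carries context now
        }
    }
}
```

The key point is that the logging is unconditional on the failure path, so even plugins that otherwise log nothing would emit the failing record's schema and contents.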