  CDAP / CDAP-16843

Verbosity of Logging for pipelines


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Parking Lot
    • Component/s: Log, Pipelines


      I've been working with CDAP via Data Fusion for 3 months now. Without internal logging, or a way to rerun a pipeline at a higher log verbosity, troubleshooting feels like working with a black box: either things work or they don't. Turning on debug logs doesn't help, because the primary plugins don't log anything.

      I'm working almost exclusively with the multi-table variants of plugins. Some of the scenarios I've run up against:

      Column Adder node doesn't work on multi-table nodes. The column does get added, but the sink ignores it because of how multi-table schemas are handled. There's no way to view the incoming or outgoing schema of that node on a deployed job, so I have to use Preview to troubleshoot everything visually.

      Incompatible schemas from a JDBC source: the error doesn't log the table it occurred on or the schemas involved.

      JavaScript Transform node: I was changing a value that was consistent across multiple tables, but one table had an int type for it while the rest had string. It failed when it attempted to assign a string to an int field. It didn't log any details about the incoming values or attempted assignments, what the row contained before the transform node processed it, etc. The error stack trace showed it was a type issue, but I had to solve that entirely outside CDAP.
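
      To illustrate the kind of detail that would have helped here, below is a minimal sketch in plain Python (not CDAP's plugin API). It assumes a per-table schema is available as a mapping from field name to a Python type; `set_field` and the `table` argument are hypothetical names introduced for this example. On a type mismatch it logs the table, the field, the expected type, the offending value and the full incoming record before re-raising.

```python
import logging

logger = logging.getLogger("transform_sketch")


def set_field(record, schema, field, value, table):
    """Return a copy of record with field set to value, coerced to the declared type.

    schema is a hypothetical mapping of field name -> Python type (int, str, ...).
    """
    declared = schema[field]
    try:
        coerced = declared(value)
    except (TypeError, ValueError):
        # Log everything needed to reproduce the failure, then re-raise.
        logger.error(
            "table=%s field=%s expected=%s got=%r record=%r",
            table, field, declared.__name__, value, record,
        )
        raise
    updated = dict(record)
    updated[field] = coerced
    return updated
```

      With this shape, the same assignment works for a table that declares the field as string and for one that declares it as int, and a value that cannot be coerced produces a log line naming the table and the row.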

      Dataproc cluster not provisioned, so the job fails. Loading the logs doesn't show any errors, yet the job screen shows that errors were logged. The log has only one entry: 'provisioning cluster'.

      The running job's UI is very often out of sync: jobs that failed 30+ minutes ago still show as running until you refresh the page.

      In general, the debugging process for CDAP pipelines has been very frustrating, especially since the Preview feature is broken and I have to execute JavaScript to mark specific runtime parameters, like ${secure(mypw)}, as 'provided'.

      In general, I think some logging should be forced around failures: if a node fails, it should log the incoming record's schema and contents.
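
      As a rough sketch of what I mean, here is the idea in plain Python rather than CDAP's actual plugin interfaces. All names (`run_stage`, `StageFailure`, `describe_schema`) are hypothetical, and the "schema" is simply inferred from the Python types of the record's values. A wrapper runs a stage's transform over each record and, when the transform throws, logs the stage name, the record's schema and its contents before failing.

```python
import logging

logger = logging.getLogger("pipeline_sketch")


class StageFailure(Exception):
    """Raised when a stage fails; carries the stage name and the offending record."""

    def __init__(self, stage, record, cause):
        super().__init__(f"stage {stage!r} failed on record {record!r}: {cause}")
        self.stage = stage
        self.record = record


def describe_schema(record):
    """Infer a simple field -> type-name mapping from a record (a dict)."""
    return {name: type(value).__name__ for name, value in record.items()}


def run_stage(stage, transform, records):
    """Apply transform to every record, logging schema and contents on failure."""
    out = []
    for record in records:
        try:
            out.append(transform(record))
        except Exception as cause:
            # Forced logging: always emitted on failure, regardless of verbosity.
            logger.error(
                "stage=%s failed: %s | schema=%s | record=%r",
                stage, cause, describe_schema(record), record,
            )
            raise StageFailure(stage, record, cause) from cause
    return out
```

      A real implementation would probably need a way to mask sensitive fields before writing record contents to the log.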




            • Assignee:
              bhooshan Bhooshan Mogal
              JDS Jon


              • Created: