When I run multiple pipelines with the MapReduce engine, each writing to the same directory, a subset may fail because another run has already written to the same output directory. Note that this pipeline writes to a subdirectory partitioned by minute, so more than one of the set can succeed if they run or complete in different minutes.
As expected, the error message appears in the logs (see failedMR.txt for the full pipeline logs):
Caused by: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory gs://test-new-df-folder/tmp/2019-10-21-14-16 already exists
However, when I do the same with the Spark execution engine, no such error message is logged, even though the pipelines can still fail. See failedSpark[1-5].txt for the logs of such pipeline runs.
I did encounter one run where the error message was logged as expected. See failedSpark6.txt.
I have attached the pipeline as q-cdpa-data-pipeline.json.
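For reference, here is a minimal sketch of why runs in the same minute collide while runs in different minutes do not. It is only an illustration, not the pipeline's actual code: the `%Y-%m-%d-%H-%M` format is inferred from the directory name in the log above, and `output_dir` and `claim` are hypothetical helpers that mimic a "fail if the output directory already exists" check.

```python
from datetime import datetime, timedelta

# Base path taken from the logged error; the timestamp format is an assumption
# inferred from the directory name "2019-10-21-14-16".
BASE = "gs://test-new-df-folder/tmp"
FMT = "%Y-%m-%d-%H-%M"


def output_dir(run_time: datetime) -> str:
    """Hypothetical helper: build the minute-partitioned output path."""
    return f"{BASE}/{run_time.strftime(FMT)}"


def claim(existing: set, path: str) -> bool:
    """Hypothetical stand-in for an 'output directory must not exist' check.

    Returns False where a real job would fail with FileAlreadyExistsException.
    """
    if path in existing:
        return False
    existing.add(path)
    return True


start = datetime(2019, 10, 21, 14, 16, 5)
# Two runs within the same minute, and a third in the next minute.
runs = [start, start + timedelta(seconds=20), start + timedelta(seconds=70)]

existing = set()
results = [claim(existing, output_dir(t)) for t in runs]
print(results)
```

In this sketch the second run fails because it resolves to the same minute directory as the first, while the third succeeds because it falls in the next minute.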