The root cause is that we don't always know whether the files in the output location were created by the current (failed) MapReduce job, or whether they existed before the job started and their presence is what made the job fail.
The PFS's onFailure() is called in both situations, and it cannot tell whether the files were pre-existing. Therefore it cannot safely delete them.
However, once onSuccess() has been called, the PFS knows the job was successful (so far) and hence that the files in the output location were written by this job. It records an AddPartitionOperation, and if the job fails later (for whatever reason), rollbackTx() will be called and will clean up the files.
In other words, this situation occurs whenever the MapReduce job fails before the PFS's onSuccess() is called. For example:
- multiple outputs, where another output's onSuccess() fails before this one's is called
- a single output, but the job fails somewhere in the middle (after writing some files, but before onSuccess() is attempted). This is rare, but it happens, for example, if the job's transaction cannot be committed.
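The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not CDAP's actual implementation: the class name FileSetOutputSketch, the string-based operation log, and the fileCount() helper are all hypothetical; only the onSuccess()/onFailure()/rollbackTx() callback names come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the commit/rollback bookkeeping. The key point:
// cleanup is only safe once onSuccess() has recorded that the files in the
// output location belong to this job.
class FileSetOutputSketch {
    private final List<String> filesInOutput = new ArrayList<>();
    private boolean addPartitionRecorded = false;

    void writeFile(String path) {
        filesInOutput.add(path);
    }

    // Called once the job has (so far) succeeded: every file in the output
    // location is now known to have been written by this job, so record an
    // "AddPartitionOperation" that rollbackTx() can later act on.
    void onSuccess() {
        addPartitionRecorded = true;
    }

    // Called on failure. Without a recorded operation we cannot distinguish
    // pre-existing files from files written by the failed job, so we must
    // not delete anything here.
    void onFailure() {
        // intentionally no cleanup
    }

    // Called when the transaction rolls back after onSuccess(): the recorded
    // operation tells us the files are ours, so cleanup is safe.
    void rollbackTx() {
        if (addPartitionRecorded) {
            filesInOutput.clear();
        }
    }

    int fileCount() {
        return filesInOutput.size();
    }

    public static void main(String[] args) {
        // Case 1: job fails before onSuccess() -> files are left behind.
        FileSetOutputSketch early = new FileSetOutputSketch();
        early.writeFile("part-00000");
        early.onFailure();
        System.out.println("files after early failure: " + early.fileCount());

        // Case 2: job fails after onSuccess() -> rollbackTx() cleans up.
        FileSetOutputSketch late = new FileSetOutputSketch();
        late.writeFile("part-00000");
        late.onSuccess();
        late.rollbackTx();
        System.out.println("files after rollback: " + late.fileCount());
    }
}
```

Case 1 is exactly the leftover-files problem described above: onFailure() runs, but with no recorded operation it has no safe way to delete.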