Consider a pipeline with a single source that fans out into 10 branches, where each branch contains a transform, then an aggregation, then a sink.
The current planner places a connector dataset in front of each aggregator. This means the first MapReduce job will read from the source, perform every branch's transform, and write to 10 different temporary directories. There will then be 10 parallel MapReduce jobs, each of which reads from its own temporary directory, performs its aggregation, and writes to its sink.
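The shape of that plan can be sketched as a list of jobs, each described by what it reads, runs, and writes. This is a purely illustrative model (the job/plan names are hypothetical, not the planner's actual data structures):

```python
NUM_BRANCHES = 10

# Job 1: read the source, run every branch's transform, and fan out
# to 10 separate temporary directories (one connector per aggregator).
first_job = {
    "reads": ["source"],
    "ops": [f"transform-{i}" for i in range(NUM_BRANCHES)],
    "writes": [f"temp-{i}" for i in range(NUM_BRANCHES)],
}

# Jobs 2..11: one per branch, each reading only its own temp directory,
# aggregating, and writing to its sink.
branch_jobs = [
    {
        "reads": [f"temp-{i}"],
        "ops": [f"aggregate-{i}"],
        "writes": [f"sink-{i}"],
    }
    for i in range(NUM_BRANCHES)
]

plan = [first_job] + branch_jobs
# 11 jobs in total; note that job 1 alone writes 10 temporary datasets,
# so roughly 10 copies of (transformed) source data hit disk before any
# aggregation starts.
```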
In almost all cases, however, transforms are much cheaper than I/O, so a more efficient plan is to place a single connector dataset right after the source, before anything branches. The first MapReduce job would then consist of reading from the source and writing to a single temporary directory. There would then be 10 parallel MapReduce jobs that all read from that shared temporary directory, and each performs its branch's transform and aggregation before writing to its sink.
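Using the same illustrative model as above (hypothetical names, not real planner APIs), the improved plan looks like this; the job count is unchanged, but only one temporary dataset is ever written:

```python
NUM_BRANCHES = 10

# Job 1: materialize the source once into a single shared temp directory.
# No transforms run here, so only one copy of the source data is written.
first_job = {"reads": ["source"], "ops": [], "writes": ["temp"]}

# Jobs 2..11: each branch re-reads the shared temp directory, then runs
# its own transform and aggregation before writing to its sink.
branch_jobs = [
    {
        "reads": ["temp"],
        "ops": [f"transform-{i}", f"aggregate-{i}"],
        "writes": [f"sink-{i}"],
    }
    for i in range(NUM_BRANCHES)
]

plan = [first_job] + branch_jobs
# Compared with the current plan: still 11 jobs and still 10 temp reads,
# but 1 temp write instead of 10. The transforms move into the parallel
# jobs, which is the right trade when transforms are cheaper than I/O.
```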