Error datasets aren't very usable in Hydrator. They were added before many other Hydrator features and simply don't work in Spark pipelines, or in certain MapReduce pipelines (depending on the computed plan). We need to redesign how errors are handled.
Based on some initial discussions, one idea is to treat error records more like normal records. Errors get passed along with normal output, and users can configure a stage to receive all records, only non-error records, or only errors. Error handling then becomes a feature of the pipeline itself, instead of writing errors to some file or table that nobody looks at anyway.
Currently, plugins emit normal output and errors through the Emitter interface:
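The contract looks roughly like the sketch below. The interface and class names mirror the real CDAP API, but this standalone version is paraphrased and illustrative only; the `CollectingEmitter` is a hypothetical in-memory implementation added here to make the sketch runnable.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the existing Emitter contract.
interface Emitter<T> {
  void emit(T value);                       // normal output record
  void emitError(InvalidEntry<T> invalid);  // record that failed processing
}

// An error record plus context about why it failed.
class InvalidEntry<T> {
  private final int errorCode;
  private final String errorMsg;
  private final T invalidRecord;

  InvalidEntry(int errorCode, String errorMsg, T invalidRecord) {
    this.errorCode = errorCode;
    this.errorMsg = errorMsg;
    this.invalidRecord = invalidRecord;
  }

  int getErrorCode() { return errorCode; }
  String getErrorMsg() { return errorMsg; }
  T getInvalidRecord() { return invalidRecord; }
}

// Hypothetical in-memory implementation, handy for unit-testing plugins.
class CollectingEmitter<T> implements Emitter<T> {
  final List<T> output = new ArrayList<>();
  final List<InvalidEntry<T>> errors = new ArrayList<>();

  public void emit(T value) { output.add(value); }
  public void emitError(InvalidEntry<T> invalid) { errors.add(invalid); }
}

public class EmitterDemo {
  public static void main(String[] args) {
    CollectingEmitter<String> emitter = new CollectingEmitter<>();
    emitter.emit("good-record");
    emitter.emitError(new InvalidEntry<>(31, "missing field", "bad-record"));
    System.out.println(emitter.output.size() + " output, " + emitter.errors.size() + " errors");
  }
}
```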
This could stay the same. One improvement we may want to consider is catching exceptions thrown by the plugin and automatically calling emitError() on the record that caused the exception. Plugins could also be changed so that every method that takes a record as input instead receives a Record object that wraps the actual object to operate on, along with a flag indicating whether it is an error. For example:
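A minimal sketch of both ideas, assuming a hypothetical `Record` wrapper and a framework-side `safeProcess` helper (neither is existing API; the names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical wrapper: carries the payload plus an error flag, so a stage
// can be configured to see all records, only normal output, or only errors.
class Record<T> {
  private final T value;
  private final boolean isError;
  private final String errorMsg; // null for normal records

  static <T> Record<T> of(T value) { return new Record<>(value, false, null); }
  static <T> Record<T> error(T value, String errorMsg) { return new Record<>(value, true, errorMsg); }

  private Record(T value, boolean isError, String errorMsg) {
    this.value = value;
    this.isError = isError;
    this.errorMsg = errorMsg;
  }

  T getValue() { return value; }
  boolean isError() { return isError; }
  String getErrorMsg() { return errorMsg; }
}

public class RecordDemo {
  // Framework-side sketch: run the plugin on a record, and if the plugin
  // throws, turn the failing record into an error record automatically.
  public static <T> void safeProcess(T input, Consumer<T> plugin, List<Record<T>> out) {
    try {
      plugin.accept(input);
      out.add(Record.of(input));
    } catch (Exception e) {
      out.add(Record.error(input, e.getMessage()));
    }
  }

  public static void main(String[] args) {
    List<Record<String>> out = new ArrayList<>();
    safeProcess("good", s -> { /* plugin succeeds */ }, out);
    safeProcess("bad", s -> { throw new IllegalStateException("parse failure"); }, out);
    System.out.println(out.get(0).isError() + " " + out.get(1).isError());
  }
}
```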
The abstract classes plugins extend can implement this new method by calling the existing method, which means existing plugins don't need to change.
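That delegation could look like the sketch below. The `Transform` and `Record` classes here are simplified stand-ins for the real types (the real transform method may also declare checked exceptions); the point is only that the new Record-based method defaults to unwrapping and calling the existing method, so old plugins compile and run unchanged:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins for the real types.
class Record<T> {
  private final T value;
  private final boolean isError;
  Record(T value, boolean isError) { this.value = value; this.isError = isError; }
  T getValue() { return value; }
  boolean isError() { return isError; }
}

interface Emitter<T> { void emit(T value); }

abstract class Transform<IN, OUT> {
  // Existing method that current plugins implement.
  public abstract void transform(IN input, Emitter<OUT> emitter);

  // New method: the default unwraps the Record and delegates to the
  // existing method, so plugins written against the old signature still work.
  public void transform(Record<IN> input, Emitter<OUT> emitter) {
    transform(input.getValue(), emitter);
  }
}

public class CompatDemo {
  // An "old-style" plugin that only knows about the existing method.
  static class UpperCase extends Transform<String, String> {
    public void transform(String input, Emitter<String> emitter) {
      emitter.emit(input.toUpperCase());
    }
  }

  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    // Call the new Record-based entry point; delegation reaches the old method.
    new UpperCase().transform(new Record<>("hello", false), out::add);
    System.out.println(out);
  }
}
```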