A data engineer sets up a data pipeline to read all the click logs from S3 and write them to TPFS. The pipeline is called AdJunkie_S3ToTPFS.
A data scientist wants to analyze click logs, specifically the logs from the AdJunkie project. Currently there is no way to find out which datasets hold that project's data.
We need the ability to automatically tag datasets with the name of the pipeline that writes them, and to annotate entities with additional metadata in Hydrator.
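
As a rough illustration of the requirement, the sketch below models automatic tagging with a small in-memory store. Everything in it is an assumption made for illustration: `MetadataStore`, `tag_sinks`, the dataset name `clickLogs`, and the `owner` property are hypothetical and are not the Hydrator or platform metadata API.

```python
from collections import defaultdict


class MetadataStore:
    """Hypothetical in-memory metadata store (not a real platform API)."""

    def __init__(self):
        self._tags = defaultdict(set)         # dataset name -> set of tags
        self._properties = defaultdict(dict)  # dataset name -> key/value pairs

    def add_tags(self, dataset, *tags):
        """Attach one or more tags to a dataset."""
        self._tags[dataset].update(tags)

    def set_property(self, dataset, key, value):
        """Attach an additional user-supplied annotation to a dataset."""
        self._properties[dataset][key] = value

    def search(self, text):
        """Return sorted dataset names whose tags or property values
        contain `text`, ignoring case."""
        needle = text.lower()
        hits = set()
        for dataset, tags in self._tags.items():
            if any(needle in tag.lower() for tag in tags):
                hits.add(dataset)
        for dataset, props in self._properties.items():
            if any(needle in str(value).lower() for value in props.values()):
                hits.add(dataset)
        return sorted(hits)


def tag_sinks(store, pipeline_name, sink_datasets):
    """Stand-in for the automatic step: tag every dataset the pipeline
    writes to with the pipeline's name."""
    for dataset in sink_datasets:
        store.add_tags(dataset, pipeline_name)


store = MetadataStore()
# The data engineer deploys the pipeline; its sink is tagged automatically.
tag_sinks(store, "AdJunkie_S3ToTPFS", ["clickLogs"])
# Someone adds an extra annotation by hand.
store.set_property("clickLogs", "owner", "ads-team")
# The data scientist searches by project name.
adjunkie_datasets = store.search("adjunkie")
```

With tagging in place, the data scientist's search for the project name finds the dataset without knowing in advance which pipeline produced it.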