- Create a pipeline that writes to GCS.
- Try to use Data Preparation to further transform the output.
Currently, there is no way to wrangle the output file written to GCS, because Dataprep can only process files with certain extensions.
Ideally, Dataprep should be able to take a folder (HDFS folder?) and sample data from the files within that folder.
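As a rough illustration of the requested behavior, here is a minimal Python sketch. It assumes a local directory stands in for the GCS/HDFS output folder, and that the folder holds extension-less data files (for example Hadoop-style `part-r-00000` files) alongside marker files such as `_SUCCESS`. `sample_folder` and `max_lines` are hypothetical names; they are not part of Dataprep.

```python
import os
from typing import List


def sample_folder(folder: str, max_lines: int = 1000) -> List[str]:
    """Hypothetical helper: collect up to max_lines lines from the files in
    folder, regardless of file extension."""
    samples: List[str] = []
    # Sort names so the sample is deterministic across runs.
    for name in sorted(os.listdir(folder)):
        # Assumed convention: names starting with "_" or "." are markers or
        # checksums (e.g. _SUCCESS, .crc), not data, so skip them.
        if name.startswith(("_", ".")):
            continue
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", encoding="utf-8") as fh:
            for line in fh:
                samples.append(line.rstrip("\n"))
                # Stop as soon as the sample limit is reached.
                if len(samples) >= max_lines:
                    return samples
    return samples
```

This takes the first `max_lines` records in file-name order. A real implementation might prefer random sampling across all files instead.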