sunil yadav <firstname.lastname@example.org>
2:08 AM (11 hours ago)
to CDAP
I have two datasets, say A and B, both time-partitioned Parquet datasets. I want to run the following steps periodically, once a day:
1. Get all data from dataset A for today, de-duplicate it on a key, and keep only the record with the latest timestamp for each key.
2. Get all data from dataset B.
3. Join the results of steps 1 and 2, then de-duplicate on the key again, keeping the record with the latest timestamp.
4. Replace the contents of dataset B with this result, written as time-partitioned data.
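To make the intended logic of steps 1 to 3 concrete, here is a minimal sketch in plain Python. It is not CDAP code. It assumes each record is a dict with a hypothetical "key" field and a hypothetical "ts" timestamp field, and it treats the "join" in step 3 as combining both record sets before de-duplicating again.

```python
def latest_per_key(records):
    """Keep one record per key: the one with the greatest timestamp.

    "key" and "ts" are hypothetical field names used for illustration.
    """
    latest = {}
    for rec in records:
        current = latest.get(rec["key"])
        # On equal timestamps the first record seen is kept.
        if current is None or rec["ts"] > current["ts"]:
            latest[rec["key"]] = rec
    return list(latest.values())


def merge_daily(a_today, b_all):
    """Steps 1 to 3: de-duplicate today's A, combine with B, de-duplicate again."""
    deduped_a = latest_per_key(a_today)           # step 1
    combined = deduped_a + list(b_all)            # steps 2 and 3 (combine)
    return latest_per_key(combined)               # step 3 (de-duplicate)
```

The result of merge_daily is what step 4 would write back as the new contents of B. The sketch only works if every record actually carries a timestamp, which is the part I am missing.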
The problem I am facing is that when I read from the time-partitioned Parquet source, the records do not contain any time field that I can filter on. Can you help me solve this? I am thinking of reading the data with a simple File source instead, so that I can get the time. Any other suggestions?