  1. CDAP
  2. CDAP-13016

Time partitioned dataset source should return the partition time along with records as an option

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Pipeline Plugins
    • Labels:

      Description

      Use case from CDAP user group: https://groups.google.com/forum/#!topic/cdap-user/pmNKLkksAGU

      sunil yadav <sunilyadavsky@gmail.com>
      2:08 AM (11 hours ago)
      
      to CDAP
      Hi All,
      
      I have two datasets, A and B, both time-partitioned Parquet datasets. I want to perform the following steps daily:
       1. Get all data from dataset A for today, de-duplicate it on a key, and keep the record with the latest timestamp.
       2. Get all data from dataset B.
       3. Join the results of steps 1 and 2, then de-duplicate on the key again, keeping the latest timestamp.
       4. Replace the contents of dataset B with the result, written as time-partitioned data.
      
      The problem I am facing is that when I read from the TimePartitionedParquet source, the records do not contain any time field to filter on. Can you help me solve this? I am thinking of reading the data with a simple File source instead so that I can get the time. Any other suggestions?
      
      Regards
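
The workflow in the message above, and the option this issue asks for, can be sketched in plain Python. This is only an illustration under stated assumptions, not CDAP code: the functions `with_partition_time`, `latest_per_key`, and `merge_latest`, and the field names `key`, `value`, and `partition_time`, are all hypothetical. Partitions are modeled as `(partition_time_ms, records)` pairs, and the partition time is used as the timestamp for de-duplication.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def with_partition_time(partitions: Iterable[Tuple[int, List[Record]]],
                        field: str = "partition_time") -> List[Record]:
    """Flatten (partition_time_ms, records) pairs, copying the partition
    time into every record under `field` (hypothetical field name)."""
    tagged_records = []
    for partition_time, records in partitions:
        for record in records:
            tagged = dict(record)  # copy so the input is not mutated
            tagged[field] = partition_time
            tagged_records.append(tagged)
    return tagged_records


def latest_per_key(records: Iterable[Record], key: str = "key",
                   ts: str = "partition_time") -> List[Record]:
    """Keep, for each key, the record with the greatest timestamp.
    On a tie, the record seen first wins."""
    latest: Dict[object, Record] = {}
    for record in records:
        current = latest.get(record[key])
        if current is None or record[ts] > current[ts]:
            latest[record[key]] = record
    return list(latest.values())


def merge_latest(a_today: Iterable[Record], b_all: Iterable[Record],
                 key: str = "key", ts: str = "partition_time") -> List[Record]:
    """Steps 1-3: de-duplicate A, combine with B, de-duplicate again."""
    a_latest = latest_per_key(a_today, key, ts)               # step 1
    combined = a_latest + list(b_all)                         # steps 2 and 3
    return sorted(latest_per_key(combined, key, ts), key=lambda r: r[key])


# Made-up sample data: two partitions of A for today, one partition of B.
a_partitions = [
    (1000, [{"key": "k1", "value": "a-old"}]),
    (2000, [{"key": "k1", "value": "a-new"}, {"key": "k2", "value": "a-only"}]),
]
b_partitions = [
    (1500, [{"key": "k1", "value": "b"}, {"key": "k3", "value": "b-only"}]),
]

result = merge_latest(with_partition_time(a_partitions),
                      with_partition_time(b_partitions))
for row in result:
    print(row)
```

Step 4 (replacing the contents of dataset B) is left out because it depends on how the sink writes partitions. The key point is that steps 1 and 3 are only possible once each record carries its partition time, which is what this issue requests as a source option.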
      

            People

            • Assignee:
              Bhooshan Mogal (bhooshan)
            • Reporter:
              Sreevatsan Raman (sree)