  1. CDAP
  2. CDAP-13733

Improve the way we implement Store and Dataset, and use transactions correctly


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Rank:


      Currently, each Store method starts its own transaction on the underlying dataset, and some stores (e.g., MetadataStore) add extra logic in each method. Callers simply inject the store and call its methods directly.

      This is incorrect in many cases and causes bugs. For example, deleting an app touches multiple datasets (AppMetaStore to delete the app, MetadataStore to delete its metadata, etc.). All of these operations should happen in one transaction, but because we go through the stores and call their methods, we start multiple transactions to delete one app. If any of these operations fails, some data is deleted while other data is left behind.

      Another example is CDAP-13552: when we update a schedule, we delete the existing schedule and add the new one in two separate transactions. If the add operation fails, the existing schedule has already been deleted. The deletion and addition should happen in the same transaction so that the update is atomic.
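
      The difference between the two update strategies can be illustrated with a toy in-memory model. This is only a sketch: AtomicScheduleUpdate, inTransaction and the map-based table are hypothetical stand-ins invented for illustration, not CDAP classes, and the "transaction" is simply a working copy that is swapped in on success.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Toy schedule table; a hypothetical stand-in, not a CDAP class. */
public class AtomicScheduleUpdate {
  private Map<String, String> schedules = new HashMap<>();

  /** Runs the work on a copy and swaps the copy in only if no exception was thrown. */
  public void inTransaction(Consumer<Map<String, String>> work) {
    Map<String, String> copy = new HashMap<>(schedules);
    work.accept(copy);   // an exception here leaves 'schedules' untouched
    schedules = copy;    // commit
  }

  /** Adds a schedule; an empty spec simulates a failing add operation. */
  public static void add(Map<String, String> table, String name, String spec) {
    if (spec == null || spec.isEmpty()) {
      throw new IllegalArgumentException("invalid schedule spec for " + name);
    }
    table.put(name, spec);
  }

  /** Problematic: delete and add commit separately, like two independent Store calls. */
  public void updateInTwoTransactions(String name, String newSpec) {
    inTransaction(t -> t.remove(name));
    inTransaction(t -> add(t, name, newSpec));
  }

  /** Atomic: delete and add commit together or not at all. */
  public void updateInOneTransaction(String name, String newSpec) {
    inTransaction(t -> {
      t.remove(name);
      add(t, name, newSpec);
    });
  }

  public String get(String name) {
    return schedules.get(name);
  }

  public static void main(String[] args) {
    AtomicScheduleUpdate two = new AtomicScheduleUpdate();
    two.inTransaction(t -> t.put("daily", "0 0 * * *"));
    try {
      two.updateInTwoTransactions("daily", "");
    } catch (IllegalArgumentException e) {
      // the add failed, but the delete was already committed
    }
    System.out.println("two transactions: " + two.get("daily"));

    AtomicScheduleUpdate one = new AtomicScheduleUpdate();
    one.inTransaction(t -> t.put("daily", "0 0 * * *"));
    try {
      one.updateInOneTransaction("daily", "");
    } catch (IllegalArgumentException e) {
      // the add failed, so the delete was never committed
    }
    System.out.println("one transaction: " + one.get("daily"));
  }
}
```

      With two transactions the failed add leaves the schedule missing; with one transaction the original schedule is still present after the failure.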

      The same applies to namespace deletion. However, because deleting datasets requires a REST call to the dataset service, it is more complicated than the previous examples.

      We also have more and more use cases that need a single transaction spanning multiple datasets. For those we need to avoid Store, to prevent nested transactions. But some stores are not just wrappers around the underlying dataset; they have additional logic outside the transaction, which makes it very difficult to use the underlying table directly. For example, MetadataStore publishes audit messages after each transaction finishes; if we use its underlying table (MetadataDataset) directly, it is difficult to keep this audit publishing logic.
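
      One possible way to keep post-transaction logic such as audit publishing while still sharing a transaction is to have dataset-level operations only buffer audit messages, and let whoever owns the transaction publish them after commit. The sketch below is an assumption for discussion, not the existing MetadataStore design; PostCommitAudit, TxContext and the message format are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of post-commit audit publishing; none of these types are CDAP classes. */
public class PostCommitAudit {

  /** Hypothetical per-transaction view: a working copy of the table plus buffered audit messages. */
  public static class TxContext {
    public final Map<String, String> metadata;
    private final List<String> pendingAudits = new ArrayList<>();

    TxContext(Map<String, String> committed) {
      this.metadata = new HashMap<>(committed);
    }

    /** Dataset-level operation: changes data and records, but does not publish, an audit message. */
    public void removeMetadata(String entity) {
      if (metadata.remove(entity) == null) {
        throw new IllegalStateException("no metadata for " + entity);
      }
      pendingAudits.add("METADATA_DELETED " + entity);
    }
  }

  private Map<String, String> committed = new HashMap<>();
  private final List<String> published = new ArrayList<>();

  /** Runs the work in one toy transaction and publishes buffered audits only after commit. */
  public void execute(Consumer<TxContext> work) {
    TxContext context = new TxContext(committed);
    work.accept(context);                     // on exception nothing is committed or published
    committed = context.metadata;             // commit
    published.addAll(context.pendingAudits);  // publish only after the commit
  }

  public List<String> published() {
    return published;
  }

  public Map<String, String> committed() {
    return committed;
  }
}
```

      Because publishing happens in execute rather than inside the dataset operation, several dataset operations can share one transaction and a failed transaction publishes nothing.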


      Therefore, we need a clear definition of Store and Dataset going forward. A store should be only a wrapper that starts a transaction around a method of the underlying dataset, and it should be used only when no other dataset is involved in the operation. In fact, in most cases we can stop using Store altogether: create the Transactional in the constructor and get the underlying dataset through DatasetContext. This makes transactions easier to manage and ensures that each atomic operation runs in exactly one transaction.
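
      A minimal sketch of this pattern, applied to the app deletion example, might look as follows. The Transactional, TxRunnable and DatasetContext interfaces below are simplified stand-ins declared locally so the example is self-contained; the real CDAP interfaces differ (for instance in exception handling and dataset types). The dataset names "appMeta" and "metadata", the AppLifecycle class and the in-memory engine are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of one transaction spanning two datasets; all types are simplified stand-ins. */
public class AppDeletionSketch {

  /** Stand-in for a dataset context: looks up a dataset by name inside a transaction. */
  public interface DatasetContext {
    Map<String, String> getDataset(String name);
  }

  /** Stand-in for a unit of work that runs inside one transaction. */
  public interface TxRunnable {
    void run(DatasetContext context);
  }

  /** Stand-in for the transaction entry point. */
  public interface Transactional {
    void execute(TxRunnable runnable);
  }

  /** Toy engine: snapshots every dataset and restores the snapshot if the work fails. */
  public static class InMemoryTransactional implements Transactional {
    private final Map<String, Map<String, String>> datasets = new HashMap<>();

    public Map<String, String> dataset(String name) {
      return datasets.computeIfAbsent(name, k -> new HashMap<>());
    }

    @Override
    public void execute(TxRunnable runnable) {
      Map<String, Map<String, String>> snapshot = new HashMap<>();
      datasets.forEach((name, table) -> snapshot.put(name, new HashMap<>(table)));
      try {
        runnable.run(this::dataset);
      } catch (RuntimeException e) {
        datasets.clear();            // roll back every dataset touched
        datasets.putAll(snapshot);
        throw e;
      }
    }
  }

  /** Service that receives the Transactional in its constructor instead of injecting stores. */
  public static class AppLifecycle {
    private final Transactional transactional;

    public AppLifecycle(Transactional transactional) {
      this.transactional = transactional;
    }

    /** Deletes the app record and its metadata in a single transaction. */
    public void deleteApp(String appId) {
      transactional.execute(context -> {
        context.getDataset("appMeta").remove(appId);
        // Simulated failure: a missing metadata record aborts the whole deletion.
        if (context.getDataset("metadata").remove(appId) == null) {
          throw new IllegalStateException("no metadata for " + appId);
        }
      });
    }
  }
}
```

      In this sketch a failure while deleting metadata also restores the app record, because both datasets are modified inside the same execute call rather than through two store methods that each open their own transaction.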




            • Assignee:
              bhooshan Bhooshan Mogal
              yaojie Yaojie Feng
            • Votes:
              0
            • Watchers:
              1


              • Created: