CDAP / CDAP-14261

Race condition in task cancelling in ProvisioningService


    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 5.1.0
    • Component/s: Cloud Provisioner
    • Release Notes:
      Fixed a bug that could cause the provisioning state of stopped program runs to be corrupted.


      There is a race condition when a ProvisioningTask is cancelled. One of the tests that can fail because of it is ProvisioningServiceTest.testCancelDeprovision. The race can leave the status in a state other than CANCELLED after the task has been cancelled. Using the failing unit test as an example, the race proceeds as follows:

      1. The DeprovisionTask is submitted to the executor.
      2. [Thread-executor] The task is executing on the executor thread and reaches line 79 of ProvisioningTask, the line just before it updates the status in the dataset to REQUESTING_DELETE.
      3. [Thread-cancel] From another thread, ProvisioningService.cancelDeprovisionTask is called.
      4. [Thread-cancel] The task's Future.cancel is called, and the cancelling thread records the status as CANCELLED.
      5. [Thread-executor] The executor thread continues and overwrites the status with REQUESTING_DELETE.
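      To make the interleaving concrete, here is a minimal, self-contained Java sketch that forces steps 1 to 5 with two latches. It is a simplified model, not CDAP code: the class `CancelRaceDemo`, the `REQUESTED` state, and the `AtomicReference` standing in for the provisioning dataset are hypothetical. Only `REQUESTING_DELETE`, `CANCELLED`, and the use of `Future.cancel` come from the description above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the race; not CDAP code.
class CancelRaceDemo {
  enum State { REQUESTED, REQUESTING_DELETE, CANCELLED }

  // Waits on the latch even if the thread is interrupted (as Future.cancel(true) does),
  // then restores the interrupt flag. Models a task that does not react to interruption.
  static void awaitUninterruptibly(CountDownLatch latch) {
    boolean interrupted = false;
    while (true) {
      try {
        latch.await();
        break;
      } catch (InterruptedException e) {
        interrupted = true;
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }

  // Forces the interleaving from steps 1-5 and returns the final recorded state.
  static State runInterleaving() {
    AtomicReference<State> store = new AtomicReference<>(State.REQUESTED); // stand-in for the dataset
    CountDownLatch reachedWrite = new CountDownLatch(1);
    CountDownLatch cancelRecorded = new CountDownLatch(1);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Step 1: submit the task.
      Future<?> future = executor.submit(() -> {
        reachedWrite.countDown();             // step 2: just before the status write
        awaitUninterruptibly(cancelRecorded); // pause here until steps 3-4 are done
        store.set(State.REQUESTING_DELETE);   // step 5: unconditional write overwrites CANCELLED
      });
      reachedWrite.await();
      future.cancel(true);                    // steps 3-4: cancel the Future...
      store.set(State.CANCELLED);             // ...and record CANCELLED from this thread
      cancelRecorded.countDown();
      executor.shutdown();
      if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("task did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
    return store.get();
  }

  public static void main(String[] args) {
    System.out.println("final state: " + runInterleaving()); // final state: REQUESTING_DELETE
  }
}
```

      Because the status write is unconditional, whichever thread writes last wins; the latches only make the executor thread reliably write last, which is the ordering the failing test runs into.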

      After that, the unit test fails, because the status never changes back to CANCELLED. I think the right fix is to have the dataset validate each state transition, and have the task executor decide what to do next based on the result of that validation.
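      A rough sketch of the validated-transition fix, assuming the store can apply a transition atomically and report whether it was accepted. The class name `ValidatedStateStore`, the `REQUESTED` and `DELETING` states, and the transition table are illustrative assumptions, not CDAP's actual dataset or state machine. It relies on `ConcurrentHashMap.computeIfPresent` being atomic per key, and needs Java 9+ for `Map.of`.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: states and transition table are illustrative, not CDAP's.
class ValidatedStateStore {
  enum State { REQUESTED, REQUESTING_DELETE, DELETING, CANCELLED }

  // Allowed transitions; CANCELLED is terminal, so nothing may overwrite it.
  private static final Map<State, Set<State>> ALLOWED = Map.of(
      State.REQUESTED, EnumSet.of(State.REQUESTING_DELETE, State.CANCELLED),
      State.REQUESTING_DELETE, EnumSet.of(State.DELETING, State.CANCELLED),
      State.DELETING, EnumSet.of(State.CANCELLED),
      State.CANCELLED, EnumSet.noneOf(State.class));

  private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

  void init(String runId, State initial) {
    states.put(runId, initial);
  }

  // Applies the transition atomically, and only if it is allowed from the
  // current state. Returns whether the new state was written.
  boolean transition(String runId, State to) {
    boolean[] applied = {false};
    states.computeIfPresent(runId, (id, from) -> {
      if (ALLOWED.get(from).contains(to)) {
        applied[0] = true;
        return to;
      }
      return from; // rejected: keep the current state
    });
    return applied[0];
  }

  State get(String runId) {
    return states.get(runId);
  }

  // Executor side: the task proceeds only if its write was accepted.
  static boolean deprovisionStep(ValidatedStateStore store, String runId) {
    if (!store.transition(runId, State.REQUESTING_DELETE)) {
      return false; // e.g. already CANCELLED: stop instead of continuing
    }
    // ... the actual delete request would be issued here ...
    return true;
  }

  public static void main(String[] args) {
    ValidatedStateStore store = new ValidatedStateStore();
    store.init("run-1", State.REQUESTED);
    store.transition("run-1", State.CANCELLED);               // cancel lands first
    boolean proceeded = deprovisionStep(store, "run-1");      // late write is rejected
    System.out.println(proceeded + " " + store.get("run-1")); // false CANCELLED
  }
}
```

      With CANCELLED terminal and reachable from every other state, the outcome no longer depends on ordering: if the executor's REQUESTING_DELETE lands first, the later CANCELLED is still accepted; if CANCELLED lands first, the executor's write is rejected and the task stops.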
      Alternatively, the cancelled task itself, rather than the cancelling thread, should record the CANCELLED state.
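      The alternative can be sketched with a single writer: the cancelling thread only sets a flag, and the task records CANCELLED itself at its next checkpoint. This is again a hypothetical model; `SelfCancellingTask`, the `cancelRequested` flag, the `DELETING` state, and the checkpoint placement are assumptions, and the latches reproduce the same window as steps 2 to 5.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the alternative fix; not CDAP's ProvisioningTask.
class SelfCancellingTask implements Runnable {
  enum State { REQUESTED, REQUESTING_DELETE, DELETING, CANCELLED }

  final AtomicReference<State> store = new AtomicReference<>(State.REQUESTED);
  private volatile boolean cancelRequested;
  private final CountDownLatch beforeWrite; // test hooks that force the same
  private final CountDownLatch resume;      // interleaving as steps 2-5

  SelfCancellingTask(CountDownLatch beforeWrite, CountDownLatch resume) {
    this.beforeWrite = beforeWrite;
    this.resume = resume;
  }

  // Called from the cancelling thread: it only signals and never writes the state.
  void cancel() {
    cancelRequested = true;
  }

  // Checkpoint: if cancellation was requested, the task itself records CANCELLED.
  private boolean checkCancelled() {
    if (cancelRequested) {
      store.set(State.CANCELLED);
      return true;
    }
    return false;
  }

  @Override
  public void run() {
    if (checkCancelled()) {
      return;
    }
    beforeWrite.countDown();
    try {
      resume.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    store.set(State.REQUESTING_DELETE);
    if (checkCancelled()) { // sees a cancel that arrived during the window above
      return;
    }
    store.set(State.DELETING);
  }

  static State runInterleaving() {
    CountDownLatch beforeWrite = new CountDownLatch(1);
    CountDownLatch resume = new CountDownLatch(1);
    SelfCancellingTask task = new SelfCancellingTask(beforeWrite, resume);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      executor.submit(task);
      beforeWrite.await(); // task is just before its REQUESTING_DELETE write
      task.cancel();       // cancelling thread sets the flag only
      resume.countDown();
      executor.shutdown();
      if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("task did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
    return task.store.get();
  }

  public static void main(String[] args) {
    System.out.println("final state: " + runInterleaving()); // final state: CANCELLED
  }
}
```

      Since every status write for the run now comes from the executor thread, the writes are applied in program order and the last one is CANCELLED. One trade-off of a flag over `Future.cancel(true)` is that a task blocked inside a long call only notices the request at its next checkpoint.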





              • Assignee:
                ashau Albert Shau
                terence Terence Yim


                • Created: