We have seen rogue containers on clusters that have the YARN property `yarn.resourcemanager.recovery.enabled` set to `true`. This property allows containers to keep running after YARN has stopped or crashed. If YARN does not detect such a container when it starts up again, the container keeps running but is no longer managed by YARN.
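
To check whether a node's configuration enables this property, a minimal sketch like the following reads it from `yarn-site.xml`. The default path `/etc/hadoop/conf/yarn-site.xml` is an assumption (it varies by distribution, so pass the real path as the first argument). The `awk` scan assumes the usual layout where `<name>` and `<value>` sit on the same or consecutive lines; it is not a full XML parser. If the property is absent, YARN falls back to its shipped default of `false`.

```shell
#!/usr/bin/env bash
# Minimal sketch: report yarn.resourcemanager.recovery.enabled from a yarn-site.xml.
# Assumptions: /etc/hadoop/conf/yarn-site.xml is only a common default path,
# and the awk below is a line-based scan, not a real XML parser.
set -euo pipefail

# Print the <value> that follows the property's <name>, or nothing if absent.
recovery_setting() {
  awk '
    /<name>yarn\.resourcemanager\.recovery\.enabled<\/name>/ { found = 1 }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")   # keep only the text between the tags
      gsub(/^[ \t]+|[ \t]+$/, "")         # trim surrounding whitespace
      print
      exit
    }
  ' "$1"
}

main() {
  local yarn_site="${1:-/etc/hadoop/conf/yarn-site.xml}"
  local value
  if [[ ! -r "${yarn_site}" ]]; then
    # Report the problem but do not abort; this is a read-only diagnostic.
    echo "Cannot read ${yarn_site}; pass the path to yarn-site.xml as the first argument." >&2
    return 0
  fi
  value="$(recovery_setting "${yarn_site}")"
  # yarn-default.xml ships this property as false, so an absent entry means disabled.
  echo "yarn.resourcemanager.recovery.enabled = ${value:-false (not set, default)}"
}

# Run main only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-$0}" == "$0" ]]; then
  main "$@"
fi
```

For example, `bash check-recovery.sh /etc/hadoop/conf/yarn-site.xml` (the script name is hypothetical) prints the configured value on that node.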
Rogue containers are generally long-running ones, and the easiest way to find and remove them is:
- Stop all running applications.
- Stop CDAP.
- Log in to each NodeManager host and run `ps auxww | grep cdap` (ignore the line for the `grep` process itself).
- Kill any remaining CDAP container processes listed.
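
The last two steps can be scripted per NodeManager host. The sketch below assumes that all applications and CDAP are already stopped, so any remaining process whose command line contains `cdap` is treated as rogue. By default it only lists them; with this script's own `--kill` flag (a hypothetical option, not part of any CDAP or YARN tool) it sends `SIGTERM` to each one. It uses `ps -eo pid=,args=` rather than `ps auxww` to get just the PID and full command line, and the `[c]dap` pattern keeps `grep` from matching itself. Save it under a name without "cdap" in it (for example, the hypothetical `find-rogue-containers.sh`) so it does not list its own process.

```shell
#!/usr/bin/env bash
# Minimal sketch, not an official CDAP or YARN tool: list, and optionally kill,
# leftover CDAP processes on ONE NodeManager host.
# Assumptions: all applications and CDAP are already stopped, so every remaining
# process whose command line contains "cdap" is presumed rogue, and this file is
# saved under a name without "cdap" in it so it does not match itself.
set -euo pipefail

# Print "PID ARGS" for each process whose full command line contains "cdap".
# The [c]dap pattern matches "cdap" but not grep's own command line.
list_cdap_procs() {
  ps -eo pid=,args= | grep '[c]dap' || true
}

# Send SIGTERM to each PID given; escalating to SIGKILL is left to the operator.
kill_pids() {
  local pid
  for pid in "$@"; do
    echo "Sending SIGTERM to ${pid}"
    kill "${pid}" 2>/dev/null || echo "  could not signal ${pid} (already exited?)" >&2
  done
}

main() {
  local procs
  local -a pids
  procs="$(list_cdap_procs)"
  if [[ -z "${procs}" ]]; then
    echo "No CDAP processes found."
    return 0
  fi
  echo "Possible rogue CDAP processes:"
  echo "${procs}"
  # Dry run by default; only kill when this script's --kill flag is passed.
  if [[ "${1:-}" == "--kill" ]]; then
    # The first field of each line is the PID.
    mapfile -t pids < <(awk '{print $1}' <<< "${procs}")
    kill_pids "${pids[@]}"
  fi
}

# Run main only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-$0}" == "$0" ]]; then
  main "$@"
fi
```

For example, run `bash find-rogue-containers.sh` on each NodeManager host to review the list, then `bash find-rogue-containers.sh --kill` once you have confirmed that every listed process is a leftover container. Keeping the dry run as the default is deliberate: a `cdap` match on the command line is only a heuristic, so the list should be checked by a person before anything is killed.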