In clusters that are slow and prone to timeouts, we see a large number of invalid transactions. Many of these come from system services, namely dataset service, dataset.executor, explore service, and log.saver.
It is not clear how these transactions end up staying in the invalid list. A short transaction is invalidated when it reaches its timeout. At commit time, the commit then throws TransactionNotInProgressExn, which fails the tx. However, TransactionContext catches that exception, rolls back the changes, and then aborts the tx, which removes it from the invalid list. So on this code path, a timed-out transaction should not remain invalid.
Perhaps these services execute transactions in a different way that does not attempt to roll back after a failed commit. This needs investigation.
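
To make the two code paths concrete, here is a minimal toy model in Python. It is not the real transaction manager or the TransactionContext API: ToyTxManager, run_with_rollback, run_without_rollback and TransactionNotInProgress are all hypothetical names made up for illustration, and the model only assumes the behavior described above (timeout moves a short tx to the invalid list, commit of such a tx throws, abort removes it from the invalid list).

```python
class TransactionNotInProgress(Exception):
    """Toy stand-in for the exception thrown when committing a timed-out tx."""


class ToyTxManager:
    """Hypothetical, simplified model of the in-progress and invalid lists."""

    def __init__(self):
        self.next_id = 0
        self.in_progress = set()
        self.invalid = set()

    def start_short(self):
        # Start a short transaction and track it as in progress.
        self.next_id += 1
        self.in_progress.add(self.next_id)
        return self.next_id

    def expire(self, tx):
        # Simulate the timeout: the tx moves from in-progress to invalid.
        if tx in self.in_progress:
            self.in_progress.remove(tx)
            self.invalid.add(tx)

    def commit(self, tx):
        # Committing a tx that is no longer in progress fails.
        if tx not in self.in_progress:
            raise TransactionNotInProgress(tx)
        self.in_progress.remove(tx)

    def abort(self, tx):
        # Assumption from the description: abort clears the tx from the invalid list.
        self.in_progress.discard(tx)
        self.invalid.discard(tx)


def run_with_rollback(manager, tx, rollback):
    """Path described for TransactionContext: catch, roll back, abort."""
    try:
        manager.commit(tx)
        return True
    except TransactionNotInProgress:
        rollback()
        manager.abort(tx)
        return False


def run_without_rollback(manager, tx):
    """Suspected path: the failure is handled, but the tx is never aborted."""
    try:
        manager.commit(tx)
        return True
    except TransactionNotInProgress:
        return False


manager = ToyTxManager()

tx_a = manager.start_short()
manager.expire(tx_a)
run_with_rollback(manager, tx_a, rollback=lambda: None)

tx_b = manager.start_short()
manager.expire(tx_b)
run_without_rollback(manager, tx_b)

print(sorted(manager.invalid))  # prints [2]
```

In this model only the second transaction is left in the invalid list, which matches the suspicion that the affected services take a path that skips the rollback and abort. Whether they actually do is still the open question.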