During a request to read the latest logs, the log handler first tries to read log events directly from Kafka. After reading a batch of the latest log events from Kafka, the handler determines whether to read the rest from disk, based on whether those logs are available there.
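The read path above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: the lists `kafka_events` and `disk_events` stand in for the real Kafka topic and the log files written by the log saver, and all names are assumptions.

```python
def read_latest_logs(kafka_events, disk_events, batch_size=100):
    """Read one batch of the newest events from Kafka, then decide
    whether the remaining older events can be read from disk."""
    # The newest events exist only in Kafka, so start there.
    batch = kafka_events[-batch_size:]
    remainder_count = len(kafka_events) - len(batch)
    # Determination: has the log saver already persisted the older events?
    if len(disk_events) >= remainder_count:
        older = disk_events[-remainder_count:] if remainder_count else []
    else:
        # Disk is lagging: fall back to reading everything from Kafka.
        older = kafka_events[:-batch_size] if remainder_count else []
    return older + batch
```

Note that whenever the disk lags behind, the `else` branch forces the entire history to be read from Kafka, which is the failure mode described next.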
In the case where there is a delay in the availability of logs on disk, all subsequent log read requests are served entirely from Kafka. This is exacerbated if the log saver is down for an extended period of time while multiple programs are logging a lot of messages. In that case there could be a few million events in Kafka, and on every request the log handler will have to read them over and over again. This causes handler threads to hang and results in a poor user experience.
We could limit the maximum number of log events fetched for any given log read API call to a small number, such as 10,000, to prevent this situation. After reading 10,000 events from Kafka, the handler will read additional log messages from whatever is available on disk.
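The proposed cap could look like the sketch below, under the same in-memory stand-ins as before. `MAX_KAFKA_EVENTS` and the function name are hypothetical; the key property is that a single call never reads more than the cap from Kafka, and older events come only from whatever the log saver has already written.

```python
MAX_KAFKA_EVENTS = 10_000

def read_latest_logs_capped(kafka_events, disk_events):
    """Read at most MAX_KAFKA_EVENTS from Kafka; serve anything
    older from whatever is available on disk."""
    # Never pull more than the cap from Kafka, even when disk is lagging.
    batch = kafka_events[-MAX_KAFKA_EVENTS:]
    remainder_count = len(kafka_events) - len(batch)
    # If disk is behind, this slice simply yields fewer older events;
    # the request still returns promptly instead of re-reading Kafka.
    older = disk_events[-remainder_count:] if remainder_count else []
    return older + batch
```

With this change, a request against a backlog of millions of Kafka events does bounded work, at the cost of possibly omitting events that are neither in the capped batch nor yet on disk.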