CDAP-11723

CDAP should log a warning when full GC happens, and emit metrics about GC

      Description

      When users see CDAP stability problems, the cause is often memory-related: frequent GC or stop-the-world GC pauses make one or more services unresponsive. Often these pauses are short enough to stay within the ZooKeeper (ZK) timeout, so no failover happens, but long enough to cause other, seemingly random failures. This can happen in the master, in CDAP services, or even in app containers.

      However, the GC pauses do not show up in the master or application logs, and users don't necessarily check the GC logs, which the JVM writes directly to a separate file that bypasses log collection.

      It would be good if CDAP had a way to be notified by the JVM (through JMX) whenever a GC happens. Then we could log full GCs in the system and application logs, and emit metrics about the frequency and duration of GC events. This would greatly improve problem diagnostics for users.
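
A minimal sketch of what such a JMX hook could look like, not a proposal for the actual CDAP implementation. It assumes a HotSpot-based JVM, where the GC MXBeans emit `com.sun.management` notifications and a full collection is reported with the action string "end of major GC". The class name `GcMonitor`, the counters, and the use of `System.err` in place of CDAP's logger and metrics collector are all hypothetical.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: subscribes to JVM GC notifications over JMX.
final class GcMonitor {
  // Stand-ins for real metrics; a service would emit these through its metrics system.
  static final AtomicLong gcCount = new AtomicLong();
  static final AtomicLong gcTotalMillis = new AtomicLong();

  // Assumption: HotSpot reports full collections as "end of major GC".
  static boolean isMajorGc(String gcAction) {
    return "end of major GC".equals(gcAction);
  }

  // Registers one listener on every GC MXBean that supports notifications.
  // Returns the number of beans the listener was attached to.
  static int install() {
    NotificationListener listener = (Notification n, Object handback) -> {
      if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
          .equals(n.getType())) {
        return;
      }
      GarbageCollectionNotificationInfo info =
          GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
      long durationMs = info.getGcInfo().getDuration();

      // Metrics: frequency and cumulative duration of all GC events.
      gcCount.incrementAndGet();
      gcTotalMillis.addAndGet(durationMs);

      // Logging: warn only on full (major) collections.
      if (isMajorGc(info.getGcAction())) {
        System.err.printf("WARN full GC: collector=%s cause=%s duration=%d ms%n",
            info.getGcName(), info.getGcCause(), durationMs);
      }
    };

    int registered = 0;
    for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
      if (bean instanceof NotificationEmitter) {
        ((NotificationEmitter) bean).addNotificationListener(listener, null, null);
        registered++;
      }
    }
    return registered;
  }
}
```

Calling `GcMonitor.install()` once at process startup would be enough in this sketch, since the listener stays registered for the lifetime of the JVM. For concurrent collectors the reported duration may cover the whole collection cycle rather than only the stop-the-world pause, so the value should be interpreted per collector.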

              People

              • Assignee: Bhooshan Mogal (bhooshan)
              • Reporter: Andreas Neumann (andreas)
