CDAP-12560

Logs are flooded with Insufficient data written/TooLongFrameException from TMS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 4.3.0, 4.1.2
    • Fix Version/s: None
    • Component/s: Log, Messaging, Metrics

      Description

      In one deployment, we see the logs flooded with:

      2017-08-24 04:06:12,177 - ERROR [MessagingMetricsCollectionService:s.n.w.p.h.HttpURLConnection$StreamingOutputStream@3501] - Failed in publishing metrics for timestamp 1503565572.
      java.io.IOException: insufficient data written
              at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3501)
              at co.cask.common.http.HttpRequests.execute(HttpRequests.java:111)
              at co.cask.cdap.common.internal.remote.RemoteClient.execute(RemoteClient.java:92)
              at co.cask.cdap.messaging.client.ClientMessagingService.performWriteRequest(ClientMessagingService.java:251)
              at co.cask.cdap.messaging.client.ClientMessagingService.publish(ClientMessagingService.java:182)
              at co.cask.cdap.metrics.collect.MessagingMetricsCollectionService$TopicPayload.publish(MessagingMetricsCollectionService.java:134)
              at co.cask.cdap.metrics.collect.MessagingMetricsCollectionService.publishMetric(MessagingMetricsCollectionService.java:102)
              at co.cask.cdap.metrics.collect.MessagingMetricsCollectionService.publish(MessagingMetricsCollectionService.java:97)
              at co.cask.cdap.metrics.collect.AggregatedMetricsCollectionService.publishMetrics(AggregatedMetricsCollectionService.java:133)
              at co.cask.cdap.metrics.collect.AggregatedMetricsCollectionService.run(AggregatedMetricsCollectionService.java:117)
              at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
              at java.lang.Thread.run(Thread.java:745)
      2017-08-24 04:06:12,280 - ERROR [messaging.service-worker-thread-42:o.j.n.h.c.h.HttpChunkAggregator@169] - Exception caught in channel processing.
      org.jboss.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 10485760 bytes.
              at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:169)
              at co.cask.http.RequestRouter.messageReceived(RequestRouter.java:79)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
              at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
              at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
              at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
              at org.jboss.netty.handler.codec.http.HttpContentEncoder.messageReceived(HttpContentEncoder.java:69)
              at co.cask.http.NettyHttpService$2.handleUpstream(NettyHttpService.java:205)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
              at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
              at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
              at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      Apparently, something is emitting so many metrics in a single message that the publish request exceeds the 10485760-byte (10 MB) HTTP content limit on the TMS side, as the TooLongFrameException shows. Because of CDAP-12558, we can't tell which program or service is responsible. What service could emit more than 10 MB of metrics at once?
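
      One possible mitigation, independent of finding the offending emitter, would be to split an oversized set of encoded metrics into several publish requests that each stay under the limit. The following is only a minimal sketch of that idea. `PayloadBatcher` and `splitBySize` are hypothetical names and not part of CDAP, and the sketch assumes each metric has already been encoded to a byte array. A real request body also carries encoding overhead, so the limit passed in should leave headroom below 10485760 bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of CDAP: groups encoded payloads into size-bounded batches. */
final class PayloadBatcher {

  private PayloadBatcher() { }

  /**
   * Splits payloads into consecutive batches whose combined size does not exceed maxBytes.
   * A single payload larger than maxBytes is rejected, since it cannot fit in any batch.
   */
  static List<List<byte[]>> splitBySize(List<byte[]> payloads, int maxBytes) {
    if (maxBytes <= 0) {
      throw new IllegalArgumentException("maxBytes must be positive");
    }
    List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    long currentBytes = 0;
    for (byte[] payload : payloads) {
      if (payload.length > maxBytes) {
        throw new IllegalArgumentException(
            "Payload of " + payload.length + " bytes exceeds limit of " + maxBytes);
      }
      // Close the current batch if adding this payload would cross the limit.
      if (!current.isEmpty() && currentBytes + payload.length > maxBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(payload);
      currentBytes += payload.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

      Each returned batch could then be sent as its own publish request. Logging the batch count and total size at that point would also help identify which service produces the oversized payload.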

              People

              • Assignee: Andreas Neumann
              • Reporter: Andreas Neumann