  1. CDAP
  2. CDAP-9284

Pipeline metrics fail if there are too many metrics

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Component/s: UI
    • Labels:
    • Sprint:
      UP4 Sprint 1, UP4 Sprint 2
    • Release Notes:
      UI: Fixed an issue where pipeline metrics did not show up when a pipeline has a lot of nodes.

      Description

      I am trying to add 5 metrics per stage (process.time.total, process.time.max, process.time.min, process.time.avg, process.time.stddev). For the Omniture solution from Cask Market, this resulted in the UI making the following metrics call:

      "http://127.0.0.1:11015/v3/metrics/query?tag=namespace:default&tag=app:omnitureHitsPipeline&tag=run:17c6411b-1a64-11e7-bfc7-de2aafa8893d&tag=workflow:DataPipelineWorkflow&metric=user.HitData.process.time.avg&metric=user.HitData.process.time.max&metric=user.HitData.process.time.min&metric=user.HitData.process.time.stddev&metric=user.HitData.process.time.total&metric=user.HitData.records.in&metric=user.HitData.records.out&metric=user.HitDataParser.process.time.avg&metric=user.HitDataParser.process.time.max&metric=user.HitDataParser.process.time.min&metric=user.HitDataParser.process.time.stddev&metric=user.HitDataParser.process.time.total&metric=user.HitDataParser.records.in&metric=user.HitDataParser.records.out&metric=user.HitDataParser.process.time.avg&metric=user.HitDataParser.process.time.max&metric=user.HitDataParser.process.time.min&metric=user.HitDataParser.process.time.stddev&metric=user.HitDataParser.process.time.total&metric=user.HitDataParser.records.in&metric=user.HitDataParser.records.out&metric=user.Group by Geo.aggregator.groups&metric=user.Group by Geo.connector.records.in&metric=user.Group by Geo.connector.records.out&metric=user.Group by Geo.process.time.avg&metric=user.Group by Geo.process.time.max&metric=user.Group by Geo.process.time.min&metric=user.Group by Geo.process.time.stddev&metric=user.Group by Geo.process.time.total&metric=user.Group by Geo.records.in&metric=user.Group by Geo.records.out&metric=user.HBase Geo Hits.process.time.avg&metric=user.HBase Geo Hits.process.time.max&metric=user.HBase Geo Hits.process.time.min&metric=user.HBase Geo Hits.process.time.stddev&metric=user.HBase Geo Hits.records.in&metric=user.HBase Geo 
Hits.records.out&metric=user.BrowserData.process.time.avg&metric=user.BrowserData.process.time.max&metric=user.BrowserData.process.time.min&metric=user.BrowserData.process.time.stddev&metric=user.BrowserData.process.time.total&metric=user.BrowserData.records.in&metric=user.BrowserData.records.out&metric=user.BrowserDataParser.process.time.avg&metric=user.BrowserDataParser.process.time.max&metric=user.BrowserDataParser.process.time.min&metric=user.BrowserDataParser.process.time.stddev&metric=user.BrowserDataParser.process.time.total&metric=user.BrowserDataParser.records.in&metric=user.BrowserDataParser.records.out&metric=user.BrowserDataParser.process.time.avg&metric=user.BrowserDataParser.process.time.max&metric=user.BrowserDataParser.process.time.min&metric=user.BrowserDataParser.process.time.stddev&metric=user.BrowserDataParser.process.time.total&metric=user.BrowserDataParser.records.in&metric=user.BrowserDataParser.records.out&metric=user.Joiner.connector.records.in&metric=user.Joiner.connector.records.out&metric=user.Joiner.process.time.avg&metric=user.Joiner.process.time.max&metric=user.Joiner.process.time.min&metric=user.Joiner.process.time.stddev&metric=user.Joiner.process.time.total&metric=user.Joiner.records.in&metric=user.Joiner.records.out&metric=user.TPFSAvro.process.time.avg&metric=user.TPFSAvro.process.time.max&metric=user.TPFSAvro.process.time.min&metric=user.TPFSAvro.process.time.stddev&metric=user.TPFSAvro.process.time.total&metric=user.TPFSAvro.records.in&metric=user.TPFSAvro.records.out&metric=user.Distinct Domains.aggregator.groups&metric=user.Distinct Domains.connector.records.in&metric=user.Distinct Domains.connector.records.out&metric=user.Distinct Domains.process.time.avg&metric=user.Distinct Domains.process.time.max&metric=user.Distinct Domains.process.time.min&metric=user.Distinct Domains.process.time.stddev&metric=user.Distinct Domains.process.time.total&metric=user.Distinct Domains.records.in&metric=user.Distinct 
Domains.records.out&metric=user.SnapshotText.process.time.avg&metric=user.SnapshotText.process.time.max&metric=user.SnapshotText.process.time.min&metric=user.SnapshotText.process.time.stddev&metric=user.SnapshotText.records.in&metric=user.SnapshotText.records.out&metric=user.Geo City Filter.process.time.avg&metric=user.Geo City Filter.process.time.max&metric=user.Geo City Filter.process.time.min&metric=user.Geo City Filter.process.time.stddev&metric=user.Geo City Filter.process.time.total&metric=user.Geo City Filter.records.error&metric=user.Geo City Filter.records.in&metric=user.Geo City Filter.records.out&metric=user.ErrorCollector.process.time.avg&metric=user.ErrorCollector.process.time.max&metric=user.ErrorCollector.process.time.min&metric=user.ErrorCollector.process.time.stddev&metric=user.ErrorCollector.process.time.total&metric=user.ErrorCollector.records.in&metric=user.ErrorCollector.records.out&metric=user.Empty Geo Cities.process.time.avg&metric=user.Empty Geo Cities.process.time.max&metric=user.Empty Geo Cities.process.time.min&metric=user.Empty Geo Cities.process.time.stddev&metric=user.Empty Geo Cities.process.time.total&metric=user.Empty Geo Cities.records.in&metric=user.Empty Geo Cities.records.out"
      

      This query fails and produces the following error in the backend:

      2017-04-05 18:21:12,323 - ERROR [router-server-worker-thread-4:c.c.c.g.r.h.HttpRequestHandler@239] - Exception raised in Request Handler [id: 0xec719b90, /127.0.0.1:55841 => /127.0.0.1:11015]
      org.jboss.netty.handler.codec.frame.TooLongFrameException: An HTTP line is larger than 4096 bytes.
      	at org.jboss.netty.handler.codec.http.HttpMessageDecoder.readLine(HttpMessageDecoder.java:642) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:182) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:101) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[io.netty.netty-3.6.6.Final.jar:na]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
      	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
      

      That's because Netty tries to read the first line of the request, and gets up to:

      POST /v3/metrics/query?tag=namespace:default&tag=app:omnitureHitsPipeline&tag=run:6b94f831-1a6a-11e7-8189-de2aafa8893d&tag=workflow:DataPipelineWorkflow&metric=user.HitData.process.time.avg&metric=user.HitData.process.time.max&metric=user.HitData.process.time.min&metric=user.HitData.process.time.stddev&metric=user.HitData.process.time.total&metric=user.HitData.records.in&metric=user.HitData.records.out&metric=user.HitDataParser.process.time.avg&metric=user.HitDataParser.process.time.max&metric=user.HitDataParser.process.time.min&metric=user.HitDataParser.process.time.stddev&metric=user.HitDataParser.process.time.total&metric=user.HitDataParser.records.in&metric=user.HitDataParser.records.out&metric=user.HitDataParser.process.time.avg&metric=user.HitDataParser.process.time.max&metric=user.HitDataParser.process.time.min&metric=user.HitDataParser.process.time.stddev&metric=user.HitDataParser.process.time.total&metric=user.HitDataParser.records.in&metric=user.HitDataParser.records.out&metric=user.Group%20by%20Geo.aggregator.groups&metric=user.Group%20by%20Geo.connector.records.in&metric=user.Group%20by%20Geo.connector.records.out&metric=user.Group%20by%20Geo.process.time.avg&metric=user.Group%20by%20Geo.process.time.max&metric=user.Group%20by%20Geo.process.time.min&metric=user.Group%20by%20Geo.process.time.stddev&metric=user.Group%20by%20Geo.process.time.total&metric=user.Group%20by%20Geo.records.in&metric=user.Group%20by%20Geo.records.out&metric=user.HBase%20Geo%20Hits.process.time.avg&metric=user.HBase%20Geo%20Hits.process.time.max&metric=user.HBase%20Geo%20Hits.process.time.min&metric=user.HBase%20Geo%20Hits.process.time.stddev&metric=user.HBase%20Geo%20Hits.process.time.total&metric=user.HBase%20Geo%20Hits.records.in&metric=user.HBase%20Geo%20Hits.records.out&metric=user.BrowserData.process.time.avg&metric=user.BrowserData.process.time.max&metric=user.BrowserData.process.time.min&metric=user.BrowserData.process.time.stddev&metric=user.BrowserData.process.time.
total&metric=user.BrowserData.records.in&metric=user.BrowserData.records.out&metric=user.BrowserDataParser.process.time.avg&metric=user.BrowserDataParser.process.time.max&metric=user.BrowserDataParser.process.time.min&metric=user.BrowserDataParser.process.time.stddev&metric=user.BrowserDataParser.process.time.total&metric=user.BrowserDataParser.records.in&metric=user.BrowserDataParser.records.out&metric=user.BrowserDataParser.process.time.avg&metric=user.BrowserDataParser.process.time.max&metric=user.BrowserDataParser.process.time.min&metric=user.BrowserDataParser.process.time.stddev&metric=user.BrowserDataParser.process.time.total&metric=user.BrowserDataParser.records.in&metric=user.BrowserDataParser.records.out&metric=user.Joiner.connector.records.in&metric=user.Joiner.connector.records.out&metric=user.Joiner.process.time.avg&metric=user.Joiner.process.time.max&metric=user.Joiner.process.time.min&metric=user.Joiner.process.time.stddev&metric=user.Joiner.process.time.total&metric=user.Joiner.records.in&metric=user.Joiner.records.out&metric=user.TPFSAvro.process.time.avg&metric=user.TPFSAvro.process.time.max&metric=user.TPFSAvro.process.time.min&metric=user.TPFSAvro.process.time.stddev&metric=user.TPFSAvro.process.time.total&metric=user.TPFSAvro.records.in&metric=user.TPFSAvro.records.out&metric=user.Distinct%20Domains.aggregator.groups&metric=user.Distinct%20Domains.connector.records.in&metric=user.Distinct%20Domains.connector.records.out&metric=user.Distinct%20Domains.process.time.avg&metric=user.Distinct%20Domains.process.time.max&metric=user.Distinct%20Domains.process.time.min&metric=user.Distinct%20Domains.process.time.stddev&metric=user.Distinct%20Domains.process.time.total&metric=user.Distinct%20Domains.records.in&metric=user.Distinct%20Domains.records.out&metric=user.SnapshotText.process.time.avg&metric=user.SnapshotText.process.time.max&metric=user.SnapshotText.process.time.min&metric=user.SnapshotText.process.time.stddev&metric=user.SnapshotText.records.in
&metric=user.SnapshotText.records.out&metric=user.Geo%20City%20Filter.process.time.avg&metric=user.Geo
      

      at which point it errors out because the line is longer than the 4096-byte limit.
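
      The size problem can be reproduced offline with a minimal sketch like the one below. It only rebuilds the request line from tags and metric names and compares its byte length to the 4096 limit reported in the exception; the function names are hypothetical and are not part of CDAP or the UI code, and the exact boundary behaviour of Netty's decoder is not modelled.

```python
from urllib.parse import quote, urlencode

# Limit taken from the TooLongFrameException message in this ticket.
MAX_LINE_BYTES = 4096


def request_line(tags, metrics, method="POST", path="/v3/metrics/query"):
    """Build the HTTP request line with all tags and metrics as query params."""
    pairs = [("tag", f"{key}:{value}") for key, value in tags.items()]
    pairs += [("metric", name) for name in metrics]
    # quote_via=quote encodes spaces as %20, matching the request line above;
    # ":" is kept literal, as in the tag values shown in the ticket.
    query = urlencode(pairs, quote_via=quote, safe=":")
    return f"{method} {path}?{query} HTTP/1.1"


def exceeds_limit(line, limit=MAX_LINE_BYTES):
    """Return True if the encoded request line is longer than the limit."""
    return len(line.encode("utf-8")) > limit
```

      Every extra stage adds several `metric=` parameters, so the line grows with the number of nodes in the pipeline, while the limit stays fixed.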

      The metrics API looks really complicated. Apparently, everything that can be specified through query params can also be specified in the request body, but the two cannot be mixed in a single request. See http://docs.cask.co/cdap/current/en/reference-manual/http-restful-api/metrics.html#multiple-metrics-with-different-contexts for info on sending the query through the request body. It seems like the UI should make its calls this way instead of using query params, since the request body is not subject to the limit on the length of the first line.
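
      A rough sketch of what the body-based call could look like follows. The body layout (a map from an arbitrary query id to an object with `tags` and `metrics`) is an assumption based on the documentation page linked above and should be checked against it; the helper names and the `pipelineMetrics` query id are hypothetical. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_metrics_query_body(tags, metrics, query_id="pipelineMetrics"):
    """Put tags and metric names in a JSON-serializable body.

    Assumed shape: {query_id: {"tags": {...}, "metrics": [...]}}.
    """
    return {query_id: {"tags": dict(tags), "metrics": list(metrics)}}


def build_request(base_url, body):
    """Create a POST request whose first line no longer carries the metrics."""
    return Request(
        url=f"{base_url}/v3/metrics/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

      With this approach the first line of the request stays at `POST /v3/metrics/query HTTP/1.1` no matter how many stages the pipeline has, and stage names with spaces need no URL encoding.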

              People

              • Assignee:
                edwin Edwin Elia
                Reporter:
                ashau Albert Shau
              • Votes:
                0
                Watchers:
                5
