Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Metrics
    • Labels:

      Description

      We want to run a couple of tests to measure metrics performance in isolation.

      (by "client" below we mean a program instance, container, or similar that emits metrics)

      Metrics Collection & Processing

      Short burst test:

      Run each of these for 20 min (a load-generator sketch follows the list):

      • find the max number of emitted metrics that we can process without delay with a fixed number of clients, e.g. use 1000 clients and increase the number of metrics emitted by each...
      • find the max number of clients that we can process without delay with a fixed number of metrics emitted by each, e.g. use 1000 metrics per client and increase the number of clients...
      • repeat the above tests with 1, 2, and 10 metrics processor instances
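
      A minimal load-generator sketch for these runs, assuming clients push metrics through some transport once per second; MetricsLoadGen and emit() are hypothetical names, not the real CDAP client API:

      {code:java}
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;

      public class MetricsLoadGen {
        public static void main(String[] args) throws Exception {
          int clients = Integer.parseInt(args[0]);          // e.g. 1000
          int metricsPerClient = Integer.parseInt(args[1]); // e.g. 1000
          ScheduledExecutorService pool = Executors.newScheduledThreadPool(clients);
          for (int c = 0; c < clients; c++) {
            final int clientId = c;
            // each simulated client emits its full metric set once per second
            pool.scheduleAtFixedRate(() -> {
              long ts = System.currentTimeMillis();
              for (int m = 0; m < metricsPerClient; m++) {
                emit("client." + clientId + ".metric." + m, ts, 1L);
              }
            }, 0, 1, TimeUnit.SECONDS);
          }
          Thread.sleep(TimeUnit.MINUTES.toMillis(20)); // short burst: 20 min
          pool.shutdownNow();
        }

        // placeholder: a real test would send these to the metrics system
        // transport, e.g. the topic the metrics processor consumes
        static void emit(String name, long timestampMillis, long value) {
        }
      }
      {code}

      The longer test below could reuse the same generator with clients=1000 and metricsPerClient=10.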

      Longer test (numbers may change based on the results of the short burst test):

      Run this for 24h+:

      • many clients, few metrics each, e.g. 1000 clients, 10 metrics each (+ system metrics)

      How to measure perf numbers:

      • metric processing latency (delay): on the metrics processor, compare the current time with the metric value's emit timestamp (for, say, every 100th record) and emit the result as a metric as well (see the sketch below)
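
      A minimal sketch of that sampling, assuming the processor sees records one at a time; MetricRecord and emitMetric are hypothetical names, not actual CDAP classes:

      {code:java}
      public class LatencySampler {
        // placeholder for whatever record type the metrics processor consumes
        static class MetricRecord {
          final long emitTimestampMillis;
          MetricRecord(long emitTimestampMillis) { this.emitTimestampMillis = emitTimestampMillis; }
        }

        private long counter = 0;

        void onRecord(MetricRecord record) {
          // sample every 100th record to keep measurement overhead low
          if (++counter % 100 == 0) {
            long latencyMillis = System.currentTimeMillis() - record.emitTimestampMillis;
            emitMetric("metrics.processor.latency.ms", latencyMillis);
          }
          // ...normal record processing continues here
        }

        // placeholder: report through the same metrics pipeline so the latency
        // itself shows up as a queryable metric
        void emitMetric(String name, long value) {
        }
      }
      {code}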

      Metrics Search & Querying

      While running the longer test, measure query & search times 2h, 12h, and 24h after start:

      • search for child context:
        • for each level: top-level, app, flow, flowlet, instance, ... basically drill down to the end
        • same, but go into dataset after namespace
        • fuzzy search, e.g. all flows in CDAP (across all namespaces), all flowlets
      • search for metrics, on each of the levels used for the child context search
      • query (see the matrix sketch after this list):
        • for each of the 1s, 1min, 1h, and aggregate=true resolutions, do the following:
        • within each context of the above (including fuzzy), query one metric over the time range that corresponds to the resolution: 60s (1s), 60min (1min), 24h & 30d (1h), and the total time range (aggregate=true)
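
      A hypothetical driver for the query matrix above; the resolution-to-range mapping mirrors the list, while runQuery and the context strings are placeholders, not the actual CDAP metrics query API:

      {code:java}
      import java.util.List;
      import java.util.Map;

      public class QueryMatrix {
        // resolution -> time ranges (seconds); 0 is a sentinel for the total range
        static final Map<String, long[]> RANGES_BY_RESOLUTION = Map.of(
            "1s", new long[] {60},                        // 60s at 1s resolution
            "1m", new long[] {60 * 60},                   // 60min at 1min resolution
            "1h", new long[] {24 * 3600, 30 * 24 * 3600}, // 24h and 30d at 1h resolution
            "aggregate", new long[] {0});                 // aggregate=true, total range

        static void run(List<String> contexts, String metric) {
          for (String context : contexts) {
            for (Map.Entry<String, long[]> entry : RANGES_BY_RESOLUTION.entrySet()) {
              for (long rangeSeconds : entry.getValue()) {
                long end = System.currentTimeMillis() / 1000;
                long start = rangeSeconds == 0 ? 0 : end - rangeSeconds;
                runQuery(context, metric, entry.getKey(), start, end);
              }
            }
          }
        }

        // placeholder: a real test would call the metrics query endpoint
        // and record the response latency
        static void runQuery(String context, String metric, String resolution,
                             long startSeconds, long endSeconds) {
        }
      }
      {code}

      The contexts list would be collected while drilling down in the search tests, so the same list covers both exact and fuzzy cases.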

      How to measure perf numbers:

      • response latency, ideally measured in the CDAP backend and reported by emitting a metric (see the sketch below)
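
      A sketch of timing on the backend side, under the assumption that we can wrap the query handler; handleQuery and emitMetric are placeholder names:

      {code:java}
      import java.util.concurrent.TimeUnit;

      public class ResponseLatencyReporter {
        Object handle(Object request) {
          long startNanos = System.nanoTime();
          Object result = handleQuery(request);  // the actual search/query work
          long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
          emitMetric("metrics.query.response.latency.ms", elapsedMillis);
          return result;
        }

        // placeholders for the real handler and metric emission
        Object handleQuery(Object request) { return null; }
        void emitMetric(String name, long value) { }
      }
      {code}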

      People

      • Assignee: alexb (Alex Baranau)
      • Reporter: alexb (Alex Baranau)
      • Votes: 0
      • Watchers: 3
