  1. WildFly
  2. WFLY-18300

Metrics are exported multiple times with different descriptions


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 29.0.0.Final
    • Micrometer
    • None

      The reproducer can be found here: https://github.com/Spindl/WFLY-18300_reproducer

      How to run the reproducer:

      1. Build everything by running: mvn clean install
      2. Start the otel-collector with: sh start-collector.sh
      3. Run the server with: java -jar server/target/server-local-SNAPSHOT-bootable.jar
      4. Run the client once to load the metrics from the collector: java client/src/main/java/com/nts/reproducer/client/Main.java
      5. Print the collector log, which contains the error messages: sh get-collector-logs.sh

      The Micrometer subsystem exports the same metric with different descriptions in several cases, which is disallowed by at least the Prometheus text-based exposition format spec.

      If the metrics are collected with the otel-collector (as outlined by e.g. this article), it complains about the diverging descriptions and does not export the offending metrics, i.e. they are missing completely. For undertow_request_count, for example, only the listener metrics are present; the servlet metrics (which have the description "Number of all requests") are missing.
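To make the conflict concrete: the Prometheus text format allows only one HELP line per metric name, so a scrape that carries two different descriptions for the same name is invalid. The following is a minimal sketch of a check for that condition, not the collector's actual logic. The function name, the sample labels and the sample values are made up for illustration; only the two undertow_request_count descriptions come from this report.

```python
from collections import defaultdict


def find_conflicting_help(exposition_text):
    """Return {metric_name: [help texts, first-seen order]} for every
    metric name that appears with more than one distinct HELP text."""
    helps = defaultdict(list)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line.startswith("# HELP "):
            continue
        # Format: "# HELP <name> <docstring>"; the docstring may be empty.
        rest = line[len("# HELP "):]
        name, _, doc = rest.partition(" ")
        if doc not in helps[name]:
            helps[name].append(doc)
    return {name: docs for name, docs in helps.items() if len(docs) > 1}


# Hypothetical sample: labels and values are invented, the two
# descriptions are the ones quoted in this issue.
SAMPLE = """\
# HELP undertow_request_count The number of requests this listener has served
# TYPE undertow_request_count counter
undertow_request_count{listener="default"} 3.0
# HELP undertow_request_count Number of all requests
# TYPE undertow_request_count counter
undertow_request_count{servlet="example"} 1.0
"""

if __name__ == "__main__":
    for name, docs in find_conflicting_help(SAMPLE).items():
        print(name, docs)
```

Running a check like this against a saved scrape would list every metric name that is registered with more than one description.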

      In my case the following metrics are affected:

      • collected metric undertow_request_count {...} has help "Number of all requests" but should have "The number of requests this listener has served"
      • collected metric jgroups_num_messages_sent {...} has help "Number of messages sent" but should have ""
      • collected metric jgroups_num_suspect_events {...} has help "Number of suspect events emitted" but should have "Number of suspected events received"
      • collected metric jgroups_timeout {...} has help "Timeout after which a node is suspected if neither a heartbeat nor data have been received from it" but should have "Number of millis to wait for verification that a suspect is really dead (approximation)"
      • collected metric jgroups_xmit_table_num_resizes {...} has help "Number of retransmit table resizes" but should have "Number of resizes in all (receive and send) windows"
      • collected metric jgroups_min_threshold {...} has help "The threshold (as a percentage of max_credits) at which a receiver sends more credits to a sender. Example: if max_credits is 1'000'000, and min_threshold 0.25, then we send ca. 250'000 credits to P once we've got only 250'000 credits left for P (we've received 750'000 bytes from P)" but should have "The min threshold (percentage between 0 and 1.0) below which no message is dropped"
      • collected metric jgroups_num_messages_received {...} has help "" but should have "Number of messages received"
      • collected metric jgroups_xmit_table_num_purges {...} has help "Number of retransmit table purges" but should have "Number of purges in all (receive and send) windows"
      • collected metric jgroups_client_bind_port {...} has help "Start port for client socket. Default value of 0 picks a random port" but should have "The local port a client socket should bind to. If 0, an ephemeral port will be picked."
      • collected metric jgroups_port_range {...} has help "The range of valid ports: [bind_port .. bind_port+port_range ]. 0 only binds to bind_port and fails if taken" but should have "Number of additional ports to be probed for membership. A port_range of 0 does not probe additional ports. Example: initial_hosts=A[7800] port_range=0 probes A:7800, port_range=1 probes A:7800 and A:7801"
      • collected metric jgroups_xmit_table_missing_messages {...} has help "Total number of missing (= not received) messages in all retransmit buffers" but should have "Total number of missing messages in all receive windows"
      • collected metric undertow_request_count {...} has help "Number of all requests" but should have "The number of requests this listener has served"
      • collected metric jgroups_xmit_interval {...} has help "Interval (in milliseconds) at which messages in the send windows are resent" but should have "Interval (in milliseconds) at which missing messages (from all retransmit buffers) are retransmitted. 0 turns retransmission off"
      • collected metric jgroups_xmit_table_num_compactions {...} has help "Number of compactions in all (receive and send) windows" but should have "Number of retransmit table compactions"
      • collected metric jgroups_connect_timeout {...} has help "Max time (ms) to wait for a connect attempt" but should have "Max time (in millis) to wait for a connection to the Kubernetes server. If exceeded, an exception will be thrown"
      • collected metric jgroups_min_threshold {...} has help "The threshold (as a percentage of max_credits) at which a receiver sends more credits to a sender. Example: if max_credits is 1'000'000, and min_threshold 0.25, then we send ca. 250'000 credits to P once we've got only 250'000 credits left for P (we've received 750'000 bytes from P)" but should have "The min threshold (percentage between 0 and 1.0) below which no message is dropped"
      • collected metric jgroups_port_range {...} has help "Number of ports to probe for finding a free port" but should have "Number of additional ports to be probed for membership. A port_range of 0 does not probe additional ports. Example: initial_hosts=A[7800] port_range=0 probes A:7800, port_range=1 probes A:7800 and A:7801"
      • collected metric jgroups_xmit_table_num_moves {...} has help "Number of moves in all (receive and send) windows" but should have "Number of retransmit table moves"
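Since several metrics show up more than once in the list above, it can help to group the collector messages by metric name. This is a rough sketch that assumes each message follows the exact shape quoted above (collected metric NAME {...} has help "A" but should have "B"); real label sets inside the braces may need a more careful pattern. The helper name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed message shape, taken from the lines quoted in this issue.
MESSAGE = re.compile(
    r'collected metric (?P<name>\S+) \{.*?\} '
    r'has help "(?P<got>.*)" but should have "(?P<want>.*)"'
)


def group_by_metric(log_lines):
    """Map each metric name to the set of all help texts reported for it."""
    seen = defaultdict(set)
    for line in log_lines:
        match = MESSAGE.search(line)
        if match is None:
            continue  # not a help-mismatch message
        seen[match["name"]].update((match["got"], match["want"]))
    return dict(seen)


# Two of the messages from the list above.
LINES = [
    'collected metric undertow_request_count {...} has help '
    '"Number of all requests" but should have '
    '"The number of requests this listener has served"',
    'collected metric jgroups_num_messages_sent {...} has help '
    '"Number of messages sent" but should have ""',
]

if __name__ == "__main__":
    for name, helps in sorted(group_by_metric(LINES).items()):
        print(name, sorted(helps))
```

A metric that maps to three or more help texts (jgroups_port_range in the list above would be one) is registered with more than two different descriptions.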

            jaslee@redhat.com Jason Lee
            roland.spindelbalker@ntsretail.com Roland Spindelbalker-Davila