OpenShift Bugs / OCPBUGS-1130

Investigate etcdGRPCRequestsSlow test

    • Critical
    • None
    • ETCD Sprint 226
    • 1
    • Rejected
    • False

      The results for this test in sippy look really bad on our less common platforms, and they are still unacceptable on the core clouds. It is fairly often the only test that fails. We need to decide what to do here, and we will need input from the etcd team.

      As of Sep 13th:

      • several vsphere and openstack variant combos fail this test around 24-32% of the time
      • aws, amd64, ovn, upgrade, upgrade-micro, ha - fails 6% of the time
      • aws, amd64, ovn, upgrade, upgrade-minor, ha - fails 4% of the time
      • gcp, amd64, sdn, upgrade, upgrade-minor, ha - fails 8% of the time
      • globally, across all jobs, the test fails around 3% of the time

      Even on some major variant combos, a 4-8% failure rate is too high.
      On the Sep 13 arch call (nobody from etcd present), Damien mentioned this might be an upstream alert that just isn't well suited to OpenShift's use cases. Is this the case, and does the alert need tuning?
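
      To put the numbers above side by side, here is a minimal sketch that compares each quoted variant combo against the global rate. The figures are the approximate percentages from this issue; the function names are made up for illustration, and the vsphere and openstack combos are left out because only a range (24-32%) is given for them.

```python
# Approximate failure rates (percent) quoted in this issue as of Sep 13th.
GLOBAL_RATE = 3.0
RATES = {
    "aws, amd64, ovn, upgrade, upgrade-micro, ha": 6.0,
    "aws, amd64, ovn, upgrade, upgrade-minor, ha": 4.0,
    "gcp, amd64, sdn, upgrade, upgrade-minor, ha": 8.0,
}


def ratio_to_global(rate, global_rate=GLOBAL_RATE):
    """Return how many times higher a rate is than the global rate."""
    return rate / global_rate


def above_global(rates, global_rate=GLOBAL_RATE):
    """Return the variant combos failing more often than the global rate, worst first."""
    flagged = [name for name, rate in rates.items() if rate > global_rate]
    return sorted(flagged, key=lambda name: rates[name], reverse=True)


for name in above_global(RATES):
    print(f"{name}: {RATES[name]:.0f}% ({ratio_to_global(RATES[name]):.1f}x global)")
```

      All three of the quoted major combos sit above the global rate, which is the point made above about 4-8% being too high.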

      Has the problem been getting worse?

      I believe this link https://datastudio.google.com/s/urkKwmmzvgo indicates that this may be the case for 4.12: AWS and Azure are both getting worse in ways that I don't see when we change the release to 4.11, where it looks consistent. GCP seems fine on 4.12. We do not have data for vSphere for some reason.

      This link shows the grpc_methods most commonly involved: https://search.ci.openshift.org/?search=etcdGRPCRequestsSlow+was+at+or+above&maxAge=48h&context=7&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
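
      The search link above carries its filters as query parameters, which makes it easy to adjust the search (for example, to a different time window). A small sketch using only the Python standard library to show what the link is asking for; the helper name `search_params` is made up for illustration, and the meaning of each parameter is inferred from its name rather than from any documentation.

```python
from urllib.parse import parse_qs, urlsplit

# The search URL exactly as quoted in this issue.
SEARCH_URL = (
    "https://search.ci.openshift.org/?search=etcdGRPCRequestsSlow+was+at+or+above"
    "&maxAge=48h&context=7&type=junit&name=&excludeName=&maxMatches=5"
    "&maxBytes=20971520&groupBy=job"
)


def search_params(url):
    """Return the URL's query parameters as a flat dict of strings."""
    query = urlsplit(url).query
    # keep_blank_values=True retains the empty name= and excludeName= parameters.
    parsed = parse_qs(query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


params = search_params(SEARCH_URL)
for key, value in params.items():
    print(f"{key} = {value!r}")
```

      The decoded search string is "etcdGRPCRequestsSlow was at or above", restricted to junit results from the last 48h and grouped by job.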

      At a glance: LeaseGrant, MemberList, Txn, Status, Range.

      Broken out of TRT-401
      For linking with sippy:
      [bz-etcd][invariant] alert/etcdGRPCRequestsSlow should not be at or above info
      [sig-arch][bz-etcd][Late] Alerts alert/etcdGRPCRequestsSlow should not be at or above info [Suite:openshift/conformance/parallel]

       

        Attachments (14 images, all uploaded by Thomas Jungblut): twelve image-*.png files dated 2022-10-11 through 2022-10-20, plus screenshot-1.png and screenshot-2.png.

              rhn-engineering-dgoodwin Devan Goodwin