OCP Technical Release Team / TRT-1651

Investigate vSphere in-cluster disruption


    • Type: Story
    • Resolution: Done
    • Priority: Major

      The disruption graph shows that the 95th percentile of in-cluster disruption on vSphere is very spiky: on some days we see substantial disruption, and on others we see none.

      The goal of this bug is to flatline this graph, for both the P95 and the daily average.
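      To make "flatlined for both the P95 and the daily average" concrete, here is a minimal sketch that groups per-job-run disruption seconds by day and computes both series. The input shape (a list of (date, seconds) pairs), the nearest-rank percentile method, and the helper names are assumptions for illustration only; the real disruption dashboard may aggregate differently.

```python
import math
from collections import defaultdict
from datetime import date


def nearest_rank_percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing so the 1-based rank avoids float error (e.g. 95 * 20 / 100 == 19.0).
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def daily_stats(runs: list[tuple[date, float]]) -> dict[date, tuple[float, float]]:
    """Group per-run disruption seconds by day into (daily average, P95)."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for day, seconds in runs:
        by_day[day].append(seconds)
    return {
        day: (sum(vals) / len(vals), nearest_rank_percentile(vals, 95))
        for day, vals in sorted(by_day.items())
    }


def is_flatlined(stats: dict[date, tuple[float, float]]) -> bool:
    """True when every day has zero average and zero P95 disruption."""
    return all(avg == 0 and p95 == 0 for avg, p95 in stats.values())


# Hypothetical sample data: one quiet day, one day with two disrupted runs.
runs = [
    (date(2024, 1, 1), 0.0), (date(2024, 1, 1), 0.0), (date(2024, 1, 1), 0.0),
    (date(2024, 1, 2), 0.0), (date(2024, 1, 2), 0.0),
    (date(2024, 1, 2), 12.0), (date(2024, 1, 2), 40.0),
]
stats = daily_stats(runs)
# day 1 -> (0.0, 0.0); day 2 -> (13.0, 40.0)
```

      A single bad day is enough to break `is_flatlined`, which matches the intent of driving both series to zero every day rather than only improving the long-run average.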

      The analysis below points to CPU starvation. We see 3-10 etcd leader elections during a serial run (we would expect none), large numbers of etcd log messages complaining about slow operations, and nodes apparently losing networking entirely while OVS complains about unreasonably long poll intervals (5s-10s, a known indicator of CPU starvation).
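      One of the starvation signals above, OVS reporting long poll intervals, can be pulled out of node logs mechanically. The sketch below assumes the warning text has the form "Unreasonably long <N>ms poll interval" and uses a 5000 ms threshold to mirror the 5s-10s range mentioned above; both the message shape and the hypothetical `long_poll_intervals` helper are illustrative assumptions, so verify them against real ovs-vswitchd logs before relying on this.

```python
import re

# Assumed shape of the OVS warning; adjust if the real log text differs.
POLL_RE = re.compile(r"Unreasonably long (\d+)ms poll interval")


def long_poll_intervals(lines: list[str], threshold_ms: int = 5000) -> list[int]:
    """Return poll-interval durations (ms) at or above threshold_ms, in log order."""
    hits = []
    for line in lines:
        match = POLL_RE.search(line)
        if match:
            duration = int(match.group(1))
            if duration >= threshold_ms:
                hits.append(duration)
    return hits


# Hypothetical ovs-vswitchd log lines for illustration only.
sample = [
    "2024-01-02T10:00:01.000Z|00101|timeval|WARN|Unreasonably long 1204ms poll interval (2ms user, 5ms system)",
    "2024-01-02T10:00:09.000Z|00102|timeval|WARN|Unreasonably long 7212ms poll interval (3ms user, 10ms system)",
    "2024-01-02T10:00:10.000Z|00103|bridge|INFO|bridge br-int: added interface",
]
print(long_poll_intervals(sample))  # [7212]
```

      A long poll interval on its own is a symptom rather than proof; it is most telling when it lines up in time with the etcd leader elections and slow-operation messages from the same run.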

              Devan Goodwin (rhn-engineering-dgoodwin)