  1. OCP Technical Release Team
  2. TRT-2144

Investigate traceroute/wireshark on mass disruption


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • Quality / Stability / Reliability

      To help debug the bad days for reaching static endpoints in GCP and Azure, Justin proposes that we run traceroute or collect tcpdump data as an artifact on the affected runs.

      This would need to be handled carefully: limit it to the liveness probes, and trigger only once per job run, after a certain threshold of disruption has accumulated and while the problem is still ongoing.
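
A minimal sketch of that trigger logic, in Python for illustration only. The class name, the `threshold` and `window_seconds` defaults, and the `is_liveness` flag are all hypothetical; the real disruption monitor may represent probes and thresholds differently.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CaptureTrigger:
    """Decides when to start a one-shot network capture during a job run."""

    threshold: int = 5             # hypothetical: failed probes needed to fire
    window_seconds: float = 60.0   # hypothetical: how long a failure still counts
    fired: bool = False
    _failures: deque = field(default_factory=deque)

    def observe(self, timestamp: float, is_liveness: bool, ok: bool) -> bool:
        """Record one probe result; return True exactly once, when capture should start."""
        # Ignore everything after the first firing, successful probes,
        # and any probe that is not a liveness probe.
        if self.fired or ok or not is_liveness:
            return False
        self._failures.append(timestamp)
        # Drop failures older than the window so that only recent,
        # still on-going disruption counts toward the threshold.
        while self._failures and timestamp - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.threshold:
            self.fired = True
            return True
        return False
```

Counting failures inside a sliding window, rather than over the whole run, is one way to approximate "while the problem is ongoing" when the disruption consists of scattered single failures.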

      This could be tricky and may miss, because the disruption is not a solid block; it is many failed single requests. We could detect enough disruption, fire off the commands, and still not see the problem, though we may get lucky. The artifact should log clearly whether the capture hit or missed, because then the artifact scanner may help us find the hits.
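
One possible shape for the capture and its hit-or-miss marker, again as a hedged Python sketch. The function names, the JSON layout, and the `capture_result` field are assumptions made for illustration, not an existing artifact format.

```python
import json
import subprocess
import time
from pathlib import Path


def run_capture(command, timeout_seconds=30.0):
    """Run one diagnostic command and return its output with start/end times."""
    started = time.time()
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout_seconds)
        output = proc.stdout + proc.stderr
    except (OSError, subprocess.TimeoutExpired) as err:
        # A missing binary or a hung command should not fail the job run.
        output = f"capture failed: {err}"
    return {"command": list(command), "started": started,
            "finished": time.time(), "output": output}


def write_artifact(path, capture, failure_times):
    """Write the capture plus an explicit HIT/MISS marker to a JSON artifact."""
    # HIT means at least one failed probe fell inside the capture window.
    overlapping = [t for t in failure_times
                   if capture["started"] <= t <= capture["finished"]]
    record = dict(capture,
                  capture_result="HIT" if overlapping else "MISS",
                  failures_during_capture=len(overlapping))
    Path(path).write_text(json.dumps(record, indent=2))
    return record
```

A call such as `run_capture(["traceroute", "-n", "endpoint.example.invalid"])` would collect the route to a placeholder host; the hostname here is hypothetical. A fixed string like `"capture_result": "HIT"` gives the artifact scanner something simple to search for.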

              kenzhang@redhat.com Ken Zhang
              rhn-engineering-dgoodwin Devan Goodwin