OCP Technical Release Team / TRT-2246

Placeholder: metal agent dualstack conformance jobs suffer massive in-cluster disruption


    • Type: Story
    • Resolution: Done
    • Priority: Major
    • Incidents & Support

      Needs careful consideration: https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption?var-platform=metal&var-percentile=P95&var-backend=pod-to-host-new-connections&var-releases=4.20&var-upgrade_type=none&var-networks=ovn&var-topologies=ha&var-architectures=amd64&var-lookback=7&var-master_nodes_updated=N&var-min_disruption_regression=-10&var-min_disruption_job_list=0&var-min_relevance=0&var-featureset=default&orgId=1

      Note the high disruption is originating from periodic-ci-openshift-release-master-nightly-4.20-e2e-agent-ha-dualstack-conformance

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-agent-ha-dualstack-conformance/1953969839244578816

      Possibly every run.

      Appears to be a total loss of egress from a specific node all at once, during e2e testing. A monitortest that fails if we see mass in-cluster disruption originating from a single node would be ideal; I have a partial PR in flight but haven't gotten it working well enough to post yet. A rough sketch of the per-node check is below.
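
      As a minimal sketch only: the snippet below shows the shape of the aggregation such a monitortest could do (sum observed in-cluster disruption per source node and flag any node that crosses a threshold). The types and names here are illustrative stand-ins, not origin's actual monitortest interfaces.

      package main

      import (
      	"fmt"
      	"time"
      )

      // DisruptionInterval is a hypothetical, simplified stand-in for the
      // intervals origin collects for in-cluster backends.
      type DisruptionInterval struct {
      	SourceNode string        // node the connection originated from
      	Backend    string        // e.g. "pod-to-host-new-connections"
      	Duration   time.Duration // length of the observed outage
      }

      // checkPerNodeDisruption sums disruption per source node and flags any
      // node whose total exceeds the threshold, i.e. fails when mass
      // in-cluster disruption comes from one node.
      func checkPerNodeDisruption(intervals []DisruptionInterval, threshold time.Duration) []string {
      	perNode := map[string]time.Duration{}
      	for _, iv := range intervals {
      		perNode[iv.SourceNode] += iv.Duration
      	}

      	var failures []string
      	for node, total := range perNode {
      		if total > threshold {
      			failures = append(failures,
      				fmt.Sprintf("node %s saw %s of in-cluster disruption (threshold %s)", node, total, threshold))
      		}
      	}
      	return failures
      }

      func main() {
      	// Illustrative data: one node accounts for nearly all the disruption.
      	intervals := []DisruptionInterval{
      		{SourceNode: "master-0", Backend: "pod-to-host-new-connections", Duration: 2 * time.Second},
      		{SourceNode: "master-1", Backend: "pod-to-host-new-connections", Duration: 95 * time.Second},
      		{SourceNode: "master-1", Backend: "pod-to-pod-new-connections", Duration: 80 * time.Second},
      	}
      	for _, f := range checkPerNodeDisruption(intervals, 60*time.Second) {
      		fmt.Println("FAIL:", f)
      	}
      }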

      Goal for this card is to formalize a bug for the agent installer and potentially networking.

              rh-ee-fbabcock Forrest Babcock
              rhn-engineering-dgoodwin Devan Goodwin
