Type: Story
Resolution: Done
Priority: Blocker
Fix Version/s: None
Affects Version/s: None
Labels:
- trt-incident

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
None
Story Points:
None

Target Version:
None
Release Blocker:
None
Sprint:
None

Began sometime today or late yesterday. It affects multiple clouds and is blocking payloads, with most jobs failing.

Example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn/1678963706211340288

[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
Run #0: Failed 1m0s
{  "service": "machine-config-daemon",
          "severity": "warning"
        },
        "value": [
          1689134505.447,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "ALERTS",
          "alertname": "KubeletHealthState",
          "alertstate": "firing",
          "container": "oauth-proxy",
          "endpoint": "metrics",
          "instance": "10.0.0.8:9001",
          "job": "machine-config-daemon",
          "namespace": "openshift-machine-config-operator",
          "node": "ci-op-fbh5bhvb-ed2ea-xdsvq-master-1",
          "pod": "machine-config-daemon-f5wmt",
          "prometheus": "openshift-monitoring/k8s",
          "service": "machine-config-daemon",
          "severity": "warning"
        },
        "value": [
          1689134505.447,
          "1"
        ]
      },
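
For anyone triaging similar output, here is a minimal sketch of the check the test name describes: list firing alerts other than Watchdog and AlertmanagerReceiversNotConfigured. It assumes the excerpt above is part of a list of samples from a Prometheus query on ALERTS. The function name `unexpected_firing_alerts` and the enclosing list are hypothetical, not taken from the test code.

```python
import json

# Alerts the test name says are allowed to fire.
ALLOWED_ALERTS = {"Watchdog", "AlertmanagerReceiversNotConfigured"}


def unexpected_firing_alerts(results):
    """Return (alertname, node) pairs for firing alerts outside the allowed set."""
    found = []
    for sample in results:
        metric = sample.get("metric", {})
        if metric.get("alertstate") != "firing":
            continue
        name = metric.get("alertname")
        if name in ALLOWED_ALERTS:
            continue
        found.append((name, metric.get("node")))
    return found


# One sample copied from the excerpt above; wrapping it in a list is an assumption.
sample_results = json.loads("""
[
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "KubeletHealthState",
      "alertstate": "firing",
      "job": "machine-config-daemon",
      "namespace": "openshift-machine-config-operator",
      "node": "ci-op-fbh5bhvb-ed2ea-xdsvq-master-1",
      "severity": "warning"
    },
    "value": [1689134505.447, "1"]
  }
]
""")

for name, node in unexpected_firing_alerts(sample_results):
    print(f"{name} firing on {node}")
```

Grouping the output by `alertname` and `node` makes it easier to see whether one alert (here KubeletHealthState from machine-config-daemon) accounts for the failures across nodes.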

Being discussed here: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1689169944865979

At present we can find no changes in this payload that were not also in previous payloads that did not exhibit the issue.
The new RHCOS version seems fine in CI payloads.
The problem began surfacing for MCO in their presubmits a few days prior.

depends on

OCPBUGS-16128 4.13/4.14 MCDs do not work with FIPS enabled golang builders

Closed

links to

openshift/machine-config-operator#3795: 4.14: use rhel9 builder for daemon binary

openshift/machine-config-operator#3796: daemon: Run firstboot as a container image too

openshift/machine-config-operator#3799: OCPBUGS-16128: daemon: Copy matching binary to host, re-exec with it

Assignee: Devan Goodwin
Reporter: Devan Goodwin
Need Info From: None
Contributors: None
QA Contact: None
Doc Contact: None
Votes: 0
Watchers: 4
Created: 2023/07/12 2:49 PM
Updated: 2023/07/19 7:31 AM
Resolved: 2023/07/14 1:03 PM
