-
Bug
-
Resolution: Done-Errata
-
Undefined
-
4.14
-
Moderate
-
No
-
Hypershift Sprint 249
-
1
-
False
-
This is a clone of issue OCPBUGS-23228. The following is the description of the original issue:
—
Description of problem:
Release controller > 4.14.2 > HyperShift conformance run > gathered assets:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.user.username == "system:admin" and .verb == "create" and .requestURI == "/apis/operator.openshift.io/v1/storages") | .userAgent' | sort | uniq -c
     65 hosted-cluster-config-operator-manager
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.user.username == "system:admin" and .verb == "create" and .requestURI == "/apis/operator.openshift.io/v1/storages") | .requestReceivedTimestamp + " " + (.responseStatus | (.code | tostring) + " " + .reason)' | head -n5
2023-11-09T17:17:15.130454Z 409 AlreadyExists
2023-11-09T17:17:15.163256Z 409 AlreadyExists
2023-11-09T17:17:15.198908Z 409 AlreadyExists
2023-11-09T17:17:15.230532Z 409 AlreadyExists
2023-11-09T17:17:22.899579Z 409 AlreadyExists
That's banging away pretty hard with creation attempts that keep getting 409ed, presumably because an earlier creation attempt succeeded. If the controller needs very quick latency in re-creation, perhaps an informing watch? If the controller can handle some re-creation latency, perhaps a quieter poll?
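One way to picture the "quieter" option is a check-before-create step combined with a poll interval that backs off while nothing changes. The sketch below is illustrative only: it is Python rather than the controller's own code, and `FakeAPI`, `ensure_exists`, and `next_delay` are hypothetical names, not part of HyperShift or any Kubernetes client. In a real controller the `get` would presumably be served from an informer cache rather than a live request.

```python
class AlreadyExists(Exception):
    """Stand-in for an HTTP 409 AlreadyExists response."""


class FakeAPI:
    """Hypothetical in-memory API that counts create calls."""

    def __init__(self):
        self.objects = {}
        self.create_calls = 0

    def get(self, name):
        return self.objects.get(name)

    def create(self, name, spec):
        self.create_calls += 1
        if name in self.objects:
            raise AlreadyExists(name)
        self.objects[name] = spec


def ensure_exists(api, name, spec):
    """Issue a create only when the object is absent.

    Returns True if a create request was sent, False otherwise.
    """
    if api.get(name) is not None:
        return False  # already present, so no create and no 409
    try:
        api.create(name, spec)
    except AlreadyExists:
        pass  # lost a race with another writer; the object exists, which is the goal
    return True


def next_delay(current, base=1.0, cap=300.0, changed=False):
    """Poll interval in seconds: reset on change, otherwise double up to a cap."""
    return base if changed else min(current * 2, cap)
```

With this shape, repeated reconcile passes against an existing object send no create requests at all, and the gap between passes grows while the object stays in place. The base and cap values are placeholders, not recommendations.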
Version-Release number of selected component (if applicable):
4.14.2. I haven't checked other releases.
How reproducible:
Likely 100%. I saw similar behavior in an unrelated dump, and confirmed the busy 409s in the first CI run I checked.
Steps to Reproduce:
1. Dump a hosted cluster.
2. Inspect its audit logs for hosted-cluster-config-operator-manager create activity.
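For step 2, the same tally that the jq pipelines in this report produce can be sketched in Python. This is a minimal sketch, assuming the audit log is one JSON event per line with the `userAgent`, `verb`, and `responseStatus.code` fields used in the jq filters; `count_creates` is a hypothetical helper name. Lines that are not JSON (the ones the pipelines drop with `grep -v`) are skipped.

```python
import json
from collections import Counter


def count_creates(lines, user_agent):
    """Count create requests from one user agent, keyed by response code."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines mixed into the log
        if not isinstance(event, dict):
            continue
        if event.get("userAgent") != user_agent or event.get("verb") != "create":
            continue
        # responseStatus may be missing or null on some events
        code = (event.get("responseStatus") or {}).get("code")
        counts[code] += 1
    return counts
```

To use it on a downloaded dump, open the log file and pass the file object as `lines`, for example `count_creates(open(path), "hosted-cluster-config-operator-manager")`, where `path` is wherever the audit log was saved.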
Actual results:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.userAgent == "hosted-cluster-config-operator-manager" and .verb == "create") | .verb + " " + (.responseStatus.code | tostring)' | sort | uniq -c
    130 create 409
Expected results:
Zero or rare 409 creation requests from this user-agent.
Additional info:
The user agent seems to be defined here, so likely the fix will involve changes to that manager.
- clones
-
OCPBUGS-23228 hosted-cluster-config-operator-manager should throttle creation attempts
- Closed
- is blocked by
-
OCPBUGS-23228 hosted-cluster-config-operator-manager should throttle creation attempts
- Closed
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update