-
Bug
-
Resolution: Done
-
Normal
-
RHODS_1.23.0_GA
-
False
-
None
-
False
-
Testable
-
No
-
1.26.0
-
No
-
No
-
Pending
-
None
-
RHODS 1.26
Description of problem:
During the notebook scale test execution, we noticed that the Kube APIServer reports many HTTP 409 errors (green line in the graph).
In the APIServer audit log on the master nodes, most of them (373 out of 413) are caused by this event:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "fe485de8-b4f4-4f4d-abb0-dced2a59b39e",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/rhods-notebooks/configmaps",
  "verb": "create",
  "user": {
    "username": "system:serviceaccount:redhat-ods-applications:odh-notebook-controller-manager",
    "uid": "c2174df4-f0be-46d4-ae36-4e19d77f51d4",
    "groups": ["system:serviceaccounts", "system:serviceaccounts:redhat-ods-applications", "system:authenticated"],
    "extra": {
      "authentication.kubernetes.io/pod-name": ["odh-notebook-controller-manager-5f75589659-9jfcv"],
      "authentication.kubernetes.io/pod-uid": ["ef723cfc-1cf7-4246-bef6-9de3506f1f49"]
    }
  },
  "sourceIPs": ["10.0.168.13"],
  "userAgent": "manager/v0.0.0 (linux/amd64) kubernetes/$Format",
  "objectRef": {
    "resource": "configmaps",
    "namespace": "rhods-notebooks",
    "name": "trusted-ca",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "message": "configmaps \"trusted-ca\" already exists",
    "reason": "AlreadyExists",
    "details": {"name": "trusted-ca", "kind": "configmaps"},
    "code": 409
  },
  "requestReceivedTimestamp": "2023-03-09T13:23:24.287913Z",
  "stageTimestamp": "2023-03-09T13:23:24.294380Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"odh-notebook-controller-manager-rolebinding\" of ClusterRole \"odh-notebook-controller-manager-role\" to ServiceAccount \"odh-notebook-controller-manager/redhat-ods-applications\""
  }
}
To keep troubleshooting easier, this error should be avoided whenever possible.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
- create notebooks
- check the client errors in Prometheus:
'sum by (code) (increase(apiserver_request_total{code=~"4.."}[2m]))'
- check the kube-apiserver audit log with:
oc debug node/<master node> -- chroot /host tail -f /var/log/kube-apiserver/audit.log | grep '"code":409' | grep trusted-ca
Actual results:
see many matching lines
Expected results:
see no matching line
Reproducibility (Always/Intermittent/Only Once):
always
Build Details:
Workaround:
Additional info:
This is likely caused by this unconditional Create call.
trustedCAConfigMap := &corev1.ConfigMap{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "trusted-ca",
		Namespace: notebook.Namespace,
		Labels:    map[string]string{"config.openshift.io/inject-trusted-cabundle": "true"},
	},
}
err := r.Client.Create(ctx, trustedCAConfigMap)
if err != nil {
	if apierrs.IsAlreadyExists(err) {
		return nil
	}
}
The Create call could be replaced with the following sequence:
- get the resource
- if it doesn't exist, create it
- if it exists and is not identical, update it
This last check (currently missing) is part of the controller's duty.
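The get / create / update sequence can be sketched as follows. This is a minimal, standard-library-only illustration and not the controller's actual code: the `store` interface, `memStore`, `configMap` and `ensureConfigMap` are hypothetical stand-ins for the Kubernetes client and the ConfigMap type. In the controller itself the equivalent calls would presumably be the controller-runtime client's Get, Create and Update together with an IsNotFound check. The sketch also assumes that only the labels are owned by the controller, since the bundle data is injected by another component.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

var errNotFound = errors.New("not found")

// configMap is a hypothetical, reduced stand-in for corev1.ConfigMap.
type configMap struct {
	Name, Namespace string
	Labels          map[string]string
}

// store is a hypothetical stand-in for the API client.
type store interface {
	Get(namespace, name string) (*configMap, error)
	Create(cm *configMap) error
	Update(cm *configMap) error
}

// ensureConfigMap gets the object first, creates it only when it is missing,
// and updates it only when the owned fields (here: labels) differ.
func ensureConfigMap(s store, desired *configMap) error {
	found, err := s.Get(desired.Namespace, desired.Name)
	if errors.Is(err, errNotFound) {
		return s.Create(desired)
	}
	if err != nil {
		return err
	}
	if reflect.DeepEqual(found.Labels, desired.Labels) {
		return nil // already in the desired state, no write request sent
	}
	found.Labels = desired.Labels
	return s.Update(found)
}

// memStore is an in-memory fake that counts write calls.
type memStore struct {
	items            map[string]*configMap
	creates, updates int
}

func (m *memStore) Get(namespace, name string) (*configMap, error) {
	cm, ok := m.items[namespace+"/"+name]
	if !ok {
		return nil, errNotFound
	}
	cp := *cm // return a copy so callers cannot mutate the stored object
	return &cp, nil
}

func (m *memStore) Create(cm *configMap) error {
	k := cm.Namespace + "/" + cm.Name
	if _, ok := m.items[k]; ok {
		return errors.New("already exists") // the 409 case
	}
	cp := *cm
	m.items[k] = &cp
	m.creates++
	return nil
}

func (m *memStore) Update(cm *configMap) error {
	k := cm.Namespace + "/" + cm.Name
	if _, ok := m.items[k]; !ok {
		return errNotFound
	}
	cp := *cm
	m.items[k] = &cp
	m.updates++
	return nil
}

func main() {
	s := &memStore{items: map[string]*configMap{}}
	desired := &configMap{
		Name:      "trusted-ca",
		Namespace: "rhods-notebooks",
		Labels:    map[string]string{"config.openshift.io/inject-trusted-cabundle": "true"},
	}
	// Repeated reconciliations issue a single create and no conflicting writes.
	for i := 0; i < 3; i++ {
		if err := ensureConfigMap(s, desired); err != nil {
			panic(err)
		}
	}
	fmt.Printf("creates=%d updates=%d\n", s.creates, s.updates)
}
```

A get-then-create sequence can still race with another writer, so keeping the existing AlreadyExists check around the create remains reasonable; the difference is that the conflict becomes the exception rather than the outcome of every reconciliation.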