Bug
Resolution: Done
Normal
RHODS_1.23.0_GA
1.26.0
RHODS 1.26
Description of problem:
During the notebook scale test execution, we noticed that the Kube APIServer reports many HTTP 409 (Conflict) errors (green line in the attached plot):
In the APIServer audit logs on the master nodes, we found that most of them (373 out of 413) are caused by this event:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "fe485de8-b4f4-4f4d-abb0-dced2a59b39e",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/rhods-notebooks/configmaps",
  "verb": "create",
  "user": {
    "username": "system:serviceaccount:redhat-ods-applications:odh-notebook-controller-manager",
    "uid": "c2174df4-f0be-46d4-ae36-4e19d77f51d4",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:redhat-ods-applications",
      "system:authenticated"
    ],
    "extra": {
      "authentication.kubernetes.io/pod-name": [
        "odh-notebook-controller-manager-5f75589659-9jfcv"
      ],
      "authentication.kubernetes.io/pod-uid": [
        "ef723cfc-1cf7-4246-bef6-9de3506f1f49"
      ]
    }
  },
  "sourceIPs": [
    "10.0.168.13"
  ],
  "userAgent": "manager/v0.0.0 (linux/amd64) kubernetes/$Format",
  "objectRef": {
    "resource": "configmaps",
    "namespace": "rhods-notebooks",
    "name": "trusted-ca",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "message": "configmaps \"trusted-ca\" already exists",
    "reason": "AlreadyExists",
    "details": {
      "name": "trusted-ca",
      "kind": "configmaps"
    },
    "code": 409
  },
  "requestReceivedTimestamp": "2023-03-09T13:23:24.287913Z",
  "stageTimestamp": "2023-03-09T13:23:24.294380Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"odh-notebook-controller-manager-rolebinding\" of ClusterRole \"odh-notebook-controller-manager-role\" to ServiceAccount \"odh-notebook-controller-manager/redhat-ods-applications\""
  }
}
To ease troubleshooting, this error should be avoided whenever possible.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
- create notebooks
- check the client errors in Prometheus with:
  sum by (code) (increase(apiserver_request_total{code=~"4.."}[2m]))
- check the kube-apiserver audit logs on each master node with:
  oc debug node/<master node> -- chroot /host tail -f /var/log/kube-apiserver/audit.log | grep '"code":409' | grep trusted-ca
Actual results:
see many matching lines
Expected results:
see no matching lines
Reproducibility (Always/Intermittent/Only Once):
always
Build Details:
Workaround:
Additional info:
This is likely caused by the following unconditional Create call:
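trustedCAConfigMap := &corev1.ConfigMap{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "trusted-ca",
		Namespace: notebook.Namespace,
		// Asks OpenShift to inject the cluster's trusted CA bundle into this ConfigMap
		Labels: map[string]string{"config.openshift.io/inject-trusted-cabundle": "true"},
	},
}
// Create is issued unconditionally, even when the ConfigMap already exists
err := r.Client.Create(ctx, trustedCAConfigMap)
if err != nil {
	// The resulting 409 AlreadyExists is silently treated as success
	if apierrs.IsAlreadyExists(err) {
		return nil
	}
}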
The unconditional Create could be replaced with:
- get the resource
- if it doesn't exist, create it
- if it exists but differs from the desired state, update it
This last check (currently missing) is part of the controller's duty.
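The get / create / update sequence above can be sketched as follows. This is a minimal, self-contained sketch, not the controller's code: ConfigMap, Store, fakeStore and reconcileTrustedCA are hypothetical stand-ins for corev1.ConfigMap and the controller-runtime client, so the logic runs without a cluster. It compares only the labels the controller sets, assuming the ConfigMap data is filled in by OpenShift's trusted CA bundle injection and therefore should not count as drift.

```go
package main

import (
	"errors"
	"fmt"
)

// ConfigMap is a hypothetical, trimmed-down stand-in for corev1.ConfigMap.
type ConfigMap struct {
	Namespace, Name string
	Labels          map[string]string
}

// Sentinel errors standing in for Kubernetes NotFound (404) and AlreadyExists (409).
var (
	errNotFound      = errors.New("not found")
	errAlreadyExists = errors.New("already exists")
)

// Store is a hypothetical subset of a Kubernetes client.
type Store interface {
	Get(namespace, name string) (*ConfigMap, error)
	Create(cm *ConfigMap) error
	Update(cm *ConfigMap) error
}

// fakeStore is an in-memory Store that counts write calls.
type fakeStore struct {
	objs             map[string]ConfigMap
	creates, updates int
}

func (s *fakeStore) Get(ns, name string) (*ConfigMap, error) {
	cm, ok := s.objs[ns+"/"+name]
	if !ok {
		return nil, errNotFound
	}
	return &cm, nil
}

func (s *fakeStore) Create(cm *ConfigMap) error {
	s.creates++
	if _, ok := s.objs[cm.Namespace+"/"+cm.Name]; ok {
		return errAlreadyExists // what the API server reports as HTTP 409
	}
	s.objs[cm.Namespace+"/"+cm.Name] = *cm
	return nil
}

func (s *fakeStore) Update(cm *ConfigMap) error {
	s.updates++
	s.objs[cm.Namespace+"/"+cm.Name] = *cm
	return nil
}

// hasLabels reports whether have contains every key/value pair of want.
func hasLabels(have, want map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// reconcileTrustedCA gets the ConfigMap, creates it only if it is missing,
// and updates it only if the labels the controller owns have drifted.
func reconcileTrustedCA(s Store, desired *ConfigMap) error {
	existing, err := s.Get(desired.Namespace, desired.Name)
	if errors.Is(err, errNotFound) {
		return s.Create(desired)
	}
	if err != nil {
		return err
	}
	if hasLabels(existing.Labels, desired.Labels) {
		return nil // already in the desired state: no write, hence no 409
	}
	merged := map[string]string{}
	for k, v := range existing.Labels {
		merged[k] = v
	}
	for k, v := range desired.Labels {
		merged[k] = v
	}
	existing.Labels = merged
	return s.Update(existing)
}

func main() {
	s := &fakeStore{objs: map[string]ConfigMap{}}
	desired := &ConfigMap{Namespace: "rhods-notebooks", Name: "trusted-ca",
		Labels: map[string]string{"config.openshift.io/inject-trusted-cabundle": "true"}}
	// Simulate three notebooks in the same namespace, i.e. three reconciles.
	for i := 0; i < 3; i++ {
		if err := reconcileTrustedCA(s, desired); err != nil {
			panic(err)
		}
	}
	fmt.Printf("creates=%d updates=%d\n", s.creates, s.updates) // creates=1 updates=0
}
```

In the real controller, controller-runtime's controllerutil.CreateOrUpdate helper follows the same Get, then Create or Update pattern, and skips the Update when the mutate function leaves the object unchanged; it may be a simpler option than hand-written logic.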