- Bug
- Resolution: Cannot Reproduce
- Undefined
- None
- 4.10
- Moderate
- None
- False
- Troubleshoot
- Customer Facing
- Red Hat OpenShift Cluster Manager (OCM)
Description of problem:
The Prometheus Operator pod is restarting after 10 minutes in OCP 4.10.
The following logs are seen in the Prometheus Operator pod:
message: se retry. Original error: stream error: stream ID 399; INTERNAL_ERROR; received from peer"
level=warn ts=2022-11-29T14:48:31.363380184Z caller=operator.go:346 component=alertmanageroperator informer=Secret msg="cache sync not yet completed"
level=warn ts=2022-11-29T14:49:31.36160769Z caller=operator.go:346 component=alertmanageroperator informer=Secret msg="cache sync not yet completed"
level=warn ts=2022-11-29T14:49:44.792499686Z caller=klog.go:108 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:75: failed to list *v1.Secret: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 433; INTERNAL_ERROR; received from peer"
level=error ts=2022-11-29T14:49:44.792634826Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:75: Failed to watch *v1.Secret: failed to list *v1.Secret: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 433; INTERNAL_ERROR; received from peer"
level=warn ts=2022-11-29T14:50:31.362449753Z caller=operator.go:346 component=alertmanageroperator informer=Secret msg="cache sync not yet completed"
level=error ts=2022-11-29T14:51:31.364389628Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="unable to sync caches for alertmanager"
level=error ts=2022-11-29T14:51:31.364447429Z caller=operator.go:355 component=alertmanageroperator informer=Secret msg="failed to sync cache"
level=warn ts=2022-11-29T14:51:31.364719365Z caller=main.go:407 msg="Server shutdown error" err="context canceled"
level=warn ts=2022-11-29T14:51:31.364747962Z caller=operator.go:346 component=alertmanageroperator informer=Secret msg="cache sync not yet completed"
level=warn ts=2022-11-29T14:51:31.368092701Z caller=main.go:412 msg="Unhandled error received. Exiting..." err="failed to sync cache for Secret informer"
reason: Error
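For context, the restart matches the usual client-go startup pattern the operator follows: it starts a Secret informer, waits a bounded time for the initial cache sync (which issues the LIST of *v1.Secret that is failing with the stream errors above), and exits when the sync never completes, so the kubelet restarts the pod. Below is a minimal Go sketch of that pattern, not the operator's actual code; the namespace, resync period and 10-minute timeout are illustrative assumptions.

package main

import (
	"context"
	"log"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (the real operator uses in-cluster config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatalf("building config: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("building client: %v", err)
	}

	// Secret informer scoped to the monitoring namespace; its initial sync
	// performs the LIST of *v1.Secret seen failing in the logs above.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 5*time.Minute, informers.WithNamespace("openshift-monitoring"))
	secretInformer := factory.Core().V1().Secrets().Informer()

	// Bounded wait for the initial cache sync; if the LIST keeps failing,
	// the deadline expires, the process exits, and the pod restarts.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	factory.Start(ctx.Done())
	if !cache.WaitForCacheSync(ctx.Done(), secretInformer.HasSynced) {
		// Mirrors the "Unhandled error received. Exiting..." message above.
		log.Fatal("failed to sync cache for Secret informer")
	}
	log.Print("Secret informer synced")
}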
Checking Kube API server logs:
2022-11-28T10:29:23.881004817Z I1128 10:29:23.880781 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-generated"
2022-11-28T10:29:23.881933010Z I1128 10:29:23.881841 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-tls"
2022-11-28T10:29:23.882833556Z I1128 10:29:23.882741 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-proxy"
2022-11-28T10:29:23.883584958Z I1128 10:29:23.883508 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-kube-rbac-proxy-metric"
2022-11-28T10:29:23.884483964Z I1128 10:29:23.884374 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-dockercfg-5sqcv"
2022-11-28T10:29:23.886302739Z I1128 10:29:23.886217 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-tls-assets-0"
2022-11-28T10:29:23.887236843Z I1128 10:29:23.887055 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-kube-rbac-proxy"
2022-11-28T10:29:23.890385993Z I1128 10:29:23.890293 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get unknown configmap openshift-monitoring/alertmanager-trusted-ca-bundle-ev1qal76l341g"
2022-11-28T10:29:23.963100847Z I1128 10:29:23.962964 18 node_authorizer.go:203] "NODE DENY" err="node 'sdeb-ocpin-p4001.sys.schwarz' cannot get pvc openshift-monitoring/alertmanager-pvc-alertmanager-main-0, no relationship to this object was found in the node authorizer graph"
2022-11-28T10:29:24.002961502Z I1128 10:29:24.002749 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-tls-assets-0"
2022-11-28T10:29:24.003760960Z I1128 10:29:24.003683 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-dockercfg-5sqcv"
2022-11-28T10:29:24.004613539Z I1128 10:29:24.004505 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-generated"
2022-11-28T10:29:24.005383738Z I1128 10:29:24.005298 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-proxy"
2022-11-28T10:29:24.006671552Z I1128 10:29:24.005595 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-main-tls"
2022-11-28T10:29:24.006671552Z I1128 10:29:24.006115 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-kube-rbac-proxy"
2022-11-28T10:29:24.006671552Z I1128 10:29:24.006539 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown secret openshift-monitoring/alertmanager-kube-rbac-proxy-metric"
2022-11-28T10:29:24.007216831Z I1128 10:29:24.007126 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get unknown configmap openshift-monitoring/alertmanager-trusted-ca-bundle-ev1qal76l341g"
2022-11-28T10:29:24.007919813Z I1128 10:29:24.007840 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get pvc openshift-monitoring/alertmanager-pvc-alertmanager-main-1, no relationship to this object was found in the node authorizer graph"
2022-11-28T10:29:24.112280598Z I1128 10:29:24.112137 18 node_authorizer.go:203] "NODE DENY" err="node 'se1-ocpin-p4000.sys.schwarz' cannot get pvc openshift-monitoring/alertmanager-pvc-alertmanager-main-1, no relationship to this object was found in the node authorizer graph"
2022-11-28T10:29:28.803492855Z I1128 10:29:28.802757 18 trace.go:205] Trace[553366156]: "Update" url:/apis/rbac.authorization.k8s.io/v1/clusterroles/alertmanager-main,user-agent:Go-http-client/2.0,audit-id:e730e96f-c144-4385-9e9b-52f048f88f3b,client:4.160.57.16,accept:application/json, */*,protocol:HTTP/2.0 (28-Nov-2022 10:29:28.099) (total time: 703ms):
2022-11-28T10:30:05.565948456Z I1128 10:30:05.565809 18 trace.go:205] Trace[1031716344]: "Get" url:/api/v1/namespaces/openshift-monitoring/pods/alertmanager-main-1/log,user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36,audit-id:c4415728-79ca-4e05-9fc4-6a39e4fefee5,client:4.160.59.201,accept:,protocol:HTTP/1.1 (28-Nov-2022 10:30:02.580) (total time: 2984ms):
An important note about the cluster: it is very large (348 nodes) and the number of secrets is very high (11017).
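Given that note, the cost of the informer's initial LIST can be gauged independently. Below is a minimal client-go sketch that times a secret LIST, offered only as a rough illustration; the kubeconfig path and the cluster-wide scope of the list (as opposed to per-namespace) are assumptions, not details from this report.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatalf("building config: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("building client: %v", err)
	}

	// Time a cluster-wide LIST of secrets, the same kind of request an informer
	// makes on startup; with ~11k secrets the response is large and slow to stream.
	start := time.Now()
	secrets, err := client.CoreV1().Secrets(metav1.NamespaceAll).List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatalf("listing secrets: %v", err)
	}
	fmt.Printf("listed %d secrets in %s\n", len(secrets.Items), time.Since(start))
}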
Version-Release number of selected component (if applicable):
OCP 4.10
How reproducible:
NA
Steps to Reproduce:
1.
2.
3.
Actual results:
The Prometheus Operator pod fails to sync the Secret informer cache and restarts repeatedly.
Expected results:
The Prometheus Operator pod syncs its caches and keeps running without restarts.
Additional info: