- Bug
- Resolution: Unresolved
- Critical
- odf-4.16
- None
- False
- False
- Committed
- Committed
- If docs needed, set a value
- None
Description of problem (please be as detailed as possible and provide log snippets):
When running test_pvc_expansion_when_full, the PersistentVolumeUsageNearFull and PersistentVolumeUsageCritical alerts are not firing.
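For triage it may help to confirm whether the underlying usage metrics are present and actually cross the alert thresholds. The sketch below queries the in-cluster Prometheus through the thanos-querier route; the 0.75/0.85 thresholds are assumptions based on the usual NearFull/Critical definitions, not values taken from this cluster.
# Sketch: check the PVC usage ratio that the alerts are expected to evaluate.
# Assumptions: standard thanos-querier route in openshift-monitoring, a logged-in
# user with monitoring view access, and approximate thresholds of 0.75 / 0.85.
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
TOKEN=$(oc whoami -t)
# Current usage ratio per PVC (compare against ~0.75 for NearFull, ~0.85 for Critical):
curl -sk -G -H "Authorization: Bearer $TOKEN" \
  --data-urlencode 'query=kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes' \
  "https://$HOST/api/v1/query"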
prometheus-operator logs:
level=info ts=2024-09-25T15:35:38.553216149Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2024-09-25T15:35:38.593611231Z caller=operator.go:572 component=alertmanager-controller key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2024-09-25T15:35:38.754635432Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2024-09-25T15:35:39.352659418Z caller=operator.go:572 component=alertmanager-controller key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2024-09-25T15:35:39.352681394Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
level=warn ts=2024-09-25T16:16:34.488279555Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/thanos/operator.go:326: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488352741Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.Alertmanager ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488341065Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/alertmanager/operator.go:409: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488480237Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PodMonitor ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=error ts=2024-09-25T16:16:34.488520917Z caller=controller.go:189 component=kubelet_endpoints kubelet_object=kube-system/kubelet msg="Failed to synchronize nodes" err="listing nodes failed: Get \"https://172.30.0.1:443/api/v1/nodes\": http2: client connection lost"
level=warn ts=2024-09-25T16:16:34.488537442Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PrometheusRule ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488548191Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PrometheusRule ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488464049Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.Prometheus ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488536831Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/alertmanager/operator.go:411: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488520589Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.StatefulSet ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488579151Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1alpha1.AlertmanagerConfig ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488602736Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.ThanosRuler ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488603637Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.ServiceMonitor ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488629059Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.StatefulSet ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488645182Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PartialObjectMetadata ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488642317Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.Probe ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488534986Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/thanos/operator.go:328: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488651016Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.StatefulSet ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488656742Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PartialObjectMetadata ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488642006Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PartialObjectMetadata ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488682305Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:118: watch of *v1.PartialObjectMetadata ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488636522Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/prometheus/server/operator.go:488: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=warn ts=2024-09-25T16:16:34.488804733Z caller=klog.go:118 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/prometheus/server/operator.go:486: watch of *v1.Namespace ended with: an error on the server (\"unable to decode an event from the watch stream: http2: client connection lost\") has prevented the request from succeeding"
level=info ts=2024-09-25T16:25:30.538200059Z caller=operator.go:572 component=alertmanager-controller key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2024-09-25T16:25:30.538273824Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2024-09-25T16:25:34.432611416Z caller=operator.go:572 component=alertmanager-controller key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2024-09-25T16:25:34.432635855Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
These are the alerts currently firing on the cluster:
ClusterMonitoringOperatorDeprecatedConfig
AlertmanagerReceiversNotConfigured
PrometheusDuplicateTimestamps
PrometheusDuplicateTimestamps
PrometheusOutOfOrderTimestamps
PrometheusOutOfOrderTimestamps
PrometheusRemoteStorageFailures
PrometheusRemoteStorageFailures
PrometheusRuleFailures
PrometheusRuleFailures
PrometheusRuleFailures
PrometheusRuleFailures
PrometheusRuleFailures
PrometheusRuleFailures
Watchdog
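The list above only shows alerts that are firing; the state of the missing rules themselves (inactive/pending/firing, plus any evaluation error) can be inspected through the rules API. A sketch, assuming the thanos-querier route exposes the standard Prometheus /api/v1/rules endpoint and that jq is available:
# Sketch: inspect the PersistentVolumeUsage* rule state and last evaluation error.
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
TOKEN=$(oc whoami -t)
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/rules" \
  | jq '.data.groups[].rules[] | select((.name // "") | test("PersistentVolumeUsage"))
        | {name, state, health, lastError}'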
Similar to bug #2304076, warning messages exist on the prometheus-k8s pods, but it is not clear whether 2304076 affects this issue.
ts=2024-09-25T16:12:44.827Z caller=scrape.go:1735 level=warn component="scrape manager" scrape_pool=serviceMonitor/odf-storage/k8s-metrics-service-monitor/0 target="https://10.128.0.33:9091/federate?match%5B%5D=%7B_name%3D%27kube_node_status_condition%27%7D&match%5B%5D=%7Bname%3D%27kube_persistentvolume_info%27%7D&match%5B%5D=%7Bname%3D%27kube_storageclass_info%27%7D&match%5B%5D=%7Bname%3D%27kube_persistentvolumeclaim_info%27%7D&match%5B%5D=%7Bname%3D%27kube_deployment_spec_replicas%27%7D&match%5B%5D=%7Bname%3D%27kube_pod_status_phase%27%7D&match%5B%5D=%7Bname%3D%27kubelet_volume_stats_capacity_bytes%27%7D&match%5B%5D=%7Bname%3D%27kubelet_volume_stats_used_bytes%27%7D&match%5B%5D=%7Bname%3D%27node_disk_read_time_seconds_total%27%7D&match%5B%5D=%7Bname%3D%27node_disk_write_time_seconds_total%27%7D&match%5B%5D=%7Bname%3D%27node_disk_reads_completed_total%27%7D&match%5B%5D=%7Bname_%3D%27node_disk_writes_completed_total%27%7D" msg="Error on ingesting out-of-order samples" num_dropped=148
ts=2024-09-25T16:12:46.273Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kube-state-metrics/0 target=https://10.128.0.21:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=3
ts=2024-09-25T16:12:58.330Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/odf-storage/noobaa-mgmt-service-monitor/0 target=http://10.130.0.30:8080/metrics/web_server msg="Error on ingesting samples with different value but same timestamp" num_dropped=134
ts=2024-09-25T16:13:05.478Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/odf-storage/s3-service-monitor/0 target=http://10.130.0.32:7004/ msg="Error on ingesting samples with different value but same timestamp" num_dropped=148
Version of all relevant components (if applicable):
OC version:
Client Version: 4.16.11
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.11
Kubernetes Version: v1.29.7+d77deb8
OCS version:
ocs-operator.v4.16.2-rhodf OpenShift Container Storage 4.16.2-rhodf ocs-operator.v4.16.1-rhodf Succeeded
Cluster version:
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.11 True False 33h Error while reconciling 4.16.11: the cluster operator insights is not available
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
May impact: without these alerts there is no early notification when a PVC is nearly full.
Is there any workaround available to the best of your knowledge?
no
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Login to cluster
2. Create a PVC and a pod, attach the PVC to the pod, and run IO to fill the PVC to 95% of its capacity (a sketch of this step follows the list)
3. Open the management console, navigate to Observe / Alerts, and capture the existing alerts
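A minimal sketch of step 2, assuming a 1Gi RBD-backed PVC; the storage class name, image, and fill size are assumptions and may need adjusting for the actual test (with filesystem overhead, ~973MiB lands near 95% of a 1Gi volume).
# Sketch of step 2: create a PVC and a pod, then fill the volume to ~95%.
# Assumptions: ocs-storagecluster-ceph-rbd is the ODF RBD storage class; the UBI image is reachable.
oc apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-alert-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-fill-pod
spec:
  containers:
  - name: fill
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-alert-test
EOF
oc wait --for=condition=Ready pod/pvc-fill-pod --timeout=300s
# Write ~973MiB (~95% of 1Gi) into the mounted volume.
oc exec pvc-fill-pod -- dd if=/dev/zero of=/mnt/data/fill.bin bs=1M count=973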
Actual results:
The PersistentVolumeUsageNearFull and PersistentVolumeUsageCritical alerts are not fired.
Expected results:
The PersistentVolumeUsageNearFull and PersistentVolumeUsageCritical alerts are fired for the affected PVC.
Additional info:
Cluster credentials will be provided to collect the data necessary for this bug.
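Commands along these lines may help collect the relevant data once access is available (a sketch; the odf-storage namespace is taken from the scrape_pool names in the logs above, and the pod/container names assume the default openshift-monitoring layout):
# Sketch: confirm the alert rules are deployed and gather recent monitoring logs.
oc -n odf-storage get prometheusrule -o yaml | grep -i -B2 -A10 'persistentvolumeusage'
oc -n openshift-monitoring logs prometheus-k8s-0 -c prometheus --since=4h
oc -n openshift-monitoring logs deploy/prometheus-operator -c prometheus-operator --since=4h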
is cloned by:
- DFBUGS-1475 [2314717] [ODF on ROSA HCP] [4.17] PVC utilization alerts are not firing (New)