Description of problem:
certified-operators part of the marketplace namespace is failing frequently due to startup probe 2h19m Normal Created pod/certified-operators-tkh2b Created container registry-server 2h19m Normal Started pod/certified-operators-tkh2b Started container registry-server 2h18m Warning Unhealthy pod/certified-operators-tkh2b Startup probe failed: timeout: failed to connect service ":50051" within 1s 2h18m Normal Killing pod/certified-operators-tkh2b Stopping container registry-server 2h18m Warning Unhealthy pod/certified-operators-tkh2b Readiness probe errored: rpc error: code = Unknown desc = command error: cannot register an exec PID: container is stopping, stdout: , stderr: , exit code -1 livenessProbe: exec: command: - grpc_health_probe - -addr=:50051 failureThreshold: 3 initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 name: registry-server ports: - containerPort: 50051 name: grpc protocol: TCP readinessProbe: exec: command: - grpc_health_probe - -addr=:50051 failureThreshold: 3 initialDelaySeconds: 5 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 --- Alert: Labels: alertname = KubePodCrashLooping container = registry-server endpoint = https-main job = kube-state-metrics managed_cluster = e74350bb-dc29-4716-860d-51269c4ef5d0 namespace = openshift-marketplace openshift_io_alert_source = platform pod = community-operators-j2g4v prometheus = openshift-monitoring/k8s reason = CrashLoopBackOffservice = kube-state-metrics severity = warninguid = 8d236469-bec3-4423-9662-d0bf7fc14826
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
certified-operators failing leading alertname = KubePodCrashLooping.
Expected results:
certified-operators should not fail and not generate the alert.
Additional info:
The pods redhat-marketplace is also restarting due to probe failure.