Description of problem:
All default catalogsources in 4.11 are built as file-based catalogs. Those catalogsources fail to deploy successfully in a 4.11 OCP cluster. Multiple CI runs on nightly builds have failed for this reason.
The main culprit is the longer processing time for YAML/JSON unmarshalling in the registry pod: the gRPC server is not yet serving when the liveness/readiness probes start running. The proposal to address this issue is to add a startupProbe to the registry pod. The startupProbe will check gRPC health, and the liveness/readiness probes will not be activated until it succeeds.
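A minimal sketch of what such a probe configuration could look like, written as a Python script that builds the container fragment and prints it as JSON. The container name `registry-server`, the `grpc_health_probe -addr=:50051` command, and all timing values are assumptions chosen for illustration (only port 50051 appears in this report); they are not taken from the actual fix. In Kubernetes, a container is restarted if its startupProbe has not succeeded after roughly `failureThreshold * periodSeconds` seconds, so that product is the startup allowance the registry gets for unmarshalling the catalog.

```python
import json


def exec_probe(period_seconds, failure_threshold=3, timeout_seconds=5):
    """Build an exec probe that checks gRPC health on the registry port.

    The command and port are assumptions for illustration.
    """
    return {
        "exec": {"command": ["grpc_health_probe", "-addr=:50051"]},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
        "timeoutSeconds": timeout_seconds,
    }


def startup_budget_seconds(probe):
    """Approximate time the container may take to start before it is restarted."""
    return probe["periodSeconds"] * probe["failureThreshold"]


# Hypothetical container fragment; the timing values are illustrative only.
registry_container = {
    "name": "registry-server",
    # Liveness/readiness checks are held back until this probe succeeds.
    "startupProbe": exec_probe(period_seconds=10, failure_threshold=15),
    "livenessProbe": exec_probe(period_seconds=10),
    "readinessProbe": exec_probe(period_seconds=10),
}

if __name__ == "__main__":
    print(json.dumps(registry_container, indent=2))
    budget = startup_budget_seconds(registry_container["startupProbe"])
    print(f"startup allowance: {budget}s")
```

With these illustrative values the registry would have about 150 seconds to finish loading the catalog before being restarted, instead of failing the first readiness checks.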
Version-Release number of selected component (if applicable):
4.11
How reproducible:
Steps to Reproduce:
1. Deploy a 4.11 OpenShift cluster
2. Check registry pods for default catalogsources such as redhat-operators
Actual results:
The pods fail due to liveness/readiness probe failure: openshift-marketplace pod/redhat-operators-h22ms node/ci-op-s04xckx3-de73b-7fxs4-master-1 - reason/Unhealthy Readiness probe failed: timeout: failed to connect service ":50051" within 1s
Expected results:
The registry pods for default catalogsources should be up and running.
Additional info:
See Slack thread for more information:
https://coreos.slack.com/archives/C01CQA76KMX/p1654190057669689
Note: This bug is for the backporting process. The 4.10.z BZ is https://bugzilla.redhat.com/show_bug.cgi?id=2115874
- depends on: OCPBUGS-674 Default catalogs fails liveness/readiness probes (Closed)
- is cloned by: OCPBUGS-674 Default catalogs fails liveness/readiness probes (Closed)