-
Spike
-
Resolution: Unresolved
rhos-conplat-core-operators
Summary:
Follow-up for https://issues.redhat.com/browse/OSPRH-22726.
During the update the DB pods restart, and the keystone pods do a rolling restart. When running the sanity instance build script `workload_launch.sh sanity` during the operator update (not the subsequent service update), services such as nova and glance reported 503 errors because their queries to keystone failed:
HttpException: 503: Server Error for url: https://nova-public-openstack.apps.ocp.openstack.lab/v2.1/os-services?binary=nova-compute, The server is currently unavailable. Please try again at a later time. The Keystone service is temporarily unavailable.
- While it is expected that keystone sees a DB error on the DB instance it is connected to, there might be an issue with the keystone probe settings or with the probe endpoint: the probes only check the /v3 endpoint URL, which does not involve DB queries. As a result, a newly started instance may already report as up while it is still initializing. Current probe settings:
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v3
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 5
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 30
    name: keystone-api
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v3
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 5
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 30
- We should test adding a `startupProbe` with an `initialDelaySeconds`, something like:
    startupProbe:
      failureThreshold: 6
      httpGet:
        path: /v3
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
- Is there a keystone URL that does not need authentication but involves the DB, and that could be used in the startupProbe?
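To make the probe values above easier to compare, here is a rough Python sketch of the timing they imply. It assumes the usual Kubernetes behaviour (the first probe runs after `initialDelaySeconds`, then every `periodSeconds`, and the probe counts as failed after `failureThreshold` consecutive failures) and ignores probe timeouts, so the numbers are estimates only. The `Probe` class and both helper functions are hypothetical names used for illustration; they are not part of any operator code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Subset of the Kubernetes probe fields that affect timing."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int
    timeout_seconds: int


def startup_budget_seconds(probe: Probe) -> int:
    """Rough time a container gets to start before a failing
    startupProbe causes a restart (probe timeouts ignored)."""
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


def failure_detection_seconds(probe: Probe) -> int:
    """Rough time until a running container accumulates
    failureThreshold consecutive failures (probe timeouts ignored)."""
    return probe.failure_threshold * probe.period_seconds


# Values copied from the liveness/readiness probes quoted in this issue.
current = Probe(initial_delay_seconds=5, period_seconds=30,
                failure_threshold=3, timeout_seconds=30)

# Values copied from the startupProbe proposed in this issue.
proposed_startup = Probe(initial_delay_seconds=20, period_seconds=10,
                         failure_threshold=6, timeout_seconds=1)

if __name__ == "__main__":
    print("startup budget (proposed):", startup_budget_seconds(proposed_startup), "s")
    print("failure detection (current):", failure_detection_seconds(current), "s")
```

Under these assumptions the proposed startupProbe gives keystone about 80 seconds to come up, while the current readiness probe needs on the order of 90 seconds to take a failing pod out of rotation. Neither value changes the fact that /v3 does not touch the DB.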
Goal:
- Determine whether tuning the keystone probes helps prevent a keystone service outage during the operator update.
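One way to judge whether a probe change helps would be to poll keystone during the operator update and compare the outage windows before and after the change. The following is a minimal sketch, not an agreed test procedure: the URL passed on the command line, the one-second interval, the ten-minute duration, and all function names are assumptions. It treats any 5xx response or connection error as "unavailable".

```python
import ssl
import sys
import time
import urllib.error
import urllib.request
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp, endpoint answered without a 5xx)


def check_once(url: str, timeout: float = 5.0, verify: bool = True) -> bool:
    """Return True if the URL answers with a non-5xx status."""
    ctx = ssl.create_default_context()
    if not verify:
        # Assumption: lab environments may use self-signed certificates.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        return False


def poll(url: str, duration: float, interval: float = 1.0) -> List[Sample]:
    """Sample the endpoint every `interval` seconds for `duration` seconds."""
    samples: List[Sample] = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), check_once(url)))
        time.sleep(interval)
    return samples


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Collapse samples into (first_failure, first_recovery) windows.
    A window still open at the end is closed at the last sample."""
    windows: List[Tuple[float, float]] = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python poll_keystone.py <keystone public URL>/v3
    for begin, end in outage_windows(poll(sys.argv[1], duration=600)):
        print(f"unavailable for {end - begin:.1f}s starting at {time.ctime(begin)}")
```

Polling /v3 only shows whether the API answers at all. To catch the 503s reported by nova and glance, the same loop would need to target a request that actually goes through keystone token validation, which is left open here.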
TimeBox:
- 5 days
Deliverables/Outcomes:
- Recommendation on whether improving the probes is a valid solution.
- Is this also something we have to do for other operators?
- Or do we need a more complex operator update procedure that does not update all operators at the same time?