- Bug
- Resolution: Duplicate
- Normal
- None
- rhos-16.2.z, rhos-17.1.z
- None
Description of problem:
Two RHOSP 16.2 customers recently reported the same problem: at some point, control plane services became unavailable and stopped responding to API calls. The two deployments showed slightly different symptoms but shared the same root cause: certmonger didn't trigger the required service restarts after new certificates were issued.
It looks like we intentionally don't trigger restarts for pacemaker-controlled services because we can't predict how pacemaker will react to local container restarts. While this makes sense from a cluster stability perspective (imagine galera being blocked after containers are restarted locally by certmonger automation), the customer still has to restart these services at some point (usually they have around 30 days to do this) and schedule a relevant maintenance window. It probably makes sense to update our documentation for RHOSP 16.2 and 17.1 here, but first I want to ask engineering to double-check whether a documentation change is the best approach and to let me know if a bug should be reported.
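To help plan that maintenance window, here is a minimal Python sketch that computes how many days remain before a certificate expires. It assumes the expiry time is available in the `notAfter` format printed by `openssl x509 -enddate -noout` (for example `Jan 31 00:00:00 2025 GMT`). The helper name `days_until_expiry` is hypothetical and not part of certmonger or any RHOSP tooling.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return whole days (rounded down) until the certificate expires.

    not_after: expiry time in the "Jan 31 00:00:00 2025 GMT" format.
    now: current time in epoch seconds; defaults to the system clock.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expires - now) // 86400)
```

A negative result means the certificate still loaded by the service has already expired.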
As for the other services (mostly API endpoints), it looks like they are not restarted properly: the simple USR1 signal sent by pkill to all httpd processes either doesn't reach the containerized httpd processes or doesn't trigger a proper restart. Without a manual restart, the services continue to process requests using the old cert bundle.
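One way to confirm that an endpoint is still serving the old certificate is to compare the fingerprint of the certificate presented on the wire with the renewed certificate on disk. The following is a rough Python sketch, not a supported tool: the function names are hypothetical, and it assumes the on-disk file contains a single PEM certificate (no private key or chain appended).

```python
import hashlib
import ssl


def fingerprint(pem):
    """Return the SHA-256 fingerprint (hex) of a single PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem.strip())
    return hashlib.sha256(der).hexdigest()


def serving_stale_cert(served_pem, on_disk_pem):
    """True if the served certificate differs from the one on disk."""
    return fingerprint(served_pem) != fingerprint(on_disk_pem)


def check_endpoint(host, port, cert_path):
    """Fetch the certificate from host:port and compare it with cert_path.

    Assumption: cert_path holds only the leaf certificate in PEM form.
    """
    served = ssl.get_server_certificate((host, port))
    with open(cert_path) as f:
        on_disk = f.read()
    return serving_stale_cert(served, on_disk)
```

If `check_endpoint` returns `True` after a renewal, the service behind that endpoint has most likely not reloaded its certificate and still needs a restart.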
Version-Release number of selected component (if applicable): reported for RHOSP 16.2; RHOSP 17.1 is likely affected as well
How reproducible: don't restart control plane containers for 30 days after certmonger automatically renews the certificates, and the control plane services stop responding
Actual results: customers are unaware of the requirement to restart some containers, and automation doesn't handle certificate renewals properly
Expected results: customers are able to properly handle expired certificates, automation handles most tasks
- duplicates
- OSPRH-12620 BZ#2323714 [TLS-E] Certmonger doesn't trigger service restarts after updating expiring certificates
- In Progress
- external trackers