Red Hat OpenStack Services on OpenShift / OSPRH-12620

BZ#2323714 [TLS-E] Certmonger doesn't trigger service restarts after updating expiring certificates

    • Important

      Description of problem:
      Two RHOSP 16.2 customers recently reported the same problem: at some point, control plane services became unavailable and stopped responding to API calls. The two deployments showed slightly different symptoms but shared the same root cause: certmonger did not trigger the required service restarts after new certificates were issued.

      It looks like we intentionally do not trigger restarts for pacemaker-controlled services, because we cannot predict how pacemaker will react to local container restarts. While this makes sense from a cluster stability perspective (imagine galera being blocked after certmonger automation restarts containers locally), the customer still has to restart these services at some point (they usually have around 30 days to do so) and schedule a maintenance window for it. It probably makes sense to update our documentation for RHOSP 16.2 and 17.1, but first I want to ask engineering to double-check whether a documentation change is the best approach and to let me know if a bug should be reported.
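
      To help plan that maintenance window, the time left on the certificate a service still has loaded can be estimated from its notAfter date. The following is a minimal Python sketch, assuming the date string has the format printed by `openssl x509 -noout -enddate` (the part after `notAfter=`). The function name `days_until_expiry` and the sample dates are illustrative only and are not part of any RHOSP tooling.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return whole days until `not_after`; negative once the cert has expired.

    `not_after` is a certificate date such as "Jan 31 00:00:00 2025 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expiry - now) // 86400)


if __name__ == "__main__":
    # Hypothetical expiry date of the old certificate still held in memory.
    print(days_until_expiry("Jan 31 00:00:00 2025 GMT"))
```

      A result close to zero or below means the restart can no longer be deferred.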

      As for the other services (mostly API endpoints), it looks like they are not restarted properly: the simple USR1 signal that pkill sends to all httpd processes either does not reach the containerized httpd processes or does not trigger a proper restart. Without a manual restart, services continue to process requests using the old cert bundle.
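
      One way to confirm this symptom is to compare the certificate an endpoint actually serves with the renewed certificate on disk. The following is a rough Python sketch, assuming the file on disk contains a single PEM certificate (not a chain or a combined key and certificate file). The function names, host, port and path are placeholders and are not taken from the affected deployments.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text):
    """Return the SHA-256 hex digest of the DER form of one PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(host, port):
    """Fetch the certificate presented by host:port and fingerprint it.

    No CA bundle is passed, so the certificate is fetched without validation.
    """
    pem = ssl.get_server_certificate((host, port))
    return pem_fingerprint(pem)


def serves_stale_cert(host, port, cert_path):
    """Return True if the endpoint serves a different cert than the file on disk."""
    with open(cert_path) as handle:
        on_disk = pem_fingerprint(handle.read())
    return served_fingerprint(host, port) != on_disk


if __name__ == "__main__":
    # Placeholder endpoint and certificate path; adjust for the deployment.
    print(serves_stale_cert("controller.example.com", 13000, "/path/to/renewed-cert.crt"))
```

      A True result after a renewal suggests the httpd processes behind that endpoint did not pick up the new certificate.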

      Version-Release number of selected component (if applicable): reported for RHOSP 16.2; RHOSP 17.1 is likely affected as well

      How reproducible: do not restart control plane containers for 30 days after certmonger automatically renews certificates, and the control plane services will fail

      Actual results: customers are unaware of the requirement to restart some containers, and automation does not handle certificate renewals properly

      Expected results: customers are able to properly handle expired certificates, automation handles most tasks

              rhn-gps-alee Ade Lee
              jira-bugzilla-migration RH Bugzilla Integration
              Joe Hakim Rahme Joe Hakim Rahme
              rhos-dfg-security