Red Hat OpenStack Services on OpenShift
OSPRH-14472

Service reaches ready state with failing updates to deployments while old replicas are still healthy


    • openstack-operator-bundle-container-1.0.10-7
      .Failed service updates are now reflected accurately in the deployment status

      Before this update, when an update to a service configuration failed, the failure was not reflected in the condition status of the deployment. Instead, the `Ready` condition showed "True", because the new pods created by the update were not considered when checking deployment readiness. With this update, any new pods created during a configuration update are considered when assessing deployment readiness. If the rollout of the new pods fails, the deployment reflects that it is stuck in `Deployment in progress`.
    • Bug Fix
    • Done
    • Important
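
      The release note above says that new pods created during a configuration update are now considered when assessing deployment readiness. As a rough illustration only (not the actual openstack-operator code), a readiness check along those lines could look like the following Go sketch; the helper name IsDeploymentReady is hypothetical, and the input is a plain k8s.io/api/apps/v1 Deployment.

        package readiness

        import (
            appsv1 "k8s.io/api/apps/v1"
        )

        // IsDeploymentReady is a hypothetical helper illustrating the check the
        // release note describes: the Deployment only counts as ready when the
        // pods created from the current pod template are up, not merely when the
        // old ReplicaSet still satisfies the minimum-available threshold.
        func IsDeploymentReady(d *appsv1.Deployment) bool {
            wanted := int32(1)
            if d.Spec.Replicas != nil {
                wanted = *d.Spec.Replicas
            }

            // The controller must have observed the latest spec, otherwise the
            // status fields below may still describe the previous generation.
            if d.Status.ObservedGeneration < d.Generation {
                return false
            }

            // UpdatedReplicas counts pods from the newest ReplicaSet, ReadyReplicas
            // counts pods passing their readiness probes. With a CrashLoopBackOff
            // pod from the new template, UpdatedReplicas stays below wanted, so the
            // deployment is reported as still in progress instead of ready.
            return d.Status.UpdatedReplicas == wanted &&
                d.Status.ReadyReplicas == wanted &&
                d.Status.AvailableReplicas == wanted
        }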

      To Reproduce

      Steps to reproduce the behavior:

      1) Deploy a ctlplane, e.g. with

      OPENSTACK_IMG=quay.io/openstack-k8s-operators/openstack-operator-index:87ab1f1fa16743cad640f994f459ef14c5d2b9ca

      2) Update or add a configuration that makes the service pod fail to start, e.g. by updating to

      OPENSTACK_IMG=quay.io/openstack-k8s-operators/openstack-operator-index:a3ed3f47c7e695b766c0c9e86148fd262e464629

      3) The services reconcile and the new keystone pod fails because of https://github.com/openstack-k8s-operators/keystone-operator/pull/541 , but the service is still reported as ready because the old deployment pods are still healthy and can serve requests.

      Identified with keystone, but I think others are affected, too.

      Expected behavior

      • the keystone service should not reach the ready state and should reflect that the deployment is not able to bring up the new pod

      Bug impact

      • the service continues to run with the old deployment until the underlying problem is fixed.
      • if the user only validates the condition state, it does not reflect that the new service pod failed to come up

      Additional context

      The behavior is:

      • when there is an update to the service, the deployment gets updated and a rolling restart happens; when the new pod fails to start, it ends up in CrashLoopBackOff, like this:
        keystone-7fbb9c97b-4j7kk                                       1/1     Running            0               20m
        keystone-cbd787c54-8h9kj                                       0/1     CrashLoopBackOff   5 (70s ago)     4m12s
      • but the deployment status is OK, because the minimum number of available replicas is still satisfied for the deployment (see the sketch after this list):
        keystone                                       1/1     1            1           20m
      • and keystoneapi is happy:
        $ oc get keystoneapi
        NAME       NETWORKATTACHMENTS   STATUS   MESSAGE
        keystone                        True     Setup complete
      • as a result, the ctlplane is happy, too:
        $ oc get osctlplane
        NAME                                 STATUS   MESSAGE
        openstack-galera-network-isolation   True     Setup complete
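
      This matches how Deployment conditions work in Kubernetes: the `Available` condition is driven by the minimum-available calculation, which the old ReplicaSet still satisfies, while a stuck rollout only surfaces in the `Progressing` condition, which turns False with reason `ProgressDeadlineExceeded` once `progressDeadlineSeconds` (600s by default) elapses. The following sketch, which assumes client-go access and that the keystone Deployment lives in the `openstack` namespace, simply prints those two conditions.

        package main

        import (
            "context"
            "fmt"
            "log"

            appsv1 "k8s.io/api/apps/v1"
            metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
            "k8s.io/client-go/kubernetes"
            "k8s.io/client-go/tools/clientcmd"
        )

        func main() {
            // Kubeconfig path and the "openstack" namespace are assumptions for
            // this example.
            cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
            if err != nil {
                log.Fatal(err)
            }
            client, err := kubernetes.NewForConfig(cfg)
            if err != nil {
                log.Fatal(err)
            }

            dep, err := client.AppsV1().Deployments("openstack").Get(
                context.TODO(), "keystone", metav1.GetOptions{})
            if err != nil {
                log.Fatal(err)
            }

            // During the stuck rollout described above, Available stays True
            // (MinimumReplicasAvailable) while Progressing eventually becomes
            // False with reason ProgressDeadlineExceeded.
            for _, c := range dep.Status.Conditions {
                if c.Type == appsv1.DeploymentAvailable || c.Type == appsv1.DeploymentProgressing {
                    fmt.Printf("%s=%s reason=%s message=%q\n", c.Type, c.Status, c.Reason, c.Message)
                }
            }
        }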

              rhn-support-mschuppe Martin Schuppert
              rhos-dfg-ospk8s