OCPBUGS-9237

Downgrading a cluster from 4.11 to 4.10 fails


    • Quality / Stability / Reliability
    • None
    • None
    • 1
    • Moderate
    • None
    • Unspecified
    • None
    • None
    • Rejected
    • Sprint 235
    • 1
    • None
    • If docs needed, set a value
    • None
    • None
    • None
    • None
    • None

      Description of problem: On a 4.11 cluster, set the liveness probe and readiness probe timeouts of the router deployment in the openshift-ingress namespace to 5s, then downgrade the cluster to 4.10. The timeouts are expected to revert to the default of 1s and the downgrade to complete.
      Instead, more than 5 hours later, the downgrade is still stuck at "waiting on ingress".

      OpenShift release version:

      Cluster Platform:
      cluster access info: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/96936/

      How reproducible:
      Configure the liveness probe and readiness probe timeouts, then downgrade the cluster.

      Steps to Reproduce (in detail):
      1. Configure the timeout of the liveness probe and readiness probe
      % oc -n openshift-ingress patch deploy/router-default --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}'
      Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "router" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "router" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "router" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "router" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
      deployment.apps/router-default patched
      %

      2. Check the configured timeout of the liveness probe and readiness probe
      % oc -n openshift-ingress get deploy/router-default -o yaml | grep -A8 nessProbe:
      livenessProbe:
        failureThreshold: 3
        httpGet:
          path: /healthz
          port: 1936
          scheme: HTTP
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5

      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /healthz/ready
          port: 1936
          scheme: HTTP
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      %

      3. Downgrade the cluster to 4.10.0-0.nightly-2022-04-24-083512
      % oc patch clusterversion/version --patch '{"spec":{"upstream":"https://amd64.ocp.releases.ci.openshift.org/graph"}}' --type=merge
      clusterversion.config.openshift.io/version patched
      %

      % oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-04-24-083512 --allow-explicit-upgrade=true --force
      warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
      warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
      warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
      Updating to release image registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-04-24-083512
      %

      4. Run oc get clusterversion from time to time; the downgrade appears to be stuck at "waiting on ingress"
      % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-24-135651 True True 3m39s Working towards 4.10.0-0.nightly-2022-04-24-083512: 95 of 771 done (12% complete)
      %

      % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-24-135651 True True 31m Unable to apply 4.10.0-0.nightly-2022-04-24-083512: an unknown error has occurred: MultipleErrors
      %

      % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-24-135651 True True 36m Working towards 4.10.0-0.nightly-2022-04-24-083512: 610 of 771 done (79% complete)
      %

      % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-24-135651 True True 53m Working towards 4.10.0-0.nightly-2022-04-24-083512: 611 of 771 done (79% complete), waiting on ingress
      %

      % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-24-135651 True True 5h30m Working towards 4.10.0-0.nightly-2022-04-24-083512: 611 of 771 done (79% complete), waiting on ingress
      %
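
      The "waiting on ingress" suffix in step 4 names the cluster operator that the update is blocked on. As a minimal sketch (the `waiting_on` helper is hypothetical and not part of `oc`), the component can be pulled out of a status message as follows, using the message from this report as sample input:

```shell
#!/bin/sh
# Hypothetical helper: print the component named after "waiting on" in a
# ClusterVersion status message, or nothing if the message has no such suffix.
waiting_on() {
  printf '%s\n' "$1" | sed -n 's/.*waiting on \(.*\)$/\1/p'
}

# Sample status message copied from step 4 of this report.
status='Working towards 4.10.0-0.nightly-2022-04-24-083512: 611 of 771 done (79% complete), waiting on ingress'
component=$(waiting_on "$status")
echo "blocked on: ${component:-none}"
```

      On a live cluster, the message could instead be read with `oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}'`, and the conditions of the blocking operator inspected with `oc get clusteroperator ingress -o yaml`.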

      5. Check the timeouts; they have changed to 1s
      % oc -n openshift-ingress get deploy/router-default -o yaml | grep -A8 nessProbe:
      livenessProbe:
        failureThreshold: 3
        httpGet:
          path: /healthz
          port: 1936
          scheme: HTTP
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1

      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /healthz/ready
          port: 1936
          scheme: HTTP
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      %
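
      Comparing the probe timeouts before and after the downgrade (steps 2 and 5) is easier with one value per line than with `grep -A8`. A minimal sketch, assuming the deployment YAML is supplied on stdin; `probe_timeouts` is a hypothetical helper name, and the sample input is abbreviated from the step 5 output:

```shell
#!/bin/sh
# Hypothetical helper: read deployment YAML on stdin and print the
# timeoutSeconds found under each "...Probe:" key as "name=value".
probe_timeouts() {
  awk '
    # Remember the most recent probe key, e.g. "livenessProbe:".
    /^[ \t]*[A-Za-z]+Probe:[ \t]*$/ {
      probe = $1
      sub(/:$/, "", probe)
      next
    }
    # Report the first timeoutSeconds seen after a probe key.
    probe != "" && $1 == "timeoutSeconds:" {
      print probe "=" $2
      probe = ""
    }
  '
}

# Sample input abbreviated from the step 5 output in this report.
probe_timeouts <<'EOF'
livenessProbe:
  failureThreshold: 3
  timeoutSeconds: 1
readinessProbe:
  failureThreshold: 3
  timeoutSeconds: 1
EOF
```

      Against a live cluster the same helper could be fed directly: `oc -n openshift-ingress get deploy/router-default -o yaml | probe_timeouts`.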

      Actual results:
      After more than 5 hours, the downgrade has still not completed.

      Expected results:
      The downgrade completes successfully in about 1 hour.

      Impact of the problem:

      Additional info:


              mmasters1@redhat.com Miciah Masters
              shudili@redhat.com Shudi Li
              None
              None
              Shudi Li Shudi Li
              None
              Red Hat Employee