OpenShift Bugs: OCPBUGS-1356

After patching the livenessProbe timeout of an ingress controller's router deployment on an SNO AWS cluster, the old router replica is not terminated


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Version: 4.12.0
    • Component: Networking / router
    • Important
    • Sprint 227
    • Rejected

      Description of problem:

      The issue was found by running e2e automation test cases, and it was also reproduced by running the script locally.
      Create an ingress controller on the SNO AWS cluster; one router pod is created for it. Then use the oc patch deploy/router-ocp50074 command to set the timeout of both the liveness probe and the readiness probe to 5s. Expected: a new router pod is created and the old router pod is deleted. Actual: the old router pod is not deleted, and the new router pod stays in Pending status.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-09-14-101116 with profile 79_sno-disconnected-ipi-aws-fips_off in Jenkins

      How reproducible:

      Reproducible by creating an ingress controller and patching the probe timeouts in its router deployment.

      Steps to Reproduce:

      1. Create an ingress controller with the following spec:
      % oc -n openshift-ingress-operator get ingresscontroller ocp50074 -o yaml
      spec:
        clientTLS:
          clientCA:
            name: ""
          clientCertificatePolicy: ""
        defaultCertificate:
          name: router-certs-default
        domain: ocp50074.shudi-412test011.qe.devcluster.openshift.com
        endpointPublishingStrategy:
          type: NodePortService
        httpCompression: {}
        httpEmptyRequestsPolicy: Respond
        httpErrorCodePages:
          name: ""
        replicas: 1
        tuningOptions:
          reloadInterval: 0s
        unsupportedConfigOverrides: null
      2. Check that the router pod for ocp50074 is running:
      % oc -n openshift-ingress get pods
      NAME                              READY   STATUS    RESTARTS      AGE
      router-default-59488d68f7-km8x7   1/1     Running   1 (67m ago)   70m
      router-ocp50074-75d744544-7kvn7   1/1     Running   0             36s
      %
      
      3. Patch the deployment to set both the liveness and readiness probe timeouts to 5s:
      oc -n openshift-ingress patch deploy/router-ocp50074 --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}' 
      
      4. Check the pods again:
      % oc -n openshift-ingress get pods
      NAME                              READY   STATUS    RESTARTS      AGE
      router-default-59488d68f7-km8x7   1/1     Running   1 (67m ago)   70m
      router-ocp50074-75d744544-7kvn7   1/1     Running   0             40s
      router-ocp50074-dc4fdf47b-fk7js   0/1     Pending   0             1s
      % 
      
      5. After more than 30 minutes, the router-ocp50074-75d744544-7kvn7 pod has still not been deleted:
      % oc -n openshift-ingress get pods                                       
      NAME                              READY   STATUS    RESTARTS       AGE
      router-default-59488d68f7-km8x7   1/1     Running   1 (106m ago)   109m
      router-ocp50074-75d744544-7kvn7   1/1     Running   0              39m
      router-ocp50074-dc4fdf47b-fk7js   0/1     Pending   0              39m
      %
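      While reproducing, the symptom in step 5 (an old router pod still Running next to a new one stuck in Pending) can be detected from the oc get pods output. The following is a minimal POSIX shell sketch; router_pod_states and rollout_stuck are hypothetical helper names, not part of oc, and the parsing assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper (not part of oc): reads `oc get pods` table output on
# stdin and prints "<name> <status>" for every pod whose name starts with the
# given prefix, skipping the header row.
router_pod_states() {
  prefix="$1"
  awk -v p="$prefix" 'NR > 1 && index($1, p) == 1 { print $1, $3 }'
}

# Hypothetical helper: reads the same output on stdin and succeeds (printing a
# short summary) only when the prefix has at least one Running pod and at
# least one Pending pod at the same time; otherwise it returns 1.
rollout_stuck() {
  prefix="$1"
  states=$(router_pod_states "$prefix")
  # Count pods per status with awk so the numbers carry no padding.
  running=$(printf '%s\n' "$states" | awk '$2 == "Running" { n++ } END { print n + 0 }')
  pending=$(printf '%s\n' "$states" | awk '$2 == "Pending" { n++ } END { print n + 0 }')
  if [ "$running" -ge 1 ] && [ "$pending" -ge 1 ]; then
    echo "stuck: $running Running, $pending Pending"
    return 0
  fi
  return 1
}
```

      For example: oc -n openshift-ingress get pods | rollout_stuck router-ocp50074-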

      Actual results:

      The old pod router-ocp50074-75d744544-7kvn7 was not deleted, and the new pod router-ocp50074-dc4fdf47b-fk7js remained in Pending status.
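      One possible reading of why the old pod survives: for a Deployment with the RollingUpdate strategy, Kubernetes resolves a percentage maxSurge by rounding up and a percentage maxUnavailable by rounding down, and sets maxUnavailable to 1 if both resolve to 0. With replicas: 1, as in the ingress controller spec, any maxUnavailable percentage below 100% resolves to 0, so the old pod can only be scaled down once the new pod becomes available; a new pod that stays Pending blocks that indefinitely. This report does not show the strategy the ingress operator sets on router-ocp50074 (it can be checked with oc -n openshift-ingress get deploy/router-ocp50074 -o jsonpath='{.spec.strategy}'), so the percentages used here are assumptions. A minimal sketch of just the rounding arithmetic, using a hypothetical helper resolve_rolling_update:

```shell
# Hypothetical helper illustrating the Kubernetes RollingUpdate rounding rules
# for percentage values (assumed inputs, not read from the cluster):
#   maxSurge       -> rounded up
#   maxUnavailable -> rounded down
#   if both resolve to 0, maxUnavailable becomes 1
# Usage: resolve_rolling_update REPLICAS SURGE_PERCENT UNAVAILABLE_PERCENT
resolve_rolling_update() {
  replicas=$1
  surge_pct=$2
  unavail_pct=$3
  surge=$(( (replicas * surge_pct + 99) / 100 ))   # integer ceiling
  unavail=$(( replicas * unavail_pct / 100 ))      # integer floor
  if [ "$surge" -eq 0 ] && [ "$unavail" -eq 0 ]; then
    unavail=1
  fi
  # Number of pods that must stay available while the rollout proceeds.
  min_available=$(( replicas - unavail ))
  echo "maxSurge=$surge maxUnavailable=$unavail minAvailable=$min_available"
}
```

      For example, resolve_rolling_update 1 25 25 prints maxSurge=1 maxUnavailable=0 minAvailable=1: one extra pod may be created, but the single old pod must stay until the new one is available.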

      Expected results:

      The old pod router-ocp50074-75d744544-7kvn7 is deleted, and the new pod router-ocp50074-dc4fdf47b-fk7js is in Running status.

      Additional info:

      cluster info: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/138597/
      
      kubeconfig: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/138597/artifact/workdir/install-dir/auth/kubeconfig

              gspence@redhat.com Grant Spence
              shudili@redhat.com Shudi Li
              Shudi Li