Type: Bug
Resolution: Duplicate
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.12.0
Component/s: Networking / router
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
1
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
Sprint 227
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

The issue was found while running e2e automation cases and was also reproduced by running the script locally.
After an ingress controller is created, a router pod is created on the SNO AWS cluster. The timeout of the liveness probe and the readiness probe is then set to 5s with the oc patch deploy/router-ocp50074 command. The expectation is that a new router pod is created and the old router pod is deleted. Instead, the old router pod is not deleted and the new router pod stays in Pending status.
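
To try timeout values other than 5s, the patch payload can be generated rather than typed by hand. The following is a minimal sketch; probe_timeout_patch is a hypothetical helper name that is not part of oc or of the e2e suite, and it assumes the container is named "router", as in the patch command in the reproduction steps.

```shell
# Hypothetical helper: build the strategic-merge patch that sets both probe
# timeouts on the "router" container. $1 is the timeout in seconds.
probe_timeout_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":%d},"readinessProbe":{"timeoutSeconds":%d}}]}}}}\n' "$1" "$1"
}

# Example use against a live cluster (not executed here):
#   oc -n openshift-ingress patch deploy/router-ocp50074 --type=strategic \
#     --patch="$(probe_timeout_patch 5)"
```

With an argument of 5, the helper prints the same payload that is used in the reproduction steps.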

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-09-14-101116 with profile 79_sno-disconnected-ipi-aws-fips_off in Jenkins

How reproducible:

Create an ingress controller and patch the probe timeouts on its deployment.

Steps to Reproduce:

1. Create an ingress controller

% oc -n openshift-ingress-operator get ingresscontroller ocp50074 -o yaml
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  defaultCertificate:
    name: router-certs-default
  domain: ocp50074.shudi-412test011.qe.devcluster.openshift.com
  endpointPublishingStrategy:
    type: NodePortService
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  replicas: 1
  tuningOptions:
    reloadInterval: 0s
  unsupportedConfigOverrides: null
2. Check that the router pod for the new ingress controller is running
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS      AGE
router-default-59488d68f7-km8x7   1/1     Running   1 (67m ago)   70m
router-ocp50074-75d744544-7kvn7   1/1     Running   0             36s
%

3. Patch the deployment to set the liveness and readiness probe timeouts to 5s
oc -n openshift-ingress patch deploy/router-ocp50074 --type=strategic --patch='{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"timeoutSeconds":5},"readinessProbe":{"timeoutSeconds":5}}]}}}}' 

4. Check the pods
% oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS      AGE
router-default-59488d68f7-km8x7   1/1     Running   1 (67m ago)   70m
router-ocp50074-75d744544-7kvn7   1/1     Running   0             40s
router-ocp50074-dc4fdf47b-fk7js   0/1     Pending   0             1s
% 

5. After more than 30 minutes have passed, the router-ocp50074-75d744544-7kvn7 pod still has not been deleted
% oc -n openshift-ingress get pods                                       
NAME                              READY   STATUS    RESTARTS       AGE
router-default-59488d68f7-km8x7   1/1     Running   1 (106m ago)   109m
router-ocp50074-75d744544-7kvn7   1/1     Running   0              39m
router-ocp50074-dc4fdf47b-fk7js   0/1     Pending   0              39m
%
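
The manual checks in steps 4 and 5 can be scripted. The following is a minimal sketch; pending_pods is a hypothetical helper name, and it assumes the default column layout of oc get pods shown above, with STATUS in the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column is Pending.
# Reads `oc get pods` tabular output on stdin; $1 is an optional name prefix.
pending_pods() {
  awk -v prefix="${1:-}" '
    $3 == "Pending" && substr($1, 1, length(prefix)) == prefix { print $1 }
  '
}

# Example use against a live cluster (not executed here):
#   oc -n openshift-ingress get pods | pending_pods router-ocp50074
```

For any pod name it prints, oc -n openshift-ingress describe pod <name> shows the scheduler events that explain why the pod is still Pending.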

Actual results:

The old pod router-ocp50074-75d744544-7kvn7 was not deleted, and the new pod router-ocp50074-dc4fdf47b-fk7js remained in Pending status.

Expected results:

The old pod router-ocp50074-75d744544-7kvn7 is deleted, and the new pod router-ocp50074-dc4fdf47b-fk7js is in Running status.

Additional info:

cluster info: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/138597/

kubeconfig: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/138597/artifact/workdir/install-dir/auth/kubeconfig


haproxy-router.go
62 kB
2022/09/15 11:08 AM

relates to

OCPBUGS-2557 router-perf routes can't be accessed after scaling up cluster on AWS and GCP

Closed

Assignee: Grant Spence

Reporter: Shudi Li

QA Contact: Shudi Li

Need Info From: None

Votes: 0

Watchers: 4

Created: 2022/09/15 11:07 AM

Updated: 2025/07/29 5:50 AM

Resolved: 2022/11/10 9:09 PM
