Bug
Resolution: Unresolved
Undefined
4.19.z, 4.20.z
Quality / Stability / Reliability
False
Low
Description of problem:
When upgrading a single node OpenShift (SNO) cluster from 4.19 to 4.20, the upgrade gets stuck on the network cluster operator. The new frr-k8s-webhook-server pod cannot be scheduled because the host port it requires is still held by the old webhook pod, and on a single node there is no other node to place it on.
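The "didn't have free ports" scheduling failure points at the hostPort request visible in the pod description under Actual results (Host Port: 9123/TCP). As a minimal sketch of the relevant pod-template fragment — the exact surrounding fields in the shipped manifest are an assumption, only the port number is taken from the report:

```yaml
containers:
- name: frr-k8s-webhook-server
  ports:
  - containerPort: 9123
    hostPort: 9123   # binds port 9123 on the node itself; only one pod per node can hold it
```

Because a hostPort is claimed on the node rather than inside the pod network, two copies of this pod can never be scheduled onto the same node at the same time.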
Version-Release number of selected component (if applicable):
Upgrade from OCP 4.19.16 to 4.20.1
How reproducible:
Always
Steps to Reproduce:
1. Install a single node OpenShift (SNO) cluster on version 4.19
2. Upgrade it to 4.20
Actual results:
Upgrade is blocked:

$ oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.19.16   True        True          False      111d    Deployment "/openshift-frr-k8s/frr-k8s-webhook-server" update is rolling out (1 out of 2 updated)

New pod is pending:

$ oc get pod -n openshift-frr-k8s
NAME                                     READY   STATUS    RESTARTS   AGE
frr-k8s-fnqzh                            7/7     Running   0          35m
frr-k8s-webhook-server-bc5694f79-w49mc   0/1     Pending   0          10m
frr-k8s-webhook-server-bc7675998-s77lz   1/1     Running   2          14d

Deployment uses the RollingUpdate strategy:

$ oc get deployment -n openshift-frr-k8s frr-k8s-webhook-server -o yaml
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Pod is not scheduled due to lack of free host ports:

$ oc describe pod -n openshift-frr-k8s frr-k8s-webhook-server-bc5694f79-w49mc
Name:                 frr-k8s-webhook-server-bc5694f79-w49mc
Namespace:            openshift-frr-k8s
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      frr-k8s-daemon
Node:                 <none>
Labels:               component=frr-k8s-webhook-server
                      pod-template-hash=bc5694f79
Annotations:          openshift.io/required-scc: privileged
                      openshift.io/scc: privileged
                      security.openshift.io/validated-scc-subject-type: serviceaccount
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/frr-k8s-webhook-server-bc5694f79
Containers:
  frr-k8s-webhook-server:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a974f04d4aefdb39bf2d4649b24e7e0e87685afa3d07ca46234f1a0c5688e4b
    Port:       9123/TCP
    Host Port:  9123/TCP
    Command:    /frr-k8s
    Args:
      --log-level=info
      --webhook-mode=onlywebhook
      --disable-cert-rotation=true
      --namespace=$(NAMESPACE)
      --webhook-port=9123
    Requests:
      cpu:     10m
      memory:  50Mi
    Liveness:   http-get https://:webhook/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get https://:webhook/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      NAMESPACE:  openshift-frr-k8s (v1:metadata.namespace)
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q5tpd (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  frr-k8s-webhook-server-cert
    Optional:    false
  kube-api-access-q5tpd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    Optional:                false
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  10m                  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
  Warning  FailedScheduling  17s (x2 over 5m17s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
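For context on why a surge pod is created at all: when converting the RollingUpdate percentages to pod counts, Kubernetes rounds maxSurge up and maxUnavailable down. For the 1-replica deployment above with 25%/25%, that allows one extra pod but forbids taking the old pod down first, so the new pod must start while the old one still holds host port 9123. A quick sketch of the arithmetic:

```python
import math

# Values from the frr-k8s-webhook-server deployment shown above.
replicas = 1
max_surge = 0.25        # "25%"
max_unavailable = 0.25  # "25%"

# Kubernetes converts percentages to pod counts:
# maxSurge rounds UP, maxUnavailable rounds DOWN.
surge = math.ceil(replicas * max_surge)               # extra pods allowed above desired count
unavailable = math.floor(replicas * max_unavailable)  # pods allowed to be unavailable

print(surge, unavailable)  # 1 0
```

With surge = 1 and unavailable = 0, the controller creates the replacement pod before deleting the old one; on a single node with a fixed hostPort, that replacement can never be scheduled.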
Expected results:
The upgrade completes without manual intervention.
Additional info:
Deleting the running frr-k8s-webhook-server pod (oc delete pod -n openshift-frr-k8s frr-k8s-webhook-server-bc7675998-s77lz) unblocks the upgrade: it releases host port 9123, so the pending pod can be scheduled and the rollout completes.