Bug
Resolution: Unresolved
Normal
None
4.14.z
Important
None
False
Description of problem:
EgressIPs are not assigned to nodes after a node scale-down followed by a scale-up. The worker nodes are AWS m6a.16xlarge EC2 instances, each of which can host up to 49 EgressIPs, as shown by the node annotation:

    cloud.network.openshift.io/egress-ipconfig: [{"interface":"eni-xxxx","ifaddr":{"ipv4":"10.x.x.0/18"},"capacity":{"ipv4":49,"ipv6":50}}]
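As a rough capacity check, the number of nodes needed for a given count of EgressIPs is a ceiling division by the per-node capacity. The sketch below assumes the ipv4 capacity of 49 from the annotation applies to every worker and that nothing else consumes that capacity; `nodes_needed` is a hypothetical helper name, not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: minimum number of nodes required to host
# a given number of EgressIPs at a given per-node IPv4 capacity.
nodes_needed() {
    total=$1
    capacity=$2
    # Integer ceiling division: (total + capacity - 1) / capacity
    echo $(( (total + capacity - 1) / capacity ))
}

# 76 EgressIPs at 49 per node
nodes_needed 76 49
```

For the 76 EgressIPs in this report the result is 2, so capacity alone should not prevent assignment once at least two workers are back.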
Version-Release number of selected component (if applicable):
EgressIPs with OVN-Kubernetes
Steps to Reproduce:
1. Create a worker machineset using the m6a.16xlarge instance type.

2. Create 76 EgressIPs:

for n in $(seq 5 80); do
cat <<EOF | oc apply -f -
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-10.0.40.${n}
spec:
  egressIPs:
  - 10.0.40.$n
  namespaceSelector:
    matchLabels:
      env: eip
EOF
done

3. Scale down the machineset to 0, then scale it back up to the previous number of instances.
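The scale-down and scale-up in the last step could be scripted roughly as follows. This is a sketch, not the exact commands used for this report: the machineset name and replica count are placeholders, the `openshift-machine-api` namespace is assumed, and `scale_cycle` and the `OC` override are hypothetical names introduced here so the function can be dry-run with `OC=echo`.

```shell
#!/bin/sh
# OC defaults to the real client; set OC=echo to print the commands instead.
OC=${OC:-oc}

# scale_cycle MACHINESET REPLICAS
# Scales the machineset to 0, then back to REPLICAS.
scale_cycle() {
    machineset=$1
    replicas=$2
    "$OC" scale machineset "$machineset" -n openshift-machine-api --replicas=0
    # In a real run, wait here until the old nodes are gone before scaling up.
    "$OC" scale machineset "$machineset" -n openshift-machine-api --replicas="$replicas"
}
```

For example, `OC=echo; scale_cycle <machineset-name> <replicas>` prints the two `scale` invocations without touching the cluster.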
Actual results:
1. For a while there are cloudprivateipconfigs in the "CloudResponsePending" state, but eventually they succeed:

    $ oc get cloudprivateipconfigs -o yaml | yq '.items[].status.conditions[].reason' | sort | uniq -c
         76 CloudResponseSuccess

2. However, 3 EgressIPs remain unassigned:

    $ oc get eip --no-headers | awk '$3 == "" {print}'
    egress-10.0.40.10   10.0.40.10
    egress-10.0.40.13   10.0.40.13
    egress-10.0.40.15   10.0.40.15

3. The cloud-network-config-controller hits "context deadline exceeded" for 10.0.40.15, which is not assigned (see above):

    $ oc logs -n openshift-cloud-network-config-controller deploy/cloud-network-config-controller | grep 40.15
    I1206 13:20:18.154263 1 controller.go:182] Assigning key: 10.0.40.15 to cloud-private-ip-config workqueue
    I1206 13:20:18.240415 1 controller.go:182] Assigning key: ip-10-0-44-127.eu-central-1.compute.internal to node workqueue
    I1206 13:20:20.695067 1 cloudprivateipconfig_controller.go:439] Added IP address to node: "ip-10-0-32-145.eu-central-1.compute.internal" for CloudPrivateIPConfig: "10.0.40.15"
    I1206 13:20:22.342670 1 controller.go:160] Dropping key '10.0.40.15' from the cloud-private-ip-config workqueue
    E1206 13:20:49.140656 1 controller.go:165] error syncing '10.0.40.15': Get "https://api-int.bverschu.emea.aws.cee.support:6443/apis/cloud.network.openshift.io/v1/cloudprivateipconfigs/10.0.40.15": context deadline exceeded, requeuing in cloud-private-ip-config workqueue
    E1206 13:21:10.340015 1 controller.go:165] error syncing '10.0.40.68': error updating CloudPrivateIPConfig: "10.0.40.68", err: Put "https://api-int.bverschu.emea.aws.cee.support:6443/apis/cloud.network.openshift.io/v1/cloudprivateipconfigs/10.0.40.68/status": context deadline exceeded, requeuing in cloud-private-ip-config workqueue
    E1206 13:21:14.539786 1 controller.go:165] error syncing '10.0.40.15': Get "https://api-int.bverschu.emea.aws.cee.support:6443/apis/cloud.network.openshift.io/v1/cloudprivateipconfigs/10.0.40.15": context deadline exceeded, requeuing in cloud-private-ip-config workqueue
    I1206 13:21:39.339112 1 controller.go:160] Dropping key '10.0.40.15' from the cloud-private-ip-config workqueue
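To see which keys are affected by the timeouts across the whole log rather than for a single grep pattern, the errors can be tallied per key. The following is a minimal sketch that assumes the `error syncing '<key>'` message format shown in the log lines above; `deadline_errors_by_key` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read controller log lines on stdin and print
# "<count> <key>" for every key that hit "context deadline exceeded".
deadline_errors_by_key() {
    grep 'context deadline exceeded' \
        | sed -n "s/.*error syncing '\([^']*\)'.*/\1/p" \
        | sort | uniq -c | awk '{print $1, $2}'
}
```

It could be fed directly from the controller, for example `oc logs -n openshift-cloud-network-config-controller deploy/cloud-network-config-controller | deadline_errors_by_key`. On the excerpt above it would report two errors for 10.0.40.15 and one for 10.0.40.68.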
Expected results:
All EgressIPs are assigned to nodes.
Additional info: