-
Bug
-
Resolution: Done-Errata
-
Major
-
None
-
4.12.z
-
Moderate
-
No
-
SDN Sprint 235
-
1
-
Rejected
-
False
-
Description of problem:
A master node hits an "annotation not found" issue when upgrading from 4.12 to 4.13. This blocks the upgrade process.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrorReconcilingNode 3h12m (x2 over 3h12m) controlplane [k8s.ovn.org/node-chassis-id annotation not found for node ip-10-0-207-132.ec2.internal, macAddress annotation not found for node "ip-10-0-207-132.ec2.internal" , k8s.ovn.org/l3-gateway-config annotation not found for node "ip-10-0-207-132.ec2.internal"]
Warning ErrorReconcilingNode 3h12m controlplane error creating gateway for node ip-10-0-207-132.ec2.internal: failed to init shared interface gateway: failed to create MAC Binding for dummy nexthop ip-10-0-207-132.ec2.internal: error getting datapath GR_ip-10-0-207-132.ec2.internal: object not found
Warning ErrorReconcilingNode 41m (x10 over 52m) controlplane error creating gateway for node ip-10-0-207-132.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-207-132.ec2.internal: unable to fetch podIPs for pod openshift-cluster-machine-approver/machine-approver-7747c58fc8-rbwcf
Warning ErrorReconcilingNode 36m (x5 over 50m) controlplane error creating gateway for node ip-10-0-207-132.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-207-132.ec2.internal: unable to fetch podIPs for pod openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager-operator-64f55669d9-9p6x4
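To confirm quickly whether a node is in the state the events describe, the annotations can be checked directly on the Node object. The following is only a minimal sketch, not part of the original report: the function name `missing_ovn_annotations` is hypothetical, and it checks only the two annotation keys that the events spell out in full. The events also mention a "macAddress" annotation, but its full key is not given there, so it is left out rather than guessed.

```python
import json

# Annotation keys quoted verbatim in the ErrorReconcilingNode events.
# The "macAddress" annotation is omitted on purpose: the events do not
# give its full key, so including one here would be a guess.
REQUIRED_ANNOTATIONS = (
    "k8s.ovn.org/node-chassis-id",
    "k8s.ovn.org/l3-gateway-config",
)


def missing_ovn_annotations(node_json: str) -> list:
    """Return the required annotation keys that are absent from a Node object.

    node_json is the JSON text of a single Node, for example the output of
    `oc get node <name> -o json`.
    """
    node = json.loads(node_json)
    annotations = node.get("metadata", {}).get("annotations") or {}
    return [key for key in REQUIRED_ANNOTATIONS if key not in annotations]


# Illustrative input only: a node that has the chassis ID but no gateway config.
sample_node = json.dumps(
    {"metadata": {"annotations": {"k8s.ovn.org/node-chassis-id": "example"}}}
)
print(missing_ovn_annotations(sample_node))
```

An empty list means both annotations are present. A non-empty list on a node that has finished rebooting would match the first warning shown above.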
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-arm64-2023-04-04-094521 -> 4.13.0-0.nightly-arm64-2023-04-04-091502 (profile: 03_aarch64_Disconnected IPI on AWS & HTTP_PROXY & OVN IPSec)
Also hit twice on arm64 builds when upgrading from 4.12 to 4.13, with the following profiles:
aws-ipi-ovn-ipsec
azure-ipi-ovn-ipsec
How reproducible:
Hit 3 times so far.
Steps to Reproduce:
1. Upgrade a cluster from 4.12 to 4.13.
Actual results:
1. The upgrade failed with a timeout because the master node hit the "annotation not found" issue and its kubelet eventually went down, so the master MCP could not finish rolling out.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasNoDiskPressure 3h14m (x8 over 3h14m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientMemory 3h14m (x8 over 3h14m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasSufficientMemory
Normal RegisteredNode 3h14m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Warning ErrorReconcilingNode 3h12m (x2 over 3h12m) controlplane [k8s.ovn.org/node-chassis-id annotation not found for node ip-10-0-207-132.ec2.internal, macAddress annotation not found for node "ip-10-0-207-132.ec2.internal" , k8s.ovn.org/l3-gateway-config annotation not found for node "ip-10-0-207-132.ec2.internal"]
Warning ErrorReconcilingNode 3h12m controlplane error creating gateway for node ip-10-0-207-132.ec2.internal: failed to init shared interface gateway: failed to create MAC Binding for dummy nexthop ip-10-0-207-132.ec2.internal: error getting datapath GR_ip-10-0-207-132.ec2.internal: object not found
Normal Uncordon 3h11m machineconfigdaemon Update completed for config rendered-master-b7734793514e431d2be272b7cc277200 and node has been uncordoned
Normal NodeDone 3h11m machineconfigdaemon Setting node ip-10-0-207-132.ec2.internal, currentConfig rendered-master-b7734793514e431d2be272b7cc277200 to Done
Normal ConfigDriftMonitorStarted 3h11m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-b7734793514e431d2be272b7cc277200
Normal RegisteredNode 3h6m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 3h6m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 3h2m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 178m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 177m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 174m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal OSUpdateStaged 173m machineconfigdaemon Changes to OS staged
Normal PendingConfig 173m machineconfigdaemon Written pending config rendered-master-8bf64f293f301538822add4805050730
Normal SkipReboot 173m machineconfigdaemon Config changes do not require reboot.
Normal Uncordon 173m machineconfigdaemon Update completed for config rendered-master-8bf64f293f301538822add4805050730 and node has been uncordoned
Normal NodeDone 173m machineconfigdaemon Setting node ip-10-0-207-132.ec2.internal, currentConfig rendered-master-8bf64f293f301538822add4805050730 to Done
Normal ConfigDriftMonitorStarted 173m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-8bf64f293f301538822add4805050730
Normal OSUpdateStaged 171m machineconfigdaemon Changes to OS staged
Normal PendingConfig 171m machineconfigdaemon Written pending config rendered-master-786aa385eb74bdb14138faccccd9e82b
Normal Uncordon 171m machineconfigdaemon Update completed for config rendered-master-786aa385eb74bdb14138faccccd9e82b and node has been uncordoned
Normal NodeDone 171m machineconfigdaemon Setting node ip-10-0-207-132.ec2.internal, currentConfig rendered-master-786aa385eb74bdb14138faccccd9e82b to Done
Normal ConfigDriftMonitorStarted 171m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-786aa385eb74bdb14138faccccd9e82b
Normal SkipReboot 159m (x2 over 171m) machineconfigdaemon Config changes do not require reboot. Service crio was reloaded.
Normal OSUpdateStaged 159m machineconfigdaemon Changes to OS staged
Normal PendingConfig 159m machineconfigdaemon Written pending config rendered-master-6b7cd9aa5bbf5900088adda3ec249647
Normal NodeSchedulable 159m kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeSchedulable
Normal ConfigDriftMonitorStarted 159m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-6b7cd9aa5bbf5900088adda3ec249647
Normal Uncordon 159m machineconfigdaemon Update completed for config rendered-master-6b7cd9aa5bbf5900088adda3ec249647 and node has been uncordoned
Normal NodeDone 159m machineconfigdaemon Setting node ip-10-0-207-132.ec2.internal, currentConfig rendered-master-6b7cd9aa5bbf5900088adda3ec249647 to Done
Normal Drain 149m (x2 over 161m) machineconfigdaemon Draining node to update config.
Normal Cordon 149m (x2 over 161m) machineconfigdaemon Cordoned node to apply update
Normal ConfigDriftMonitorStopped 149m (x4 over 173m) machineconfigdaemon Config Drift Monitor stopped
Normal NodeNotSchedulable 149m (x2 over 160m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeNotSchedulable
Normal OSUpdateStaged 148m machineconfigdaemon Changes to OS staged
Normal PendingConfig 148m machineconfigdaemon Written pending config rendered-master-0b26a7484328389fe70296e0e44a9465
Normal Reboot 148m machineconfigdaemon Node will reboot into config rendered-master-0b26a7484328389fe70296e0e44a9465
Normal OSUpdateStarted 148m (x4 over 173m) machineconfigdaemon
Normal RegisteredNode 148m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal NodeNotReady 147m node-controller Node ip-10-0-207-132.ec2.internal status is now: NodeNotReady
Normal NodeAllocatableEnforced 145m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 145m kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeReady
Normal Starting 145m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 145m (x2 over 145m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 145m (x2 over 145m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 145m (x2 over 145m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasSufficientPID
Warning Rebooted 145m kubelet Node ip-10-0-207-132.ec2.internal has been rebooted, boot id: f4598387-0bf8-4394-884f-a0866124f25a
Normal NodeSchedulable 145m kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeSchedulable
Normal Uncordon 145m machineconfigdaemon Update completed for config rendered-master-0b26a7484328389fe70296e0e44a9465 and node has been uncordoned
Normal NodeDone 145m machineconfigdaemon Setting node ip-10-0-207-132.ec2.internal, currentConfig rendered-master-0b26a7484328389fe70296e0e44a9465 to Done
Normal ConfigDriftMonitorStarted 145m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-0b26a7484328389fe70296e0e44a9465
Normal RegisteredNode 136m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 104m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal RegisteredNode 102m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal ConfigDriftMonitorStarted 69m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-0b26a7484328389fe70296e0e44a9465
Normal ConfigDriftMonitorStopped 67m machineconfigdaemon Config Drift Monitor stopped
Normal Drain 67m machineconfigdaemon Draining node to update config.
Normal Cordon 67m machineconfigdaemon Cordoned node to apply update
Normal NodeNotSchedulable 67m (x2 over 145m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeNotSchedulable
Normal OSUpdateStarted 66m machineconfigdaemon Upgrading OS; Changing kernel arguments
Normal InClusterUpgrade 66m machineconfigdaemon Updating from oscontainer quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:792af500fec0c1ddd4888cd9acb1e990561668744f9e8a4831e9998083fe01e5
Normal OSUpgradeApplied 65m machineconfigdaemon OS upgrade applied; new MachineConfig (rendered-master-b80b40f8bbea2f6e5ba8e96f7c1ccfa1) has new OS image (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:792af500fec0c1ddd4888cd9acb1e990561668744f9e8a4831e9998083fe01e5)
Normal Reboot 65m machineconfigdaemon Node will reboot into config rendered-master-b80b40f8bbea2f6e5ba8e96f7c1ccfa1
Normal OSUpdateStaged 65m machineconfigdaemon Changes to OS staged
Normal PendingConfig 65m machineconfigdaemon Written pending config rendered-master-b80b40f8bbea2f6e5ba8e96f7c1ccfa1
Normal RegisteredNode 64m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Normal Starting 62m kubelet Starting kubelet.
Warning Rebooted 62m (x2 over 62m) kubelet Node ip-10-0-207-132.ec2.internal has been rebooted, boot id: 36b706c0-93f7-4219-a25c-59e4919ec5e9
Normal NodeHasNoDiskPressure 62m (x3 over 62m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasNoDiskPressure
Normal NodeReady 62m kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeReady
Normal NodeNotSchedulable 62m kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeNotSchedulable
Normal NodeNotReady 62m (x2 over 62m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeNotReady
Normal NodeAllocatableEnforced 62m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientPID 62m (x3 over 62m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 62m (x3 over 62m) kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeHasSufficientMemory
Normal ConfigDriftMonitorStarted 61m machineconfigdaemon Config Drift Monitor started, watching against rendered-master-b80b40f8bbea2f6e5ba8e96f7c1ccfa1
Normal NodeDone 61m machineconfigdaemon Setting node ip-10-0-207-132.ec2.internal, currentConfig rendered-master-b80b40f8bbea2f6e5ba8e96f7c1ccfa1 to Done
Normal Uncordon 61m machineconfigdaemon Update completed for config rendered-master-b80b40f8bbea2f6e5ba8e96f7c1ccfa1 and node has been uncordoned
Normal NodeSchedulable 61m kubelet Node ip-10-0-207-132.ec2.internal status is now: NodeSchedulable
Normal NodeNotReady 60m (x2 over 64m) node-controller Node ip-10-0-207-132.ec2.internal status is now: NodeNotReady
Normal RegisteredNode 53m node-controller Node ip-10-0-207-132.ec2.internal event: Registered Node ip-10-0-207-132.ec2.internal in Controller
Warning ErrorReconcilingNode 41m (x10 over 52m) controlplane error creating gateway for node ip-10-0-207-132.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-207-132.ec2.internal: unable to fetch podIPs for pod openshift-cluster-machine-approver/machine-approver-7747c58fc8-rbwcf
Warning ErrorReconcilingNode 36m (x5 over 50m) controlplane error creating gateway for node ip-10-0-207-132.ec2.internal: failed to init shared interface gateway: failed to sync stale SNATs on node ip-10-0-207-132.ec2.internal: unable to fetch podIPs for pod openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager-operator-64f55669d9-9p6x4
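Most of the events for this node are Normal, so the handful of Warning events is easy to miss. Below is a rough sketch for pulling the warnings out of event text that has one event per line (Type, Reason, Age, then the rest) and ordering them oldest first by the Age column. The helper names `age_to_seconds` and `warning_events` are hypothetical, and the sketch assumes ages use only the d/h/m/s units seen in this kind of output.

```python
import re

_AGE_PART = re.compile(r"(\d+)([dhms])")
_SECONDS_PER_UNIT = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age: str) -> int:
    """Convert an Age value such as '3h12m' or '41m' to seconds."""
    parts = _AGE_PART.findall(age)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(num + unit for num, unit in parts) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(num) * _SECONDS_PER_UNIT[unit] for num, unit in parts)


def warning_events(lines):
    """Return (age_seconds, reason, rest) for Warning events, oldest first."""
    found = []
    for line in lines:
        fields = line.split(None, 3)  # Type, Reason, Age, remainder
        if len(fields) < 3 or fields[0] != "Warning":
            continue
        rest = fields[3] if len(fields) > 3 else ""
        found.append((age_to_seconds(fields[2]), fields[1], rest))
    # A larger age means the event was seen longer ago, so sort descending.
    return sorted(found, key=lambda event: event[0], reverse=True)


# Shortened sample lines in the same layout as the events above.
sample = [
    "Normal NodeReady 62m kubelet Node status is now: NodeReady",
    "Warning Rebooted 62m (x2 over 62m) kubelet Node has been rebooted",
    "Warning ErrorReconcilingNode 3h12m controlplane error creating gateway",
]
for age, reason, rest in warning_events(sample):
    print(age, reason, rest)
```

Note that the Age column gives the most recent occurrence of a repeated event. The "(xN over ...)" suffix stays in the remainder text and is not parsed here.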
Expected results:
1. The upgrade succeeds and the master node reports no errors.
Additional info:
Job details (Prow and Jenkins jobs):
1) https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-stable-4.13-upgrade-from-stable-4.12-aws-ipi-ovn-ipsec-p1-f14/1642004799069097984
2) https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-nightly-4.13-upgrade-from-stable-4.12-azure-ipi-ovn-ipsec-p1-f14/1642489498921078784
3) https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-upgrade/job/upgrade-pipeline/36070/consoleFull
- blocks
-
OCPBUGS-13175 The master node meet annotation not found issue when upgrade from 4.12 to 4.13
- Closed
- is blocked by
-
SDN-3889 Impact: The master node meet annotation not found issue when upgrade from 4.12 to 4.13
- Closed
- is cloned by
-
OCPBUGS-13175 The master node meet annotation not found issue when upgrade from 4.12 to 4.13
- Closed
- links to
-
RHEA-2023:5006 rpm