- Bug
- Resolution: Done-Errata
- Critical
- 4.14
- None
- No
- SDN Sprint 234
- 1
- Rejected
- False
Description of problem:
After a replace upgrade of a hosted cluster from one OCP 4.14 image to another 4.14 image, the first replacement node stays NotReady.

jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig
NAME                                         STATUS     ROLES    AGE     VERSION
ip-10-0-128-175.us-east-2.compute.internal   Ready      worker   72m     v1.26.2+06e8c46
ip-10-0-134-164.us-east-2.compute.internal   Ready      worker   68m     v1.26.2+06e8c46
ip-10-0-137-194.us-east-2.compute.internal   Ready      worker   77m     v1.26.2+06e8c46
ip-10-0-141-231.us-east-2.compute.internal   NotReady   worker   9m54s   v1.26.2+06e8c46

Ready condition on the NotReady node:

  - lastHeartbeatTime: "2023-03-21T19:48:46Z"
    lastTransitionTime: "2023-03-21T19:42:37Z"
    message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?'
    reason: KubeletNotReady
    status: "False"
    type: Ready

Events:
  Type     Reason                   Age                 From                   Message
  ----     ------                   ----                ----                   -------
  Normal   Starting                 11m                 kubelet                Starting kubelet.
  Normal   NodeHasSufficientMemory  11m (x2 over 11m)   kubelet                Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    11m (x2 over 11m)   kubelet                Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     11m (x2 over 11m)   kubelet                Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  11m                 kubelet                Updated Node Allocatable limit across pods
  Normal   Synced                   11m                 cloud-node-controller  Node synced successfully
  Normal   RegisteredNode           11m                 node-controller        Node ip-10-0-141-231.us-east-2.compute.internal event: Registered Node ip-10-0-141-231.us-east-2.compute.internal in Controller
  Warning  ErrorReconcilingNode     17s (x30 over 11m)  controlplane           nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
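A quick way to confirm the missing annotation on the affected node (a minimal check using only standard oc flags; the expected output on a healthy worker is an assumption based on how ovn-kubernetes normally annotates nodes):

  oc get node ip-10-0-141-231.us-east-2.compute.internal --kubeconfig=hostedcluster.kubeconfig -o yaml | grep node-subnets

On the Ready workers this returns the k8s.ovn.org/node-subnets annotation with the node's assigned subnet; on the NotReady node it returns nothing, which matches the ovnkube-master errors below.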
ovnkube-master log:
I0321 20:55:16.270197 1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:16.270209 1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:16.270273 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:17.851497 1 master.go:719] Adding or Updating Node "ip-10-0-137-194.us-east-2.compute.internal"
I0321 20:55:25.965132 1 master.go:719] Adding or Updating Node "ip-10-0-128-175.us-east-2.compute.internal"
I0321 20:55:45.928694 1 client.go:783] "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:NB_Global Row:map[options:{GoMap:map[e2e_timestamp:1679432145 mac_prefix:2e:f9:d8 max_tunid:16711680 northd_internal_version:23.03.1-20.27.0-70.6 northd_probe_interval:5000 svc_monitor_mac:fe:cb:72:cf:f8:5f use_logical_dp_groups:true]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {c8b24290-296e-44a2-a4d0-02db7e312614}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0321 20:55:46.270129 1 obj_retry.go:265] Retry object setup: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:55:46.270154 1 obj_retry.go:319] Adding new object: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:55:46.270164 1 master.go:719] Adding or Updating Node "ip-10-0-141-231.us-east-2.compute.internal"
I0321 20:55:46.270201 1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:46.270209 1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:46.270284 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:52.916512 1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Namespace total 5 items received
I0321 20:56:06.910669 1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Pod total 12 items received
I0321 20:56:15.928505 1 client.go:783] "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:NB_Global Row:map[options:{GoMap:map[e2e_timestamp:1679432175 mac_prefix:2e:f9:d8 max_tunid:16711680 northd_internal_version:23.03.1-20.27.0-70.6 northd_probe_interval:5000 svc_monitor_mac:fe:cb:72:cf:f8:5f use_logical_dp_groups:true]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {c8b24290-296e-44a2-a4d0-02db7e312614}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0321 20:56:16.269611 1 obj_retry.go:265] Retry object setup: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:56:16.269637 1 obj_retry.go:319] Adding new object: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:56:16.269646 1 master.go:719] Adding or Updating Node "ip-10-0-141-231.us-east-2.compute.internal"
I0321 20:56:16.269688 1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:56:16.269697 1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:56:16.269724 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
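The retry loop above repeats roughly every 30 seconds. For reference, it can be followed live from the management cluster; this is only a sketch, and the clusters-<hostedcluster-name> namespace, the ovnkube-master-0 pod name, and the ovnkube-master container name are assumptions about the HyperShift control-plane layout for this release:

  oc logs -n clusters-<hostedcluster-name> ovnkube-master-0 -c ovnkube-master -f | grep node-subnets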
cluster-network-operator log:
I0321 21:03:38.487602 1 log.go:198] Set operator conditions:
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded
  - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable
  - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing
  - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available
I0321 21:03:38.488312 1 log.go:198] Skipping reconcile of Network.operator.openshift.io: spec unchanged
I0321 21:03:38.499825 1 log.go:198] Set ClusterOperator conditions:
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded
  - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable
  - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing
  - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available
I0321 21:03:38.571013 1 log.go:198] Set HostedControlPlane conditions:
  - lastTransitionTime: "2023-03-21T17:38:24Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: ValidAWSIdentityProvider
  - lastTransitionTime: "2023-03-21T17:37:06Z" message: Configuration passes validation observedGeneration: 3 reason: AsExpected status: "True" type: ValidHostedControlPlaneConfiguration
  - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: QuorumAvailable status: "True" type: EtcdAvailable
  - lastTransitionTime: "2023-03-21T17:38:23Z" message: Kube APIServer deployment is available observedGeneration: 3 reason: AsExpected status: "True" type: KubeAPIServerAvailable
  - lastTransitionTime: "2023-03-21T20:26:29Z" message: "" observedGeneration: 3 reason: AsExpected status: "False" type: Degraded
  - lastTransitionTime: "2023-03-21T17:37:11Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: InfrastructureReady
  - lastTransitionTime: "2023-03-21T17:37:06Z" message: External DNS is not configured observedGeneration: 3 reason: StatusUnknown status: Unknown type: ExternalDNSReachable
  - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: AsExpected status: "True" type: Available
  - lastTransitionTime: "2023-03-21T17:37:06Z" message: Reconciliation active on resource observedGeneration: 3 reason: AsExpected status: "True" type: ReconciliationActive
  - lastTransitionTime: "2023-03-21T17:38:25Z" message: All is well reason: AsExpected status: "True" type: AWSDefaultSecurityGroupCreated
  - lastTransitionTime: "2023-03-21T19:30:54Z" message: 'Error while reconciling 4.14.0-0.nightly-2023-03-20-201450: the cluster operator network is degraded' observedGeneration: 3 reason: ClusterOperatorDegraded status: "False" type: ClusterVersionProgressing
  - lastTransitionTime: "2023-03-21T17:39:11Z" message: Condition not found in the CVO. observedGeneration: 3 reason: StatusUnknown status: Unknown type: ClusterVersionUpgradeable
  - lastTransitionTime: "2023-03-21T17:44:05Z" message: Done applying 4.14.0-0.nightly-2023-03-20-201450 observedGeneration: 3 reason: FromClusterVersion status: "True" type: ClusterVersionAvailable
  - lastTransitionTime: "2023-03-21T19:55:15Z" message: Cluster operator network is degraded observedGeneration: 3 reason: ClusterOperatorDegraded status: "True" type: ClusterVersionFailing
  - lastTransitionTime: "2023-03-21T17:39:11Z" message: Payload loaded version="4.14.0-0.nightly-2023-03-20-201450" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450" architecture="amd64" observedGeneration: 3 reason: PayloadLoaded status: "True" type: ClusterVersionReleaseAccepted
  - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "False" type: network.operator.openshift.io/ManagementStateDegraded
  - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: network.operator.openshift.io/Degraded
  - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Upgradeable
  - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: network.operator.openshift.io/Progressing
  - lastTransitionTime: "2023-03-21T17:39:27Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Available
I0321 21:03:39.450912 1 pod_watcher.go:125] Operand /, Kind= openshift-multus/multus updated, re-generating status
I0321 21:03:39.450953 1 pod_watcher.go:125] Operand /, Kind= openshift-multus/multus updated, re-generating status
I0321 21:03:39.493206 1 log.go:198] Set operator conditions:
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded
  - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable
  - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing
  - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available
I0321 21:03:39.494050 1 log.go:198] Skipping reconcile of Network.operator.openshift.io: spec unchanged
I0321 21:03:39.508538 1 log.go:198] Set ClusterOperator conditions:
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded
  - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded
  - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable
  - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing
  - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available
I0321 21:03:39.684429 1 log.go:198] Set HostedControlPlane conditions:
  - lastTransitionTime: "2023-03-21T17:38:24Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: ValidAWSIdentityProvider
  - lastTransitionTime: "2023-03-21T17:37:06Z" message: Configuration passes validation observedGeneration: 3 reason: AsExpected status: "True" type: ValidHostedControlPlaneConfiguration
  - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: QuorumAvailable status: "True" type: EtcdAvailable
  - lastTransitionTime: "2023-03-21T17:38:23Z" message: Kube APIServer deployment is available observedGeneration: 3 reason: AsExpected status: "True" type: KubeAPIServerAvailable
  - lastTransitionTime: "2023-03-21T20:26:29Z" message: "" observedGeneration: 3 reason: AsExpected status: "False" type: Degraded
  - lastTransitionTime: "2023-03-21T17:37:11Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: InfrastructureReady
  - lastTransitionTime: "2023-03-21T17:37:06Z" message: External DNS is not configured observedGeneration: 3 reason: StatusUnknown status: Unknown type: ExternalDNSReachable
  - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: AsExpected status: "True" type: Available
  - lastTransitionTime: "2023-03-21T17:37:06Z" message: Reconciliation active on resource observedGeneration: 3 reason: AsExpected status: "True" type: ReconciliationActive
  - lastTransitionTime: "2023-03-21T17:38:25Z" message: All is well reason: AsExpected status: "True" type: AWSDefaultSecurityGroupCreated
  - lastTransitionTime: "2023-03-21T19:30:54Z" message: 'Error while reconciling 4.14.0-0.nightly-2023-03-20-201450: the cluster operator network is degraded' observedGeneration: 3 reason: ClusterOperatorDegraded status: "False" type: ClusterVersionProgressing
  - lastTransitionTime: "2023-03-21T17:39:11Z" message: Condition not found in the CVO. observedGeneration: 3 reason: StatusUnknown status: Unknown type: ClusterVersionUpgradeable
  - lastTransitionTime: "2023-03-21T17:44:05Z" message: Done applying 4.14.0-0.nightly-2023-03-20-201450 observedGeneration: 3 reason: FromClusterVersion status: "True" type: ClusterVersionAvailable
  - lastTransitionTime: "2023-03-21T19:55:15Z" message: Cluster operator network is degraded observedGeneration: 3 reason: ClusterOperatorDegraded status: "True" type: ClusterVersionFailing
  - lastTransitionTime: "2023-03-21T17:39:11Z" message: Payload loaded version="4.14.0-0.nightly-2023-03-20-201450" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450" architecture="amd64" observedGeneration: 3 reason: PayloadLoaded status: "True" type: ClusterVersionReleaseAccepted
  - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "False" type: network.operator.openshift.io/ManagementStateDegraded
  - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: network.operator.openshift.io/Degraded
  - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Upgradeable
  - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: network.operator.openshift.io/Progressing
  - lastTransitionTime: "2023-03-21T17:39:27Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Available
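The resulting degraded state can also be checked from the hosted cluster itself; these are standard oc queries against the namespaces named in the conditions above, and should show the network ClusterOperator degraded and the ovnkube-node DaemonSet pod not ready on the affected node:

  oc get co network --kubeconfig=hostedcluster.kubeconfig
  oc get ds -n openshift-ovn-kubernetes --kubeconfig=hostedcluster.kubeconfig
  oc get pods -n openshift-ovn-kubernetes -o wide --kubeconfig=hostedcluster.kubeconfig | grep ip-10-0-141-231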
Version-Release number of selected component (if applicable):
Management cluster: 4.13. HostedCluster/NodePool: 4.14.0-0.nightly-2023-03-19-234132, upgraded to 4.14.0-0.nightly-2023-03-20-201450.
How reproducible:
Steps to Reproduce:
1. Bring up a 4.13 management cluster.
2. Bring up the hosted cluster and nodepool on 4.14.0-0.nightly-2023-03-19-234132.
3. Upgrade the hosted cluster to 4.14.0-0.nightly-2023-03-20-201450.
4. Replace-upgrade the nodepool to 4.14.0-0.nightly-2023-03-20-201450 (see the command sketch below).
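A rough sketch of the upgrade commands for steps 3 and 4; the clusters namespace and the resource names are placeholders, and the NodePool's spec.management.upgradeType is assumed to already be set to Replace:

  oc patch hostedcluster <hostedcluster-name> -n clusters --type=merge \
    -p '{"spec":{"release":{"image":"registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450"}}}'
  oc patch nodepool <nodepool-name> -n clusters --type=merge \
    -p '{"spec":{"release":{"image":"registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450"}}}'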
Actual results:
The first node remains in NotReady.
Expected results:
All nodes should be Ready
Additional info:
No issue is seen with a replace upgrade from 4.13 to 4.14.
- blocks OCPBUGS-10890 "Hypershift replace upgrade: node in NotReady after upgrading from a 4.14 image to another 4.14 image" (Closed)
- links to RHEA-2023:5006 rpm