-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.17
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
Description of problem:
After upgrading a cluster from 4.16 to 4.17 with one MCP paused and then enabling the TechPreviewNoUpgrade feature set, master and worker nodes get stuck in NotReady and the MCO reports the error: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
Version-Release number of selected component (if applicable):
upgrade from 4.16.0-0.nightly-2025-09-01-105533 to 4.17.0-0.nightly-2025-08-26-172650
How reproducible:
always
Steps to Reproduce:
1. Launch a 4.16 cluster.
2. Create an infra MCP and add one node to it:
   % oc label node minmli-41901-qdd5b-worker-a-n7996 node-role.kubernetes.io/infra=
   infra_mcp.yaml:
   apiVersion: machineconfiguration.openshift.io/v1
   kind: MachineConfigPool
   metadata:
     name: infra
   spec:
     machineConfigSelector:
       matchExpressions:
         - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
     nodeSelector:
       matchLabels:
         node-role.kubernetes.io/infra: ""
3. Wait several minutes until the infra MCP finishes updating, then edit the infra MCP to set paused: true.
4. Upgrade the cluster from 4.16 to 4.17.
5. When the upgrade has finished, edit the featuregate cluster to enable TechPreview:
   spec:
     featureSet: "TechPreviewNoUpgrade"
6. Edit the infra MCP to set paused: false. The node in the infra MCP starts upgrading but ends up stuck in NotReady.
Actual results:
5. A master node and a worker node are stuck in NotReady:
# oc get node
NAME                                    STATUS     ROLES                  AGE     VERSION
minmli-090241601-tfqpl-master-0         NotReady   control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-1         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-2         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-worker-a-hms4k   NotReady   worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-b-n9q7h   Ready      worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-c-s924n   Ready      infra,worker           4h14m   v1.29.14+c68a663
6. The node in the infra MCP is stuck in NotReady:
# oc describe node minmli-090241601-tfqpl-worker-c-s924n
Name:   minmli-090241601-tfqpl-worker-c-s924n
Roles:  infra,worker
....
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 01 Sep 2025 23:00:52 -0400   Mon, 01 Sep 2025 23:01:15 -0400   RouteCreated                 ovn-kube cleared kubelet-set NoRouteCreated
  MemoryPressure       False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
...
Events:
  Type     Reason                   Age                          From                   Message
  ----     ------                   ----                         ----                   -------
  Normal   Synced                   3h48m                        cloud-node-controller  Node synced successfully
  Normal   RegisteredNode           3h48m                        node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           3h44m                        node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           3h41m                        node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           167m                         node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           164m                         node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Warning  ErrorAddingResource      145m                         controlplane           macAddress annotation not found for node minmli-090241601-tfqpl-worker-c-s924n; error: could not find "k8s.ovn.org/node-mgmt-port-mac-addresses" annotation
  Normal   RegisteredNode           135m                         node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           122m                         node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           61m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   NodeNotSchedulable       <invalid>                    kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotSchedulable
  Normal   OSUpdateStaged           <invalid>                    machineconfigdaemon    Changes to OS staged
  Normal   NodeNotReady             <invalid>                    node-controller        Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotReady
  Normal   Starting                 <invalid>                    kubelet                Starting kubelet.
  Normal   NodeAllocatableEnforced  <invalid>                    kubelet                Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  <invalid> (x2 over <invalid>)  kubelet              Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    <invalid> (x2 over <invalid>)  kubelet              Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     <invalid> (x2 over <invalid>)  kubelet              Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasSufficientPID
  Warning  Rebooted                 <invalid>                    kubelet                Node minmli-090241601-tfqpl-worker-c-s924n has been rebooted, boot id: 9fcbe0fb-cb6f-474e-ac9e-7f6e95045d8c
  Normal   NodeNotReady             <invalid>                    kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotReady
  Normal   NodeNotSchedulable       <invalid>                    kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotSchedulable
  Normal   NodeSchedulable          <invalid>                    kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeSchedulable
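For triage, the node listing above can be filtered mechanically. The following is a minimal sketch, not part of any OpenShift tooling: the helper name not_ready_nodes and the SAMPLE constant are hypothetical, and it assumes the default five-column `oc get node` layout (NAME, STATUS, ROLES, AGE, VERSION) shown in this report.

```python
from typing import List, Tuple

# Sample copied from the `oc get node` output in this report.
SAMPLE = """\
NAME STATUS ROLES AGE VERSION
minmli-090241601-tfqpl-master-0 NotReady control-plane,master 4h26m v1.30.14
minmli-090241601-tfqpl-master-1 Ready control-plane,master 4h26m v1.30.14
minmli-090241601-tfqpl-master-2 Ready control-plane,master 4h26m v1.30.14
minmli-090241601-tfqpl-worker-a-hms4k NotReady worker 4h14m v1.30.14
minmli-090241601-tfqpl-worker-b-n9q7h Ready worker 4h14m v1.30.14
minmli-090241601-tfqpl-worker-c-s924n Ready infra,worker 4h14m v1.29.14+c68a663
"""


def not_ready_nodes(oc_get_node_output: str) -> List[Tuple[str, str]]:
    """Return (node name, kubelet version) for each node whose STATUS does not start with Ready.

    Hypothetical helper: assumes whitespace-separated columns in the
    default order NAME STATUS ROLES AGE VERSION.
    """
    results = []
    for line in oc_get_node_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, status, version = fields[0], fields[1], fields[4]
        # STATUS may carry suffixes such as "Ready,SchedulingDisabled".
        if status.split(",")[0] != "Ready":
            results.append((name, version))
    return results


if __name__ == "__main__":
    for name, version in not_ready_nodes(SAMPLE):
        print(name, version)
```

Run against the sample, this reports master-0 and worker-a-hms4k, which matches the two NotReady nodes seen in step 5.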
Expected results:
5. The master and worker nodes update successfully and become Ready.
6. The node should become NotReady because it does not satisfy the minimum kubelet version for user namespace, not because of the error: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Additional info:
In step 5, related logs:

machine-config-operator-847dd9cb6b-9q8gc:
I0902 05:42:59.215353 1 simple_featuregate_reader.go:171] Starting feature-gate-detector
I0902 05:42:59.305303 1 start.go:129] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities AdminNetworkPolicy AlibabaPlatform AutomatedEtcdBackup AzureWorkloadIdentity BareMetalLo...
I0902 05:42:59.338377 1 operator.go:372] On-cluster layering featuregate enabled, starting MachineOSConfig informer
W0902 05:42:59.340834 1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
E0902 05:42:59.340935 1 reflector.go:150] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: Failed to watch *v1alpha1.MachineOSConfig: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
I0902 05:42:59.764155 1 operator.go:419] Change observed to kube-apiserver-server-ca
W0902 05:43:00.787205 1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
E0902 05:43:00.787319 1 reflector.go:150] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: Failed to watch *v1alpha1.MachineOSConfig: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)

machine-config-controller-6d9c49df78-4fhks:
I0902 05:42:49.909236 1 reflector.go:359] Caches populated for *v1.ClusterVersion from github.com/openshift/client-go/config/informers/externalversions/factory.go:125
I0902 05:42:49.910504 1 reflector.go:359] Caches populated for *v1.Secret from k8s.io/client-go/informers/factory.go:160
I0902 05:42:49.910696 1 template_controller.go:144] Re-syncing ControllerConfig due to secret pull-secret change
W0902 05:42:49.913262 1 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
E0902 05:42:49.913310 1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
I0902 05:42:49.914318 1 reflector.go:359] Caches populated for *v1.MachineConfigPool from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0902 05:42:49.917677 1 reflector.go:359] Caches populated for *v1.Node from k8s.io/client-go/informers/factory.go:160
E0902 05:42:49.917874 1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
E0902 05:42:49.917994 1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
E0902 05:42:49.918040 1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
...
I0902 05:42:49.926930 1 template_controller.go:196] Re-syncing ControllerConfig due to apiServer cluster change
W0902 05:42:49.934372 1 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
E0902 05:42:49.934413 1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.MachineOSBuild: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)

machine-config-daemon-lwsqv for master-0:
I0902 05:52:44.154690 4155 start.go:219] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities ...ableMachineHealthCheckController MultiArchInstallAzure]
I0902 05:52:44.154846 4155 start.go:221] Feature enabled: PinnedImages
I0902 05:52:44.154761 4155 event.go:377] Event(v1.ObjectReference{Kind:"Node", Namespace:"openshift-machine-config-operator", Name:"minmli-090241601-tfqpl-master-0", UID:"5a38d361-91a5-4de6-a522-c14ebcd7f221", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AWSEFSDriverVolumeMetrics", "AdditionalRoutingCapabilities", ..."MachineAPIMigration", "MachineAPIOperatorDisableMachineHealthCheckController", "MultiArchInstallAzure"}}
I0902 05:52:44.155700 4155 update.go:2692] "Starting to manage node: minmli-090241601-tfqpl-master-0"
W0902 05:52:44.158084 4155 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
E0902 05:52:44.158123 4155 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)

machine-config-daemon-lkflm for worker:
I0902 05:47:09.854802 2591 simple_featuregate_reader.go:171] Starting feature-gate-detector
I0902 05:47:09.857207 2591 writer.go:87] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
I0902 05:47:09.864543 2591 start.go:219] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities AdminNetworkPolicy AlibabaPlatform... MachineAPIOperatorDisableMachineHealthCheckController MultiArchInstallAzure]
I0902 05:47:09.864615 2591 start.go:221] Feature enabled: PinnedImages
I0902 05:47:09.864755 2591 event.go:377] Event(v1.ObjectReference{Kind:"Node", Namespace:"openshift-machine-config-operator", Name:"minmli-090241601-tfqpl-master-0", UID:"5a38d361-91a5-4de6-a522-c14ebcd7f221", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AWSEFSDriverVolumeMetrics", "AdditionalRoutingCapabilities", ..."MultiArchInstallAzure"}}
I0902 05:47:09.865088 2591 update.go:2692] "Starting to manage node: minmli-090241601-tfqpl-worker-a-hms4k"
I0902 05:47:09.871329 2591 image_manager_helper.go:92] Running captured: rpm-ostree status
W0902 05:47:09.871429 2591 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
E0902 05:47:09.871494 2591 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
I0902 05:47:09.913007 2591 daemon.go:1759] State: idle
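
The logs above repeat the same reflector failure for several v1alpha1 types. As a triage aid, the distinct missing resources can be pulled out of the log text; the snippet below is a minimal sketch under the assumption that each log entry sits on one line. The names MISSING, SAMPLE_LOG and missing_resources are hypothetical and not part of the MCO.

```python
import re
from typing import Dict

# Matches the reflector message quoted in this report, e.g.
# "failed to list *v1alpha1.PinnedImageSet: the server could not find the
#  requested resource (get pinnedimagesets.machineconfiguration.openshift.io)"
MISSING = re.compile(
    r"failed to list \*(?P<kind>[\w.]+): the server could not find the "
    r"requested resource \(get (?P<resource>[\w.\-]+)\)"
)

# Two entries copied from the logs in this report.
SAMPLE_LOG = """\
W0902 05:42:59.340834 1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
E0902 05:42:49.913310 1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
"""


def missing_resources(log_text: str) -> Dict[str, str]:
    """Map each resource the API server could not find to the Go type the informer tried to list."""
    found: Dict[str, str] = {}
    for match in MISSING.finditer(log_text):
        # Keep the first type seen for each resource; repeats are ignored.
        found.setdefault(match.group("resource"), match.group("kind"))
    return found


if __name__ == "__main__":
    for resource, kind in missing_resources(SAMPLE_LOG).items():
        print(f"{resource} (listed as {kind})")
```

Each resource name found this way can then be checked on the cluster with oc get crd <name> to see whether the CRD is actually installed.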