Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: 4.17.z
Affects Version/s: 4.17
Component/s: Machine Config Operator
Labels:
- machine-config-operator
- mco-triaged

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: 3
Severity: Moderate
Regression: None
Target Backport Versions: None
Target Version: 4.17.z
Release Blocker: None
Sprint: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: Done
Release Note Type: Release Note Not Required
Release Note Text: N/A
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

    Upgrade a cluster from 4.16 to 4.17 with one MCP paused, then enable the TechPreviewNoUpgrade feature set. The master and worker nodes get stuck in NotReady, and the MCO reports this error:
failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)

Version-Release number of selected component (if applicable):

    upgrade from 4.16.0-0.nightly-2025-09-01-105533 to 4.17.0-0.nightly-2025-08-26-172650

How reproducible:

    always

Steps to Reproduce:

    1. Launch a 4.16 cluster.
    2. Create an infra MCP and add one node to it:
% oc label node minmli-41901-qdd5b-worker-a-n7996 node-role.kubernetes.io/infra=

infra_mcp.yaml:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""

    3. Wait several minutes until the infra MCP finishes updating, then edit the infra MCP to set paused: true.
    4. Upgrade the cluster from 4.16 to 4.17.
    5. When the upgrade finishes, edit the featuregate cluster to enable TechPreviewNoUpgrade:
spec:
  featureSet: "TechPreviewNoUpgrade"

    6. Edit the infra MCP to set paused: false. The node in the infra MCP starts upgrading but ends up stuck in NotReady.
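
Steps 3, 5, and 6 were done interactively with `oc edit`. For scripted reproduction, the same changes could be applied as merge patches. The sketch below only builds and prints the `oc patch` command lines; the helper name `patch_command` is hypothetical, and using `oc patch --type=merge` in place of `oc edit` is an assumption, not what the reporter ran.

```python
import json
import shlex


def patch_command(resource, name, spec):
    """Build an `oc patch` argv that merge-patches `spec` into the object's spec."""
    payload = json.dumps({"spec": spec}, separators=(",", ":"))
    return ["oc", "patch", resource, name, "--type=merge", "-p", payload]


# Step 3: pause the infra pool before the upgrade.
pause_infra = patch_command("machineconfigpool", "infra", {"paused": True})

# Step 5: enable the tech preview feature set after the upgrade.
enable_tech_preview = patch_command(
    "featuregate", "cluster", {"featureSet": "TechPreviewNoUpgrade"}
)

# Step 6: unpause the infra pool so its node starts updating.
unpause_infra = patch_command("machineconfigpool", "infra", {"paused": False})

# Print shell-quoted commands; nothing is executed against a cluster here.
for argv in (pause_infra, enable_tech_preview, unpause_infra):
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; the cluster upgrade in step 4 still has to happen between the first and second command.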

Actual results:

    5. One master node and one worker node are stuck in NotReady:
# oc get node 
NAME                                    STATUS     ROLES                  AGE     VERSION
minmli-090241601-tfqpl-master-0         NotReady   control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-1         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-2         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-worker-a-hms4k   NotReady   worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-b-n9q7h   Ready      worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-c-s924n   Ready      infra,worker           4h14m   v1.29.14+c68a663
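
To check for this state repeatedly while reproducing, the `oc get node` table can be filtered for NotReady nodes. This is a minimal sketch that assumes the default column layout shown above; the function name `not_ready_nodes` is hypothetical, and the sample text is the output pasted in this report.

```python
SAMPLE = """\
NAME                                    STATUS     ROLES                  AGE     VERSION
minmli-090241601-tfqpl-master-0         NotReady   control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-1         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-2         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-worker-a-hms4k   NotReady   worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-b-n9q7h   Ready      worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-c-s924n   Ready      infra,worker           4h14m   v1.29.14+c68a663
"""


def not_ready_nodes(table):
    """Return node names whose STATUS column contains NotReady."""
    rows = [line.split() for line in table.strip().splitlines()]
    header, body = rows[0], rows[1:]
    name_i = header.index("NAME")
    status_i = header.index("STATUS")
    # STATUS can be a comma-separated list such as "NotReady,SchedulingDisabled".
    return [r[name_i] for r in body if r and "NotReady" in r[status_i].split(",")]


print(not_ready_nodes(SAMPLE))
# prints ['minmli-090241601-tfqpl-master-0', 'minmli-090241601-tfqpl-worker-a-hms4k']
```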

    6. The node in the infra MCP is stuck in NotReady:
# oc describe node minmli-090241601-tfqpl-worker-c-s924n
Name:               minmli-090241601-tfqpl-worker-c-s924n
Roles:              infra,worker
....
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 01 Sep 2025 23:00:52 -0400   Mon, 01 Sep 2025 23:01:15 -0400   RouteCreated                 ovn-kube cleared kubelet-set NoRouteCreated
  MemoryPressure       False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
...
Events:
  Type     Reason                   Age                            From                   Message
  ----     ------                   ----                           ----                   -------
  Normal   Synced                   3h48m                          cloud-node-controller  Node synced successfully
  Normal   RegisteredNode           3h48m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           3h44m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           3h41m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           167m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           164m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Warning  ErrorAddingResource      145m                           controlplane           macAddress annotation not found for node minmli-090241601-tfqpl-worker-c-s924n; error: could not find "k8s.ovn.org/node-mgmt-port-mac-addresses" annotation
  Normal   RegisteredNode           135m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           122m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   RegisteredNode           61m                            node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
  Normal   NodeNotSchedulable       <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotSchedulable
  Normal   OSUpdateStaged           <invalid>                      machineconfigdaemon    Changes to OS staged
  Normal   NodeNotReady             <invalid>                      node-controller        Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotReady
  Normal   Starting                 <invalid>                      kubelet                Starting kubelet.
  Normal   NodeAllocatableEnforced  <invalid>                      kubelet                Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  <invalid> (x2 over <invalid>)  kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    <invalid> (x2 over <invalid>)  kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     <invalid> (x2 over <invalid>)  kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasSufficientPID
  Warning  Rebooted                 <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n has been rebooted, boot id: 9fcbe0fb-cb6f-474e-ac9e-7f6e95045d8c
  Normal   NodeNotReady             <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotReady
  Normal   NodeNotSchedulable       <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotSchedulable
  Normal   NodeSchedulable          <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeSchedulable

Expected results:

    5. The master and worker nodes update successfully and become Ready.
    6. The node should become NotReady because it does not satisfy the minimum kubelet version for user namespaces, not because of this error:
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?

Additional info:

Related logs from step 5:

machine-config-operator-847dd9cb6b-9q8gc:
I0902 05:42:59.215353       1 simple_featuregate_reader.go:171] Starting feature-gate-detector
I0902 05:42:59.305303       1 start.go:129] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities AdminNetworkPolicy AlibabaPlatform AutomatedEtcdBackup AzureWorkloadIdentity BareMetalLo...
I0902 05:42:59.338377       1 operator.go:372] On-cluster layering featuregate enabled, starting MachineOSConfig informer
W0902 05:42:59.340834       1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
E0902 05:42:59.340935       1 reflector.go:150] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: Failed to watch *v1alpha1.MachineOSConfig: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
I0902 05:42:59.764155       1 operator.go:419] Change observed to kube-apiserver-server-ca
W0902 05:43:00.787205       1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
E0902 05:43:00.787319       1 reflector.go:150] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: Failed to watch *v1alpha1.MachineOSConfig: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)

machine-config-controller-6d9c49df78-4fhks:
I0902 05:42:49.909236       1 reflector.go:359] Caches populated for *v1.ClusterVersion from github.com/openshift/client-go/config/informers/externalversions/factory.go:125
I0902 05:42:49.910504       1 reflector.go:359] Caches populated for *v1.Secret from k8s.io/client-go/informers/factory.go:160
I0902 05:42:49.910696       1 template_controller.go:144] Re-syncing ControllerConfig due to secret pull-secret change
W0902 05:42:49.913262       1 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
E0902 05:42:49.913310       1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
I0902 05:42:49.914318       1 reflector.go:359] Caches populated for *v1.MachineConfigPool from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0902 05:42:49.917677       1 reflector.go:359] Caches populated for *v1.Node from k8s.io/client-go/informers/factory.go:160
E0902 05:42:49.917874       1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
E0902 05:42:49.917994       1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
E0902 05:42:49.918040       1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
...
I0902 05:42:49.926930       1 template_controller.go:196] Re-syncing ControllerConfig due to apiServer cluster change
W0902 05:42:49.934372       1 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
E0902 05:42:49.934413       1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.MachineOSBuild: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)

machine-config-daemon-lwsqv for master-0:
I0902 05:52:44.154690    4155 start.go:219] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities ...ableMachineHealthCheckController MultiArchInstallAzure]
I0902 05:52:44.154846    4155 start.go:221] Feature enabled: PinnedImages
I0902 05:52:44.154761    4155 event.go:377] Event(v1.ObjectReference{Kind:"Node", Namespace:"openshift-machine-config-operator", Name:"minmli-090241601-tfqpl-master-0", UID:"5a38d361-91a5-4de6-a522-c14ebcd7f221", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AWSEFSDriverVolumeMetrics", "AdditionalRoutingCapabilities", ..."MachineAPIMigration", "MachineAPIOperatorDisableMachineHealthCheckController", "MultiArchInstallAzure"}}
I0902 05:52:44.155700    4155 update.go:2692] "Starting to manage node: minmli-090241601-tfqpl-master-0"
W0902 05:52:44.158084    4155 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
E0902 05:52:44.158123    4155 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
  
machine-config-daemon-lkflm for worker:
I0902 05:47:09.854802    2591 simple_featuregate_reader.go:171] Starting feature-gate-detector
I0902 05:47:09.857207    2591 writer.go:87] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
I0902 05:47:09.864543    2591 start.go:219] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities AdminNetworkPolicy AlibabaPlatform... MachineAPIOperatorDisableMachineHealthCheckController MultiArchInstallAzure]
I0902 05:47:09.864615    2591 start.go:221] Feature enabled: PinnedImages
I0902 05:47:09.864755    2591 event.go:377] Event(v1.ObjectReference{Kind:"Node", Namespace:"openshift-machine-config-operator", Name:"minmli-090241601-tfqpl-master-0", UID:"5a38d361-91a5-4de6-a522-c14ebcd7f221", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AWSEFSDriverVolumeMetrics", "AdditionalRoutingCapabilities", ..."MultiArchInstallAzure"}}
I0902 05:47:09.865088    2591 update.go:2692] "Starting to manage node: minmli-090241601-tfqpl-worker-a-hms4k"
I0902 05:47:09.871329    2591 image_manager_helper.go:92] Running captured: rpm-ostree status
W0902 05:47:09.871429    2591 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
E0902 05:47:09.871494    2591 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
I0902 05:47:09.913007    2591 daemon.go:1759] State: idle
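
Across the operator, controller, and daemon logs above, the reflector failures name three v1alpha1 resources the API server cannot find: MachineOSConfig, PinnedImageSet, and MachineOSBuild. When triaging a larger log dump, those can be tallied with a short script. This is a sketch only; the regular expression is written against the exact message format shown in these logs, and the function name `missing_resources` is hypothetical.

```python
import re
from collections import Counter

# Matches the reflector message on both the W ("failed to list") and
# E ("Failed to watch ...: failed to list") lines shown in the logs.
PATTERN = re.compile(
    r"failed to list \*(?P<kind>[\w.]+): "
    r"the server could not find the requested resource "
    r"\(get (?P<resource>[\w.]+)\)"
)


def missing_resources(log_text):
    """Count log lines per (Go type, API resource) that the server could not find."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group("kind"), match.group("resource"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed): pipe pod logs in, e.g. from `oc logs`, and read stdin.
    for (kind, resource), n in missing_resources(sys.stdin.read()).most_common():
        print(f"{n:5d}  {kind}  {resource}")
```

Each W/E pair in the logs counts as two lines, so the totals reflect log volume rather than distinct retry attempts.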

depends on

OCPBUGS-77073 MCO report "failed to list *v1alpha1.MachineOSConfig" when enable TechPreview after upgrading from 4.16 to 4.17 with one mcp paused

Closed

links to

openshift/machine-config-operator#5584: [release-4.17] OCPBUGS-61190: MCO report "failed to list *v1alpha1.MachineOSConfig" when enable TechPreview after upgrading from 4.16 to 4.17 with one mcp paused

Assignee: Dalia Khater
Reporter: Min Li
QA Contact: Sergio Regidor de la Rosa
Need Info From: None
Votes: 0
Watchers: 6
Created: 2025/09/03 10:26 AM
Updated: 2026/02/26 2:47 PM
