OpenShift Bugs / OCPBUGS-61190

MCO reports "failed to list *v1alpha1.MachineOSConfig" when enabling TechPreview after upgrading from 4.16 to 4.17 with one MCP paused

    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

          After upgrading a cluster from 4.16 to 4.17 with one MCP paused and then enabling the TechPreviewNoUpgrade feature set, one master and one worker get stuck in NotReady and the MCO reports the error:
      failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)

      Version-Release number of selected component (if applicable):

          upgrade from 4.16.0-0.nightly-2025-09-01-105533 to 4.17.0-0.nightly-2025-08-26-172650

      How reproducible:

          always 

      Steps to Reproduce:

          1. Launch a 4.16 cluster.
          2. Create an infra MCP and add one node to it:
      % oc label node minmli-41901-qdd5b-worker-a-n7996 node-role.kubernetes.io/infra=
      
      infra_mcp.yaml:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: infra
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/infra: ""
      
          3. Wait several minutes until the infra MCP finishes updating, then edit the infra MCP to set paused: true.
          4. Upgrade the cluster from 4.16 to 4.17.
          5. When the upgrade has finished, edit the featuregate cluster to enable TechPreview:
      spec:
        featureSet: "TechPreviewNoUpgrade"
      
          6. Edit the infra MCP to set paused: false. The node in the infra MCP starts upgrading but eventually gets stuck in NotReady.
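
      Steps 3, 5 and 6 can also be applied from the command line instead of through interactive edits. The following is a minimal sketch, assuming that `oc patch` with a JSON merge patch is an acceptable substitute for `oc edit` in this reproduction. The helper names `merge_patch_cmd` and `repro_commands` are hypothetical; they only build argument lists and do not run anything.

```python
import json
import shlex


def merge_patch_cmd(resource: str, name: str, spec: dict) -> list[str]:
    """Build an `oc patch` argv that applies a JSON merge patch to .spec."""
    patch = json.dumps({"spec": spec}, separators=(",", ":"))
    return ["oc", "patch", resource, name, "--type", "merge", "-p", patch]


def repro_commands(pool: str = "infra") -> list[list[str]]:
    """Return the patch commands for steps 3, 5 and 6, in order.

    Step 4 (the 4.16 -> 4.17 upgrade) happens between the first and
    second command and is not covered by this sketch.
    """
    return [
        # Step 3: pause the custom pool before the upgrade.
        merge_patch_cmd("machineconfigpool", pool, {"paused": True}),
        # Step 5: enable the TechPreviewNoUpgrade feature set.
        merge_patch_cmd("featuregate", "cluster", {"featureSet": "TechPreviewNoUpgrade"}),
        # Step 6: unpause the custom pool.
        merge_patch_cmd("machineconfigpool", pool, {"paused": False}),
    ]


if __name__ == "__main__":
    for argv in repro_commands():
        print(shlex.join(argv))
```

      Keeping the commands as data makes it easy to review them before running each one by hand against a test cluster.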
      
      
      

      Actual results:

          5. One master and one worker are stuck in NotReady:
      # oc get node 
      NAME                                    STATUS     ROLES                  AGE     VERSION
      minmli-090241601-tfqpl-master-0         NotReady   control-plane,master   4h26m   v1.30.14
      minmli-090241601-tfqpl-master-1         Ready      control-plane,master   4h26m   v1.30.14
      minmli-090241601-tfqpl-master-2         Ready      control-plane,master   4h26m   v1.30.14
      minmli-090241601-tfqpl-worker-a-hms4k   NotReady   worker                 4h14m   v1.30.14
      minmli-090241601-tfqpl-worker-b-n9q7h   Ready      worker                 4h14m   v1.30.14
      minmli-090241601-tfqpl-worker-c-s924n   Ready      infra,worker           4h14m   v1.29.14+c68a663
      
          6. The node in the infra MCP is stuck in NotReady:
      # oc describe node minmli-090241601-tfqpl-worker-c-s924n
      Name:               minmli-090241601-tfqpl-worker-c-s924n
      Roles:              infra,worker
      ....
      Conditions:
        Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
        ----                 ------  -----------------                 ------------------                ------                       -------
        NetworkUnavailable   False   Mon, 01 Sep 2025 23:00:52 -0400   Mon, 01 Sep 2025 23:01:15 -0400   RouteCreated                 ovn-kube cleared kubelet-set NoRouteCreated
        MemoryPressure       False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
        DiskPressure         False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
        PIDPressure          False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
        Ready                False   Tue, 02 Sep 2025 03:24:08 -0400   Tue, 02 Sep 2025 03:23:57 -0400   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
      ...
      Events:
        Type     Reason                   Age                            From                   Message
        ----     ------                   ----                           ----                   -------
        Normal   Synced                   3h48m                          cloud-node-controller  Node synced successfully
        Normal   RegisteredNode           3h48m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   RegisteredNode           3h44m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   RegisteredNode           3h41m                          node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   RegisteredNode           167m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   RegisteredNode           164m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Warning  ErrorAddingResource      145m                           controlplane           macAddress annotation not found for node minmli-090241601-tfqpl-worker-c-s924n; error: could not find "k8s.ovn.org/node-mgmt-port-mac-addresses" annotation
        Normal   RegisteredNode           135m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   RegisteredNode           122m                           node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   RegisteredNode           61m                            node-controller        Node minmli-090241601-tfqpl-worker-c-s924n event: Registered Node minmli-090241601-tfqpl-worker-c-s924n in Controller
        Normal   NodeNotSchedulable       <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotSchedulable
        Normal   OSUpdateStaged           <invalid>                      machineconfigdaemon    Changes to OS staged
        Normal   NodeNotReady             <invalid>                      node-controller        Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotReady
        Normal   Starting                 <invalid>                      kubelet                Starting kubelet.
        Normal   NodeAllocatableEnforced  <invalid>                      kubelet                Updated Node Allocatable limit across pods
        Normal   NodeHasSufficientMemory  <invalid> (x2 over <invalid>)  kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasSufficientMemory
        Normal   NodeHasNoDiskPressure    <invalid> (x2 over <invalid>)  kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasNoDiskPressure
        Normal   NodeHasSufficientPID     <invalid> (x2 over <invalid>)  kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeHasSufficientPID
        Warning  Rebooted                 <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n has been rebooted, boot id: 9fcbe0fb-cb6f-474e-ac9e-7f6e95045d8c
        Normal   NodeNotReady             <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotReady
        Normal   NodeNotSchedulable       <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeNotSchedulable
        Normal   NodeSchedulable          <invalid>                      kubelet                Node minmli-090241601-tfqpl-worker-c-s924n status is now: NodeSchedulable
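
      To pick out the affected nodes from the `oc get node` output in Actual results without reading the table by eye, a small parser is enough. This is a sketch only, assuming the default column layout with `NAME` and `STATUS` headers; the function name `not_ready_nodes` is hypothetical.

```python
def not_ready_nodes(oc_get_node_output: str) -> list[str]:
    """Return names of nodes whose STATUS column does not include Ready."""
    rows = [line.split() for line in oc_get_node_output.strip().splitlines()]
    header, body = rows[0], rows[1:]
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    result = []
    for row in body:
        # STATUS can be a comma-separated list, e.g. "Ready,SchedulingDisabled".
        if "Ready" not in row[status_idx].split(","):
            result.append(row[name_idx])
    return result


# Output copied from the Actual results section of this report.
SAMPLE = """\
NAME                                    STATUS     ROLES                  AGE     VERSION
minmli-090241601-tfqpl-master-0         NotReady   control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-1         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-master-2         Ready      control-plane,master   4h26m   v1.30.14
minmli-090241601-tfqpl-worker-a-hms4k   NotReady   worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-b-n9q7h   Ready      worker                 4h14m   v1.30.14
minmli-090241601-tfqpl-worker-c-s924n   Ready      infra,worker           4h14m   v1.29.14+c68a663
"""

if __name__ == "__main__":
    for name in not_ready_nodes(SAMPLE):
        print(name)
```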

      Expected results:

      5. The master and worker nodes update successfully and become Ready.
      6. The node should become NotReady because it does not satisfy the minimum kubelet version for user namespaces, not because of the error:
      container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
      
      

      Additional info:

      Related logs from step 5:
      
      machine-config-operator-847dd9cb6b-9q8gc:
      I0902 05:42:59.215353       1 simple_featuregate_reader.go:171] Starting feature-gate-detector
      I0902 05:42:59.305303       1 start.go:129] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities AdminNetworkPolicy AlibabaPlatform AutomatedEtcdBackup AzureWorkloadIdentity BareMetalLo...
      I0902 05:42:59.338377       1 operator.go:372] On-cluster layering featuregate enabled, starting MachineOSConfig informer
      W0902 05:42:59.340834       1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
      E0902 05:42:59.340935       1 reflector.go:150] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: Failed to watch *v1alpha1.MachineOSConfig: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
      I0902 05:42:59.764155       1 operator.go:419] Change observed to kube-apiserver-server-ca
      W0902 05:43:00.787205       1 reflector.go:547] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
      E0902 05:43:00.787319       1 reflector.go:150] github.com/openshift/machine-config-operator/pkg/operator/operator.go:380: Failed to watch *v1alpha1.MachineOSConfig: failed to list *v1alpha1.MachineOSConfig: the server could not find the requested resource (get machineosconfigs.machineconfiguration.openshift.io)
      
      machine-config-controller-6d9c49df78-4fhks:
      I0902 05:42:49.909236       1 reflector.go:359] Caches populated for *v1.ClusterVersion from github.com/openshift/client-go/config/informers/externalversions/factory.go:125
      I0902 05:42:49.910504       1 reflector.go:359] Caches populated for *v1.Secret from k8s.io/client-go/informers/factory.go:160
      I0902 05:42:49.910696       1 template_controller.go:144] Re-syncing ControllerConfig due to secret pull-secret change
      W0902 05:42:49.913262       1 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
      E0902 05:42:49.913310       1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
      I0902 05:42:49.914318       1 reflector.go:359] Caches populated for *v1.MachineConfigPool from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
      I0902 05:42:49.917677       1 reflector.go:359] Caches populated for *v1.Node from k8s.io/client-go/informers/factory.go:160
      E0902 05:42:49.917874       1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
      E0902 05:42:49.917994       1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
      E0902 05:42:49.918040       1 node_controller.go:505] getting scheduler config failed: cluster scheduler couldn't be found
      ...
      I0902 05:42:49.926930       1 template_controller.go:196] Re-syncing ControllerConfig due to apiServer cluster change
      W0902 05:42:49.934372       1 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
      E0902 05:42:49.934413       1 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.MachineOSBuild: failed to list *v1alpha1.MachineOSBuild: the server could not find the requested resource (get machineosbuilds.machineconfiguration.openshift.io)
      
      machine-config-daemon-lwsqv for master-0:
      I0902 05:52:44.154690    4155 start.go:219] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities ...ableMachineHealthCheckController MultiArchInstallAzure]
      I0902 05:52:44.154846    4155 start.go:221] Feature enabled: PinnedImages
      I0902 05:52:44.154761    4155 event.go:377] Event(v1.ObjectReference{Kind:"Node", Namespace:"openshift-machine-config-operator", Name:"minmli-090241601-tfqpl-master-0", UID:"5a38d361-91a5-4de6-a522-c14ebcd7f221", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AWSEFSDriverVolumeMetrics", "AdditionalRoutingCapabilities", ..."MachineAPIMigration", "MachineAPIOperatorDisableMachineHealthCheckController", "MultiArchInstallAzure"}}
      I0902 05:52:44.155700    4155 update.go:2692] "Starting to manage node: minmli-090241601-tfqpl-master-0"
      W0902 05:52:44.158084    4155 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
      E0902 05:52:44.158123    4155 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
        
      machine-config-daemon-lkflm for worker:
      I0902 05:47:09.854802    2591 simple_featuregate_reader.go:171] Starting feature-gate-detector
      I0902 05:47:09.857207    2591 writer.go:87] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
      I0902 05:47:09.864543    2591 start.go:219] FeatureGates initialized: knownFeatureGates=[AWSEFSDriverVolumeMetrics AdditionalRoutingCapabilities AdminNetworkPolicy AlibabaPlatform...
      MachineAPIOperatorDisableMachineHealthCheckController MultiArchInstallAzure]
      I0902 05:47:09.864615    2591 start.go:221] Feature enabled: PinnedImages
      I0902 05:47:09.864755    2591 event.go:377] Event(v1.ObjectReference{Kind:"Node", Namespace:"openshift-machine-config-operator", Name:"minmli-090241601-tfqpl-master-0", UID:"5a38d361-91a5-4de6-a522-c14ebcd7f221", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AWSEFSDriverVolumeMetrics", "AdditionalRoutingCapabilities", ..."MultiArchInstallAzure"}}
      I0902 05:47:09.865088    2591 update.go:2692] "Starting to manage node: minmli-090241601-tfqpl-worker-a-hms4k"
      I0902 05:47:09.871329    2591 image_manager_helper.go:92] Running captured: rpm-ostree status
      W0902 05:47:09.871429    2591 reflector.go:547] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
      E0902 05:47:09.871494    2591 reflector.go:150] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: Failed to watch *v1alpha1.PinnedImageSet: failed to list *v1alpha1.PinnedImageSet: the server could not find the requested resource (get pinnedimagesets.machineconfiguration.openshift.io)
      I0902 05:47:09.913007    2591 daemon.go:1759] State: idle
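
      The reflector warnings above all follow the same pattern, so the set of resources the API server could not find can be collected mechanically from saved pod logs. The sketch below assumes the log format shown in this report; the name `missing_resources` is hypothetical.

```python
import re

# Matches the client-go reflector message seen in the MCO, MCC and MCD logs.
MISSING = re.compile(
    r"failed to list \*(?P<kind>[\w.]+): the server could not find the "
    r"requested resource \(get (?P<resource>[\w.-]+)\)"
)


def missing_resources(log_text: str) -> dict[str, str]:
    """Map each resource the server could not find to the Go type being listed."""
    found = {}
    for match in MISSING.finditer(log_text):
        found[match.group("resource")] = match.group("kind")
    return found


if __name__ == "__main__":
    import sys

    # Usage: python missing_resources.py < pod.log
    for resource, kind in sorted(missing_resources(sys.stdin.read()).items()):
        print(f"{resource}\t{kind}")
```

      For the logs in this report the keys would be machineosconfigs, pinnedimagesets and machineosbuilds under machineconfiguration.openshift.io. Each key can then be checked with `oc get crd <name>` to see whether the CRD exists and which versions it serves.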

              team-mco Team MCO
              rhn-support-minmli Min Li
              Sergio Regidor de la Rosa