  1. OpenShift Bugs
  2. OCPBUGS-62619

Master MCP stuck into degraded state due to 'Failed to render configuration for pool master: etcdserver: request is too large'


    • Quality / Stability / Reliability

      Description of problem:

      The machine-config operator is stuck in a degraded state because the master MCP is degraded: the newly generated rendered machineconfig is larger than 1.5 MB, which is the request size limit for etcd.
      
      From the logs we can see that the nodes were upgraded to this latest rendered config. However, the MCP is stuck in a degraded state with the following error:
      ~~~
        message: 'Failed to render configuration for pool master: etcdserver: request is too large'
        reason: ""
        status: "True"
        type: RenderDegraded
      ~~~
      
      The machine-config-controller logs are also full of "etcdserver: request is too large" failures:
      ~~~
      2025-09-29T15:11:54.420173338Z E0929 15:11:54.420119       1 render_controller.go:460] Error syncing Generated MCFG: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure etcdserver: request is too large  <nil> 500}})
      2025-09-29T15:11:54.427327865Z E0929 15:11:54.427288       1 render_controller.go:396] etcdserver: request is too large
      2025-09-29T15:11:54.427327865Z I0929 15:11:54.427311       1 render_controller.go:397] Dropping machineconfigpool "master" out of the queue: etcdserver: request is too large
      [..]
      2025-09-30T10:31:29.086682700Z I0930 10:31:29.086599       1 render_controller.go:391] Error syncing machineconfigpool master: etcdserver: request is too large
      2025-09-30T10:31:39.917029591Z E0930 10:31:39.916935       1 render_controller.go:460] Error syncing Generated MCFG: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure etcdserver: request is too large  <nil> 500}})
      2025-09-30T10:31:39.924226014Z I0930 10:31:39.924182       1 render_controller.go:391] Error syncing machineconfigpool master: etcdserver: request is too large
      ~~~
      
      We checked the size of the latest rendered machineconfig and of the machine-configs it is built from, and found that they are large (1556422 bytes).
      
      Inspecting the contents of the machine-config confirms that the registries file ("/etc/containers/registries.conf"), generated from all the configured mirrors, is huge.
      ~~~
      $ omc get mc 99-master-generated-registries -o json | jq -r '.spec.config.storage.files[0].contents.source' | cut -d',' -f2 | base64 -d | less
      ~~~
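
To see which files dominate a machine-config, the decoded size of each embedded file can be listed. The following is a minimal sketch, assuming the MachineConfig has been loaded as JSON (for example from the output of the `omc get mc ... -o json` command above) and that each file's `contents.source` is a `data:` URL. The helper names `decode_data_url` and `file_sizes` are hypothetical, and gzip-compressed contents are not handled.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_url(source: str) -> bytes:
    """Decode a data: URL payload (base64 or percent-encoded) to bytes."""
    if not source.startswith("data:"):
        raise ValueError("not a data URL")
    # Split "mediatype[;base64]" from the payload at the first comma.
    header, _, payload = source[len("data:"):].partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def file_sizes(mc: dict) -> list:
    """Return (path, decoded_size_in_bytes) pairs, largest first."""
    files = (
        mc.get("spec", {}).get("config", {}).get("storage", {}).get("files", [])
    )
    sizes = []
    for entry in files:
        source = entry.get("contents", {}).get("source", "")
        size = len(decode_data_url(source)) if source else 0
        sizes.append((entry.get("path", "?"), size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Running `file_sizes(json.load(open("mc.json")))` on a saved machine-config (the file name is only an example) would list the largest files first, which should make an oversized registries.conf easy to spot.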
      
      This in turn makes the rendered machineconfig huge and causes the MCP to be stuck in a degraded state.
      
      For now we could not think of any workaround other than reducing the size of the content rendered for the MCP. As this MC is generated by the controller itself, it is difficult for the customer to remove anything from it. We are looking for a workaround and a permanent fix for this issue.
      
      In my opinion there should be a check that the rendered machine config does not exceed the 1.5 MB limit in the etcd DB. Open to discussing this further to explore other alternatives.
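
The kind of size check proposed above could look roughly like the following. This is only an illustrative sketch, not how the machine-config controller works: it assumes the object is measured as compact JSON, uses etcd's default `--max-request-bytes` value of 1572864 bytes (1.5 MiB), and introduces a hypothetical `headroom` fraction. The 1556422 bytes measured above is already slightly below 1572864, which suggests the actual request carries some overhead beyond the measured size, so a practical check would probably need a margin.

```python
import json

# etcd's default request size limit (1.5 MiB); clusters may configure another value.
ETCD_DEFAULT_MAX_REQUEST_BYTES = 1572864


def serialized_size(obj) -> int:
    """Size in bytes of obj encoded as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def check_rendered_size(obj, limit=ETCD_DEFAULT_MAX_REQUEST_BYTES, headroom=0.05):
    """Return (fits, size, budget) for obj against limit minus headroom.

    headroom is a hypothetical safety margin (a fraction of limit) for
    request overhead that this sketch does not measure.
    """
    size = serialized_size(obj)
    budget = int(limit * (1 - headroom))
    return size <= budget, size, budget
```

A caller could run this on the rendered machineconfig before writing it and report a clear error naming the size and the budget, instead of surfacing the raw etcd failure.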
      
          

      Version-Release number of selected component (if applicable):

      
          

      How reproducible:

      
          

      Steps to Reproduce:

          1. Have a cluster with many machineconfigs, including a large ImageDigestMirrorSet object, such that the combined rendered machineconfig is larger than 1.5 MB.
          2. Upgrade the cluster and check that the new rendered machineconfig is larger than 1.5 MB.
          3. Check that the machine-config-operator is stuck in a degraded state due to 'Failed to render configuration for pool master: etcdserver: request is too large'.
          

      Actual results:

           The MachineConfigOperator is stuck in a degraded state due to a degraded MachineConfigPool (MCP), caused by the huge size of the new rendered machineconfig generated during the cluster upgrade.
          

      Expected results:

          The cluster should avoid such blockers during upgrade. Ideally there should be a check on the size of the newly generated rendered machineconfig, to ensure that it does not exceed the default etcd limit of 1.5 MiB.
          

      Additional info:

           This happened during a cluster upgrade from OCP 4.14 to 4.16, but it can happen during any cluster upgrade or any machine-config update.
          

              team-mco Team MCO
              rhn-support-nkashyap Nirupma Nirupma
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa