OpenShift Bugs / OCPBUGS-62470

3000 VLAN NNCPs fail, "etcdserver: request is too large" in NNS

    • Bug
    • Resolution: Unresolved
    • 4.18
    • Quality / Stability / Reliability
    • Important

      Description of problem:

      The status of a NodeNetworkConfigurationPolicy (NNCP) is empty when the cluster manages a large number of VLANs (1000+). The handler pods report "etcdserver: request is too large" errors, which prevent NodeNetworkState updates.

      When applying a NodeNetworkConfigurationPolicy to delete 50 VLANs on a cluster that already has 1500+ VLANs configured, the NNCP object shows empty STATUS and REASON fields.

      $ oc get nncp
      NAME                              STATUS   REASON
      delete-50-vlans-batch-1000-1049
      

      The STATUS and REASON columns are empty (not "Progressing", not "Available", completely empty).
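
To spot this condition across many policies, the table printed by `oc get nncp` can be scanned for rows that carry only a name. The following is a minimal sketch, assuming the default three-column layout shown above; the helper name `nncps_without_status` is hypothetical and not part of any tool.

```python
def nncps_without_status(table: str) -> list[str]:
    """Return names of policies whose STATUS column is blank in `oc get nncp` output."""
    lines = [ln.strip() for ln in table.splitlines() if ln.strip()]
    missing = []
    for row in lines[1:]:  # skip the NAME/STATUS/REASON header row
        fields = row.split()
        if len(fields) == 1:  # only NAME is present, so STATUS and REASON are empty
            missing.append(fields[0])
    return missing


sample = """\
NAME                              STATUS   REASON
delete-50-vlans-batch-1000-1049
"""
print(nncps_without_status(sample))
```

The sample input is the output quoted in this report; a row that also has STATUS and REASON values would not be listed.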

      Version-Release number of selected component (if applicable):

      4.18.0-0.nightly-2025-09-25-164655
      OS_GIT_VERSION=4.18.0-202509101149.p2.g53c5d9a.assembly.stream.el9-53c5d9a
      SOURCE_GIT_TREE_STATE=clean
      OS_GIT_COMMIT=53c5d9a
      SOURCE_GIT_COMMIT=53c5d9ac6f9f10b45e8a31691b03ce3a4f86bad2
      SOURCE_GIT_TAG=v0.17.0-2608-g53c5d9ac6
      SOURCE_GIT_URL=https://github.com/openshift/kubernetes-nmstate
      

      How reproducible:
      Once

      Steps to Reproduce:

      1. Configure cluster with 1500+ VLANs using multiple NNCPs
      2. Apply NNCP to delete 50 VLANs (delete-50-vlans-batch-1000-1049)
        Name:         delete-50-vlans-batch-1000-1049
        Namespace:
        Labels:       <none>
        Annotations:  nmstate.io/webhook-mutating-timestamp: 1759166377092333775
        API Version:  nmstate.io/v1
        Kind:         NodeNetworkConfigurationPolicy
        Metadata:
          Creation Timestamp:  2025-09-29T17:19:37Z
          Generation:          1
          Resource Version:    1013759
          UID:                 79ea7108-2b05-40a7-aa6b-547c976492ff
        Spec:
          Desired State:
            Interfaces:
              Name:   eno1.1000
              State:  absent
              Type:   vlan
              Name:   eno1.1001
              State:  absent
              Type:   vlan
              [... 48 more VLANs from eno1.1002 through eno1.1049 ...]
          Node Selector:
            node-role.kubernetes.io/worker:
        Events:                              <none>
        
      3. Observe that NNCP creation timestamp is 2025-09-29T17:19:37Z
      4. Check status at 2025-09-29T17:24:42Z (5 minutes later) - empty
      5. Check status at 2025-09-29T17:34:57Z (15 minutes later) - still empty
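
The delete policy from step 2 can be regenerated rather than written by hand. This is a sketch only: the function name `build_delete_policy` is hypothetical, and the field names are taken from the `oc describe` output above and the VLAN NNCP under "Additional info". It emits JSON, which is also valid YAML.

```python
import json


def build_delete_policy(base_iface: str, first_id: int, count: int) -> dict:
    """Build an NNCP that marks `count` consecutive VLAN interfaces as absent."""
    last_id = first_id + count - 1
    interfaces = [
        {"name": f"{base_iface}.{vlan_id}", "type": "vlan", "state": "absent"}
        for vlan_id in range(first_id, last_id + 1)
    ]
    return {
        "apiVersion": "nmstate.io/v1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": f"delete-{count}-vlans-batch-{first_id}-{last_id}"},
        "spec": {
            "nodeSelector": {"node-role.kubernetes.io/worker": ""},
            "desiredState": {"interfaces": interfaces},
        },
    }


policy = build_delete_policy("eno1", 1000, 50)
print(json.dumps(policy, indent=2))
```

With the arguments shown, the generated policy is named delete-50-vlans-batch-1000-1049 and covers eno1.1000 through eno1.1049.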

      Actual results:

      nmstate Warning

      [2025-09-29T14:34:10Z WARN  nmstate::query_apply::net_state] Interfaces count exceeds the support limit 1000 in desired state
      

      Source: https://github.com/nmstate/nmstate/blob/base/rust/src/lib/query_apply/net_state.rs#L32
      Constant: MAX_SUPPORTED_INTERFACES = 1000
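
For reference, the threshold can be checked before a policy is applied, and a VLAN range can be split into batches that each stay below it. This is an illustrative sketch, not nmstate's logic: the strict "greater than" comparison is an assumption based on the wording of the warning, both function names are hypothetical, and batching has not been verified as a workaround for this bug.

```python
MAX_SUPPORTED_INTERFACES = 1000  # value quoted above from nmstate's net_state.rs


def exceeds_support_limit(interface_count: int) -> bool:
    """Assumed check: strictly more interfaces than the limit triggers the warning."""
    return interface_count > MAX_SUPPORTED_INTERFACES


def split_vlan_ids(first_id: int, count: int, batch_size: int) -> list[range]:
    """Split `count` consecutive VLAN IDs into ranges of at most `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    end = first_id + count
    return [
        range(start, min(start + batch_size, end))
        for start in range(first_id, end, batch_size)
    ]


print(exceeds_support_limit(1500))  # a single 1500-interface desired state
batches = split_vlan_ids(1000, 1500, 500)
print([(b[0], b[-1]) for b in batches])
```

For 1500 VLANs starting at ID 1000 and a batch size of 500, this yields three ranges: 1000 to 1499, 1500 to 1999, and 2000 to 2499.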

      etcd Request Size Error
      Handler pod nmstate-handler-dg46c on node master-0 logs 7 occurrences of this error:

      2025-09-29T17:51:15.078Z {"level":"error","ts":"2025-09-29T17:51:15.078Z","msg":"Reconciler error","controller":"NodeNetworkState","object":{"name":"master-0"},"namespace":"","name":"master-0","reconcileID":"7bb31f59-50de-43e2-b8ef-a4675cbcd345","error":"error at node reconcile creating NodeNetworkState: Error updating nodeNetworkState: etcdserver: request is too large"}
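
Handler log lines of this shape are a timestamp followed by a JSON object, so they can be filtered mechanically. The sketch below assumes that layout holds for every line of interest; `parse_handler_line` and `is_etcd_size_error` are hypothetical helper names, and the sample is the line quoted above.

```python
import json


def parse_handler_line(line: str) -> tuple[str, dict]:
    """Split a handler log line into its leading timestamp and JSON payload."""
    timestamp, _, payload = line.partition(" ")
    return timestamp, json.loads(payload)


def is_etcd_size_error(record: dict) -> bool:
    """True when the record's error text contains the etcd request-size message."""
    return "etcdserver: request is too large" in record.get("error", "")


line = (
    '2025-09-29T17:51:15.078Z {"level":"error","ts":"2025-09-29T17:51:15.078Z",'
    '"msg":"Reconciler error","controller":"NodeNetworkState",'
    '"object":{"name":"master-0"},"namespace":"","name":"master-0",'
    '"reconcileID":"7bb31f59-50de-43e2-b8ef-a4675cbcd345",'
    '"error":"error at node reconcile creating NodeNetworkState: '
    'Error updating nodeNetworkState: etcdserver: request is too large"}'
)
ts, rec = parse_handler_line(line)
print(ts, rec["controller"], rec["name"], is_etcd_size_error(rec))
```

Applied to every line of current.log, counting the records for which `is_etcd_size_error` returns True would reproduce the occurrence count.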
      

      Stack trace:

      etcdserver: request is too large
      Error updating nodeNetworkState
      github.com/nmstate/kubernetes-nmstate/pkg/client.UpdateCurrentState
      	/go/src/github.com/openshift/kubernetes-nmstate/pkg/client/client.go:118
      github.com/nmstate/kubernetes-nmstate/pkg/client.CreateOrUpdateNodeNetworkState
      	/go/src/github.com/openshift/kubernetes-nmstate/pkg/client/client.go:93
      github.com/nmstate/kubernetes-nmstate/controllers/handler.(*NodeReconciler).Reconcile
      	/go/src/github.com/openshift/kubernetes-nmstate/controllers/handler/node_controller.go:112
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235
      runtime.goexit
      	/usr/lib/golang/src/runtime/asm_amd64.s:1695
      error at node reconcile creating NodeNetworkState
      github.com/nmstate/kubernetes-nmstate/controllers/handler.(*NodeReconciler).Reconcile
      	/go/src/github.com/openshift/kubernetes-nmstate/controllers/handler/node_controller.go:114
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      	/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235
      runtime.goexit
      	/usr/lib/golang/src/runtime/asm_amd64.s:1695
      
      
      

      Expected Results:

      Adding or deleting 300 VLANs should work.

      The NNCP should always show status conditions, for example:

      NAME                              STATUS        REASON
      delete-50-vlans-batch-1000-1049   Progressing   ConfigurationProgressing
      

      Additional info:

      VLAN NNCP

      
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: create-500-vlans-part-1000
      spec:
        nodeSelector:
          node-role.kubernetes.io/worker: ''
        desiredState:
          interfaces:
          - name: eno1.1000
            type: vlan
            state: up
            mtu: 1400
            mac-address: 02:00:00:03:e8:00
            ipv4:
              enabled: true
              dhcp: true
              address:
              - ip: 192.0.2.100
                prefix-length: 24
            vlan:
              id: 1000
              base-iface: eno1
          - name: eno1.1001
            type: vlan
            state: up
            mtu: 1400
            mac-address: 02:00:00:03:e9:01
            ipv4:
              enabled: true
              dhcp: true
              address:
              - ip: 192.0.2.100
                prefix-length: 24
            vlan:
              id: 1001
              base-iface: eno1
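
To get a rough feel for how object size grows with VLAN count, the interface entries above can be generated and measured. This is a sketch under several assumptions: the 1.5 MiB limit is the upstream etcd default for `--max-request-bytes` and may not match this cluster, the MAC address pattern is inferred from the two entries shown, and the function names are hypothetical. The NodeNetworkState stores the reported state of each interface, which likely has more fields than this desired state, so the measured size is an underestimate at best.

```python
import json

ETCD_DEFAULT_MAX_REQUEST_BYTES = 1_572_864  # 1.5 MiB; assumed upstream etcd default


def vlan_interface(base_iface: str, vlan_id: int, index: int) -> dict:
    """One interface entry shaped like those in the VLAN NNCP above."""
    high, low = vlan_id >> 8, vlan_id & 0xFF
    # Assumed pattern: VLAN ID in bytes 4-5, position in the batch in byte 6.
    mac = f"02:00:00:{high:02x}:{low:02x}:{index & 0xFF:02x}"
    return {
        "name": f"{base_iface}.{vlan_id}",
        "type": "vlan",
        "state": "up",
        "mtu": 1400,
        "mac-address": mac,
        "ipv4": {
            "enabled": True,
            "dhcp": True,
            "address": [{"ip": "192.0.2.100", "prefix-length": 24}],
        },
        "vlan": {"id": vlan_id, "base-iface": base_iface},
    }


def desired_state_bytes(base_iface: str, first_id: int, count: int) -> int:
    """Size in bytes of the compact JSON encoding of `count` generated interfaces."""
    interfaces = [vlan_interface(base_iface, first_id + i, i) for i in range(count)]
    return len(json.dumps({"interfaces": interfaces}, separators=(",", ":")).encode())


def bytes_per_interface_budget(count: int) -> int:
    """Average bytes each interface may use before hitting the assumed limit."""
    return ETCD_DEFAULT_MAX_REQUEST_BYTES // count


print(desired_state_bytes("eno1", 1000, 1500), "bytes for 1500 desired interfaces")
print(bytes_per_interface_budget(1500), "bytes per interface at 1500 interfaces")
```

Under the assumed limit, 1500 interfaces leave an average budget of roughly 1 KiB per interface in a single object.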
      
      

      Error Timeline

      2025-09-29T17:19:37Z  NNCP created
      2025-09-29T17:51:15Z  First etcd size error
      2025-09-29T17:52:24Z  etcd size error
      2025-09-29T17:53:10Z  etcd size error
      2025-09-29T17:53:32Z  etcd size error
      2025-09-29T17:53:54Z  etcd size error
      2025-09-29T17:54:17Z  etcd size error
      2025-09-29T17:54:41Z  etcd size error
      

      All 7 errors come from pod nmstate-handler-dg46c while reconciling the NodeNetworkState of master-0.
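
The spacing of these errors can be computed from the timeline directly. The sketch below uses only the timestamps listed above; variable names are arbitrary.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

created = datetime.strptime("2025-09-29T17:19:37Z", FMT)
error_stamps = [
    "2025-09-29T17:51:15Z",
    "2025-09-29T17:52:24Z",
    "2025-09-29T17:53:10Z",
    "2025-09-29T17:53:32Z",
    "2025-09-29T17:53:54Z",
    "2025-09-29T17:54:17Z",
    "2025-09-29T17:54:41Z",
]
errors = [datetime.strptime(stamp, FMT) for stamp in error_stamps]

# Delay between NNCP creation and the first etcd size error, in seconds.
first_delay_s = int((errors[0] - created).total_seconds())

# Seconds between consecutive etcd size errors.
gaps_s = [int((b - a).total_seconds()) for a, b in zip(errors, errors[1:])]

print(first_delay_s)  # 1898 (31 min 38 s)
print(gaps_s)         # [69, 46, 22, 22, 23, 24]
```

The first error appears about 31 minutes after the NNCP was created, and the later errors settle at roughly 22 to 24 seconds apart.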

      Related NNCP That Triggered Initial Interface Limit
      Policy create-1500-vlans-part-1000 failed and was rolled back; the nmstatectl output includes the interface limit warning:

      2025-09-29T17:58:26.464Z {"level":"error","ts":"2025-09-29T17:58:26.464Z","logger":"controllers.NodeNetworkConfigurationPolicy","msg":"Rolling back network configuration, manual intervention needed: ","nodenetworkconfigurationpolicy":{"name":"create-1500-vlans-part-1000"},"error":"error reconciling NodeNetworkConfigurationPolicy on node master-0 at desired state apply: \"\",\n failed to execute nmstatectl apply --no-commit --timeout 480: 'exit status 1' '' '[2025-09-29T14:34:10Z INFO  nmstatectl] Nmstate version: 2.2.48
      [2025-09-29T14:34:10Z INFO  nmstate::nm::show] Got unsupported interface type generic: genev_sys_6081, ignoring
      [2025-09-29T14:34:10Z WARN  nmstate::query_apply::net_state] Interfaces count exceeds the support limit 1000 in desired state
      

      Handler logs: namespaces/openshift-nmstate/pods/nmstate-handler-dg46c/nmstate-handler/nmstate-handler/logs/current.log

              bnemec@redhat.com Benjamin Nemec
              rbrattai@redhat.com Ross Brattain