OCPBUGS-55899: kube-apiserver operator in degraded state after cluster network is expanded on a cluster with a pre-existing EIP/EFW/NP/service UDN/UDN LB/externalIP upgrade setup

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: 4.19.0
    • Quality / Stability / Reliability

      Description of problem:

      After the cluster network CIDR was expanded from 10.128.0.0/20 to 10.128.0.0/19 on a cluster with a pre-existing EIP/EFW/NP/service UDN/UDN LB/externalIP upgrade setup, the kube-apiserver operator went Degraded with "ConfigObservationDegraded: invalid CIDR address: 10.0.15.93", and several other operators subsequently became unsettled.

      Version-Release number of selected component (if applicable):

      4.19.0-0.nightly-2025-05-06-051838

      How reproducible:

      Steps to Reproduce:

      1. Installed a cluster with the following clusterNetwork (a hypothetical install-config sketch follows):

      "spec": { "clusterNetwork": [ {"cidr": "10.128.0.0/20", "hostPrefix": 23} ] }

      2. Set up the EIP/EFW/NP/service UDN/UDN LB/externalIP features before the upgrade (a hypothetical externalIP sketch follows).
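
      For illustration, the externalIP piece of such a setup might look like the sketch below. This is a hypothetical example (the exact objects and values used in the test are not recorded in this bug); note that spec.externalIP.policy.allowedCIDRs entries must be CIDRs with an explicit prefix length:

      # Hypothetical externalIP policy sketch; 10.0.15.0/24 is a placeholder.
      # Entries in allowedCIDRs must be IP/prefix -- a bare IP will not parse as a CIDR.
      $ oc patch network.config.openshift.io cluster --type=merge --patch \
        '{"spec":{"externalIP":{"policy":{"allowedCIDRs":["10.0.15.0/24"]}}}}'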

      3. Expanded the cluster network:

      $ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"spec":{"clusterNetwork":[{"cidr":"10.128.0.0/19","hostPrefix":23}],"networkType":"OVNKubernetes"}}'
      network.config.openshift.io/cluster patched
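
      The merged spec can be read back to confirm the expansion took effect (a quick verification step, not part of the original report):

      $ oc get network.config.openshift.io cluster -o jsonpath='{.spec.clusterNetwork}'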

       

      4. The kube-apiserver operator became degraded, complaining about an invalid CIDR address, 10.0.15.93:

       

      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      2s      
      baremetal                                  4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      cloud-controller-manager                   4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h52m   
      cloud-credential                           4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h52m   
      cluster-autoscaler                         4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      config-operator                            4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h50m   
      console                                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      101s    
      control-plane-machine-set                  4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h47m   
      csi-snapshot-controller                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      dns                                        4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      etcd                                       4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h48m   
      image-registry                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      4m47s   
      ingress                                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      4m2s    
      insights                                   4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      kube-apiserver                             4.19.0-0.nightly-2025-05-06-051838   True        False         True       4h44m   ConfigObservationDegraded: invalid CIDR address: 10.0.15.93
      kube-controller-manager                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h45m   
      kube-scheduler                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h47m   
      kube-storage-version-migrator              4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h50m   
      machine-api                                4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h44m   
      machine-approver                           4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      machine-config                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h47m   
      marketplace                                4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      monitoring                                 4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h37m   
      network                                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h51m   
      node-tuning                                4.19.0-0.nightly-2025-05-06-051838   True        False         False      3m14s   
      olm                                        4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      openshift-apiserver                        4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h39m   
      openshift-controller-manager               4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h41m   
      openshift-samples                          4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h39m   
      operator-lifecycle-manager                 4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      operator-lifecycle-manager-catalog         4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   
      operator-lifecycle-manager-packageserver   4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h39m   
      service-ca                                 4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h50m   
      storage                                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      4h49m   

       

      The following errors are seen in the kube-apiserver-operator logs:

      $ oc -n openshift-kube-apiserver-operator logs kube-apiserver-operator-789b654f94-n5z27 | grep invalid

      .......

      .......

      .......

      E0507 11:18:32.779771       1 base_controller.go:279] "Unhandled Error" err="CertRotationController reconciliation failed: KubeAPIServer.operator.openshift.io \"cluster\" is invalid: status.nodeStatuses[2].currentRevision: Invalid value: \"object\": cannot be unset once set"
      E0507 11:18:33.980378       1 base_controller.go:279] "Unhandled Error" err="CertRotationController reconciliation failed: KubeAPIServer.operator.openshift.io \"cluster\" is invalid: status.nodeStatuses[2].currentRevision: Invalid value: \"object\": cannot be unset once set"
      E0507 11:18:34.179310       1 base_controller.go:279] "Unhandled Error" err="CertRotationController reconciliation failed: KubeAPIServer.operator.openshift.io \"cluster\" is invalid: status.nodeStatuses[2].currentRevision: Invalid value: \"object\": cannot be unset once set"
      E0507 11:18:34.380412       1 base_controller.go:279] "Unhandled Error" err="CertRotationController reconciliation failed: KubeAPIServer.operator.openshift.io \"cluster\" is invalid: status.nodeStatuses[2].currentRevision: Invalid value: \"object\": cannot be unset once set"
      E0507 11:18:34.579125       1 base_controller.go:279] "Unhandled Error" err="CertRotationController reconciliation failed: KubeAPIServer.operator.openshift.io \"cluster\" is invalid: status.nodeStatuses[2].currentRevision: Invalid value: \"object\": cannot be unset once set"
      E0507 11:18:37.074422       1 base_controller.go:279] "Unhandled Error" err="TargetConfigController reconciliation failed: KubeAPIServer.operator.openshift.io \"cluster\" is invalid: status.nodeStatuses[2].currentRevision: Invalid value: \"object\": cannot be unset once set"
      I0507 14:34:50.401242       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"0d105bbe-f7d2-49ef-8b41-5614120a740c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'GetExternalIPPolicyFailed' error parsing networks.config.openshift.io/cluster Spec.ExternalIP.Policy.AllowedCIDRs: invalid cidr: invalid CIDR address: 10.0.15.93
      E0507 14:34:50.431632       1 base_controller.go:279] "Unhandled Error" err="ConfigObserver reconciliation failed: invalid CIDR address: 10.0.15.93"
      I0507 14:34:50.435764       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"0d105bbe-f7d2-49ef-8b41-5614120a740c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'GetExternalIPPolicyFailed' error parsing networks.config.openshift.io/cluster Spec.ExternalIP.Policy.AllowedCIDRs: invalid cidr: invalid CIDR address: 10.0.15.93
      I0507 14:34:50.436426       1 status_controller.go:229] clusteroperator/kube-apiserver diff {"status":{"conditions":[

      {"lastTransitionTime":"2025-05-07T11:17:51Z","message":"NodeControllerDegraded: All master nodes are ready\nConfigObservationDegraded: invalid CIDR address: 10.0.15.93","reason":"AsExpected","status":"False","type":"Degraded"}

      ,{"lastTransitionTime":"2025-05-07T11:34:35Z","message":"NodeInstallerProgressing: 3 nodes are at revision 6","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2025-05-07T11:14:42Z","message":"StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 6","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2025-05-07T11:09:08Z","message":"KubeletMinorVersionUpgradeable: Kubelet and API server minor versions are synced.","reason":"AsExpected","status":"True","type":"Upgradeable"},{"lastTransitionTime":"2025-05-07T11:09:39Z","message":"All is well","reason":"AsExpected","status":"False","type":"EvaluationConditionsDetected"}]}}
      I0507 14:34:50.499557       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"0d105bbe-f7d2-49ef-8b41-5614120a740c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/kube-apiserver changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready" to "NodeControllerDegraded: All master nodes are ready\nConfigObservationDegraded: invalid CIDR address: 10.0.15.93"
      E0507 14:34:50.505493       1 base_controller.go:279] "Unhandled Error" err="ConfigObserver reconciliation failed: invalid CIDR address: 10.0.15.93"
      I0507 14:34:50.508868       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"0d105bbe-f7d2-49ef-8b41-5614120a740c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'GetExternalIPPolicyFailed' error parsing networks.config.openshift.io/cluster Spec.ExternalIP.Policy.AllowedCIDRs: invalid cidr: invalid CIDR address: 10.0.15.93
      E0507 14:34:50.514985       1 base_controller.go:279] "Unhandled Error" err="ConfigObserver reconciliation failed: invalid CIDR address: 10.0.15.93"
      I0507 14:34:50.522074       1 event.go:377] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"0d105bbe-f7d2-49ef-8b41-5614120a740c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'GetExternalIPPolicyFailed' error parsing networks.config.openshift.io/cluster Spec.ExternalIP.Policy.AllowedCIDRs: invalid cidr: invalid CIDR address: 10.0.15.93
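
      The repeated GetExternalIPPolicyFailed events point at spec.externalIP.policy.allowedCIDRs in the cluster network config, and "invalid CIDR address: 10.0.15.93" is exactly the error Go's net.ParseCIDR returns for an address that lacks a prefix length. The offending list can be inspected directly (a quick check, not part of the original report):

      $ oc get network.config.openshift.io cluster -o jsonpath='{.spec.externalIP.policy.allowedCIDRs}'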

       

       

      Then more operators were found to be unsettled (Progressing=True):

       

      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      3m41s   
      baremetal                                  4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      cloud-controller-manager                   4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h31m   
      cloud-credential                           4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h31m   
      cluster-autoscaler                         4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      config-operator                            4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h29m   
      console                                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      28s     
      control-plane-machine-set                  4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h26m   
      csi-snapshot-controller                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      dns                                        4.19.0-0.nightly-2025-05-06-051838   True        True          False      5h28m   DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6."
      etcd                                       4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h27m   
      image-registry                             4.19.0-0.nightly-2025-05-06-051838   True        True          False      43m     Progressing: The registry is ready...
      ingress                                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      43m     
      insights                                   4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      kube-apiserver                             4.19.0-0.nightly-2025-05-06-051838   True        False         True       5h23m   ConfigObservationDegraded: invalid CIDR address: 10.0.15.93
      kube-controller-manager                    4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h24m   
      kube-scheduler                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h26m   
      kube-storage-version-migrator              4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h29m   
      machine-api                                4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h23m   
      machine-approver                           4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      machine-config                             4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h26m   
      marketplace                                4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      monitoring                                 4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h16m   
      network                                    4.19.0-0.nightly-2025-05-06-051838   True        True          False      5h30m   DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)...
      node-tuning                                4.19.0-0.nightly-2025-05-06-051838   True        True          False      9s      Waiting for 1/6 Profiles to be applied
      olm                                        4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      openshift-apiserver                        4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h18m   
      openshift-controller-manager               4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h20m   
      openshift-samples                          4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h18m   
      operator-lifecycle-manager                 4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      operator-lifecycle-manager-catalog         4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h28m   
      operator-lifecycle-manager-packageserver   4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h18m   
      service-ca                                 4.19.0-0.nightly-2025-05-06-051838   True        False         False      5h29m   
      storage                                    4.19.0-0.nightly-2025-05-06-051838   True        True          False      5h28m   AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
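
      A quick filter for listing only the operators that have not settled, i.e. anything not Available=True / Progressing=False / Degraded=False in columns 3-5 of the oc get co output (not part of the original report):

      $ oc get co --no-headers | awk '$3!="True" || $4!="False" || $5!="False"'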

      Actual results: the kube-apiserver operator is stuck Degraded with "ConfigObservationDegraded: invalid CIDR address: 10.0.15.93", and several other operators keep progressing.

      Expected results: no operator should become degraded after the cluster network is expanded.

      Additional info:

       

      must-gather:  https://drive.google.com/file/d/1SxnhxIAUjC99mVjsa4z7WiP8-ERyPNOS/view?usp=drive_link
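
      If the root cause turns out to be a bare IP in spec.externalIP.policy.allowedCIDRs, a possible workaround sketch (untested, and assuming 10.0.15.93/32 matches the intended policy) would be to rewrite the entry with an explicit prefix length and wait for the operator condition to clear; whether the bare IP came from the test setup or was introduced by the expansion still needs root-causing:

      $ oc patch network.config.openshift.io cluster --type=merge --patch \
        '{"spec":{"externalIP":{"policy":{"allowedCIDRs":["10.0.15.93/32"]}}}}'
      $ oc wait clusteroperator/kube-apiserver --for=condition=Degraded=False --timeout=10m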

       

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      Is it an:

      1. internal CI failure
      2. customer issue / SD
      3. internal Red Hat testing failure

      If it is an internal Red Hat testing failure:

      • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (especially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

      If it is a CI failure:

      • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
      • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
      • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
      • If it's a connectivity issue,
      • What is the srcNode, srcIP and srcNamespace and srcPodName?
      • What is the dstNode, dstIP and dstNamespace and dstPodName?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
      • Don’t presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
          • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
          • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
          • Please provide the UTC timestamp networking outage window from must-gather
          • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
        • If it is not a connectivity issue:
          • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
      • When showing the results from commands, include the entire command in the output.  
      • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
      • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
      • For guidance on using this template, please see OCPBUGS Template Training for Networking components.

              Assignee: Jamo Luhrsen (jluhrsen)
              Reporter: Jean Chen (jechen@redhat.com)
              QA Contact: Anurag Saxena