  1. OpenShift Bugs
  2. OCPBUGS-23491

Kube-apiserver degraded on baremetal ipi cluster


    • Moderate
    • No
    • False

      Description of problem:

      KAS does not come up after restarting during an upgrade from 4.12 to 4.14. Kube-apiserver is degraded on a baremetal IPI cluster on installation.

      https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[...]-ipi-ovn-ipv4-fips-f14/1726473005640454144

      Version-Release number of selected component (if applicable):

      4.12.44

      How reproducible:

      Not sure

      Steps to Reproduce:

      Upgrade from 4.12.44 to 4.14 with profile baremetalds-ipi-ovn-ipv4-fips-f14 

      Actual results:

      The upgrade failed, and kube-apiserver did not come up after restarting during the upgrade.

       

      status:
        conditions:
        - lastTransitionTime: "2023-11-20T06:38:29Z"
          message: 'MissingStaticPodControllerDegraded: static pod lifecycle failure - static
            pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision:
            9 on node: "master-1" didn''t show up, waited: 4m45s'
          reason: MissingStaticPodController_SyncError
          status: "True"
          type: Degraded
        - lastTransitionTime: "2023-11-20T06:30:38Z"
          message: 'NodeInstallerProgressing: 3 nodes are at revision 8; 0 nodes have achieved
            new revision 9'
          reason: NodeInstaller
          status: "True"
          type: Progressing
      the kubeapiserver never came up on that host, or is restarting.
      error waiting for pod: Get "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s":
      dial tcp 192.168.111.5:6443: connect: connection refused'
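The two conditions above can be checked mechanically. The following is a minimal sketch, assuming the conditions have already been loaded into Python dictionaries (for example from `oc get clusteroperator kube-apiserver -o json`); the helper name `unhealthy_conditions` is hypothetical, and the sample messages are abbreviated from the status shown above.

```python
# Healthy state of the standard ClusterOperator condition types.
HEALTHY = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def unhealthy_conditions(conditions):
    """Return (type, reason, message) for every condition not in its healthy state."""
    problems = []
    for cond in conditions:
        expected = HEALTHY.get(cond.get("type"))
        if expected is not None and cond.get("status") != expected:
            problems.append(
                (cond["type"], cond.get("reason", ""), cond.get("message", ""))
            )
    return problems


# Sample data copied (and shortened) from the status in this report.
conditions = [
    {
        "type": "Degraded",
        "status": "True",
        "reason": "MissingStaticPodController_SyncError",
        "message": "MissingStaticPodControllerDegraded: static pod lifecycle failure",
    },
    {
        "type": "Progressing",
        "status": "True",
        "reason": "NodeInstaller",
        "message": "NodeInstallerProgressing: 3 nodes are at revision 8; "
                   "0 nodes have achieved new revision 9",
    },
]

for ctype, reason, message in unhealthy_conditions(conditions):
    print(f"{ctype} ({reason}): {message}")
```

Condition types not listed in `HEALTHY` are ignored rather than guessed at.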
      The build log shows:
      error: timed out waiting for the condition on clusteroperators/kube-apiserver
      kube-apiserver's progressing status is not expected
      no junt xml for kube-apiserver yet
      {"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"k8s.io/test-infra/prow/entrypoint/run.go:84","func":"k8s.io/test-infra/prow/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2023-11-20T06:51:01Z"}
      error: failed to execute wrapped command: exit status 1 
      INFO[2023-11-20T06:51:02Z] Step baremetalds-ipi-ovn-ipv4-fips-f14-openshift-extended-upgrade-pre failed after 15m57s. 
      INFO[2023-11-20T06:51:02Z] Step phase test failed after 17m55s. 
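To pull the failing steps and error entries out of a build log like the one above, a small parser is enough. This is a sketch only: the regular expression and the `summarize` helper are assumptions based on the few lines quoted in this report, not a description of the CI tooling's log format in general.

```python
import json
import re

# Matches lines such as: INFO[<time>] Step <name> failed after <duration>.
STEP_FAILED = re.compile(
    r"INFO\[(?P<time>[^\]]+)\] Step (?P<name>.+?) failed after (?P<dur>\w+)\."
)


def summarize(log_text):
    """Return (failed_steps, errors) found in a build log excerpt."""
    failed_steps, errors = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("{"):
            # Structured entries are JSON objects, one per line.
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("level") == "error":
                errors.append((entry.get("time"), entry.get("msg"), entry.get("error")))
            continue
        match = STEP_FAILED.search(line)
        if match:
            failed_steps.append((match["name"], match["dur"]))
    return failed_steps, errors


# Lines copied from the build log in this report.
LOG = """
error: timed out waiting for the condition on clusteroperators/kube-apiserver
{"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"k8s.io/test-infra/prow/entrypoint/run.go:84","func":"k8s.io/test-infra/prow/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2023-11-20T06:51:01Z"}
INFO[2023-11-20T06:51:02Z] Step baremetalds-ipi-ovn-ipv4-fips-f14-openshift-extended-upgrade-pre failed after 15m57s.
INFO[2023-11-20T06:51:02Z] Step phase test failed after 17m55s.
"""

steps, errors = summarize(LOG)
for name, duration in steps:
    print(f"failed step: {name} ({duration})")
for when, msg, err in errors:
    print(f"{when}: {msg}: {err}")
```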
      It seems to be a kube-apiserver operator issue.
      /Users/rahulgangwar/Downloads/must-gather (1)/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-b24f0764071d6cf94215a006fd269c978a0a4c19353e6989702fc30c0fa675e3/namespaces/openshift-kube-apiserver/core
      rahulgangwar@rgangwar-mac core % vi events.yaml 
      lastTimestamp: "2023-11-20T06:13:32Z"
      message: 'Failed to create pod sandbox: rpc error: code = Unknown desc = failed
      to create pod network sandbox k8s_revision-pruner-6-master-1_openshift-kube-apiserver_8d923f91-27e7-4839-baa7-7e0816802775_0(877891bb5f90b9aceceee78015312ee40f3393b255a21e27abc85ba30c1bda48):
      error adding pod openshift-kube-apiserver_revision-pruner-6-master-1 to CNI network
      "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add):
      Multus: [openshift-kube-apiserver/revision-pruner-6-master-1/8d923f91-27e7-4839-baa7-7e0816802775]:
      error setting the networks status, pod was already deleted: SetNetworkStatus:
      failed to query the pod revision-pruner-6-master-1 in out of cluster comm: Get
      "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s":
      dial tcp 192.168.111.5:6443: connect: connection refused'
      kind: Event
      lastTimestamp: "2023-11-20T06:13:33Z"
      message: 'Failed to create pod sandbox: rpc error: code = Unknown desc = failed
      to create pod network sandbox k8s_revision-pruner-6-master-1_openshift-kube-apiserver_8d923f91-27e7-4839-baa7-7e0816802775_0(aba57ebd4ac0fe8e432f3a3a98e66de55b4cda3f212418d1621f117a0e78e065):
      error adding pod openshift-kube-apiserver_revision-pruner-6-master-1 to CNI network
      "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add):
      Multus: [openshift-kube-apiserver/revision-pruner-6-master-1/8d923f91-27e7-4839-baa7-7e0816802775]:
      error waiting for pod: Get "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s":
      dial tcp 192.168.111.5:6443: connect: connection refused'
      metadata:
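Both sandbox-creation events above fail on the same refused connection to the internal API endpoint. A quick way to confirm that across a whole `events.yaml` is to count the refused endpoints in the raw text. This is a rough sketch that treats the file as plain text rather than parsing the YAML; the function name `refused_endpoints` is hypothetical.

```python
import re
from collections import Counter

# Go's net error text for a refused TCP dial, as it appears in the events above.
REFUSED = re.compile(
    r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+): connect: connection refused"
)


def refused_endpoints(text):
    """Count 'connection refused' errors per ip:port in raw event text."""
    # Collapse whitespace first, because long YAML messages are folded across lines.
    flat = " ".join(text.split())
    return Counter(f"{ip}:{port}" for ip, port in REFUSED.findall(flat))


# Tail of the two event messages quoted in this report.
EVENTS = """
message: 'failed to query the pod revision-pruner-6-master-1 in out of cluster comm: Get
  "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s":
  dial tcp 192.168.111.5:6443: connect: connection refused'
message: 'error waiting for pod: Get "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s":
  dial tcp 192.168.111.5:6443: connect:
  connection refused'
"""

for endpoint, count in refused_endpoints(EVENTS).items():
    print(f"{endpoint}: refused {count} time(s)")
```

If every refusal points at the same address and port, as it does here, the events are symptoms of the API server being down rather than separate network faults.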
      /Users/rahulgangwar/Downloads/must-gather (1)/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-b24f0764071d6cf94215a006fd269c978a0a4c19353e6989702fc30c0fa675e3/namespaces/openshift-kube-apiserver-operator/core
      rahulgangwar@rgangwar-mac core % vi events.yaml
      kind: Event
      lastTimestamp: "2023-11-20T06:09:59Z"
      message: 'Status for clusteroperator/kube-apiserver changed: Degraded message changed
      from "ConfigObservationDegraded: configmaps openshift-etcd/etcd-endpoints: no
      etcd endpoint addresses found\nGuardControllerDegraded: [Missing operand on node
      master-1, Missing operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded:
      missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-serving-ca-1,sa-token-signing-certs-1,
      secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]\nNodeKubeconfigControllerDegraded:
      \"secret/node-kubeconfigs\": configmap \"kube-apiserver-server-ca\" not found\nRevisionControllerDegraded:
      Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\":
      the object has been modified; please apply your changes to the latest version
      and try again" to "ConfigObservationDegraded: error writing updated observed config:
      Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\":
      the object has been modified; please apply your changes to the latest version
      and try again\nGuardControllerDegraded: [Missing operand on node master-1, Missing
      operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded:
      missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-serving-ca-1,sa-token-signing-certs-1,
      secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]\nNodeKubeconfigControllerDegraded:
      \"secret/node-kubeconfigs\": configmap \"kube-apiserver-server-ca\" not found\nRevisionControllerDegraded:
      Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\":
      the object has been modified; please apply your changes to the latest version
      and try again"'
      kind: Event
      lastTimestamp: "2023-11-20T06:10:02Z"
      message: 'Status for clusteroperator/kube-apiserver changed: Degraded message changed
      from "ConfigObservationDegraded: error writing updated observed config: Operation
      cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object
      has been modified; please apply your changes to the latest version and try again\nGuardControllerDegraded:
      [Missing operand on node master-1, Missing operand on node master-0, Missing operand
      on node master-2]\nInstallerControllerDegraded: missing required resources: [configmaps:
      bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-serving-ca-1,sa-token-signing-certs-1,
      secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]\nRevisionControllerDegraded:
      Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\":
      the object has been modified; please apply your changes to the latest version
      and try again" to "ConfigObservationDegraded: error writing updated observed config:
      Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\":
      the object has been modified; please apply your changes to the latest version
      and try again\nGuardControllerDegraded: [Missing operand on node master-1, Missing
      operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded:
      missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-
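The operator events above are hard to read because each one embeds the full old and new Degraded messages. One way to see what actually changed is to split each message into its per-controller parts and compare them. The sketch below assumes the parts are separated by a literal `\n` sequence and prefixed with a name ending in `Degraded`, as in the excerpts here; `split_degraded` and `changed_parts` are hypothetical helper names.

```python
def split_degraded(message):
    """Map each '<Name>Degraded' prefix to its detail text."""
    # Undo YAML line folding, then split on the literal backslash-n separators.
    flat = " ".join(message.split())
    parts = {}
    for chunk in flat.split("\\n"):
        name, sep, detail = chunk.strip().partition(": ")
        if sep and name.endswith("Degraded"):
            parts[name] = detail.strip()
    return parts


def changed_parts(old_message, new_message):
    """Return the sorted names of the parts whose text differs between two messages."""
    old, new = split_degraded(old_message), split_degraded(new_message)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Shortened "from" and "to" messages taken from the first event in this report.
OLD = (
    r"ConfigObservationDegraded: configmaps openshift-etcd/etcd-endpoints: no "
    r"etcd endpoint addresses found\nGuardControllerDegraded: [Missing operand on node "
    r"master-1, Missing operand on node master-0, Missing operand on node master-2]"
)
NEW = (
    r"ConfigObservationDegraded: error writing updated observed config: "
    r"Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": "
    r"the object has been modified; please apply your changes to the latest version "
    r"and try again\nGuardControllerDegraded: [Missing operand on node master-1, Missing "
    r"operand on node master-0, Missing operand on node master-2]"
)

print(changed_parts(OLD, NEW))
```

For the truncated second event, only the parts present before the cut-off can be compared.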
      

      Expected results:

      The upgrade should be successful.

      Additional info:

      must-gather https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-o[...]-fips-f14/gather-must-gather/artifacts/

       

            dgrisonn@redhat.com Damien Grisonnet
            rhn-support-rgangwar Rahul Gangwar
            Rahul Gangwar Rahul Gangwar