-
Bug
-
Resolution: Cannot Reproduce
-
Normal
-
None
-
4.12.z
-
Moderate
-
No
-
False
-
Description of problem:
The kube-apiserver (KAS) does not come up after a restart during an upgrade from 4.12 to 4.14. The kube-apiserver cluster operator is degraded on a baremetal IPI cluster.
Version-Release number of selected component (if applicable):
4.12.44
How reproducible:
Not sure
Steps to Reproduce:
Upgrade from 4.12.44 to 4.14 with profile baremetalds-ipi-ovn-ipv4-fips-f14
Actual results:
The upgrade failed, and the kube-apiserver did not come up after the restart during the upgrade.
status:
  conditions:
  - lastTransitionTime: "2023-11-20T06:38:29Z"
    message: 'MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 9 on node: "master-1" didn''t show up, waited: 4m45s'
    reason: MissingStaticPodController_SyncError
    status: "True"
    type: Degraded
  - lastTransitionTime: "2023-11-20T06:30:38Z"
    message: 'NodeInstallerProgressing: 3 nodes are at revision 8; 0 nodes have achieved new revision 9'
    reason: NodeInstaller
    status: "True"
    type: Progressing
the kubeapiserver never came up on that host, or is restarting.
error waiting for pod: Get "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s": dial tcp 192.168.111.5:6443: connect: connection refused'
And the Build log shows:
error: timed out waiting for the condition on clusteroperators/kube-apiserver
kube-apiserver's progressing status is not expected
no junt xml for kube-apiserver yet
{"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"k8s.io/test-infra/prow/entrypoint/run.go:84","func":"k8s.io/test-infra/prow/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2023-11-20T06:51:01Z"}
error: failed to execute wrapped command: exit status 1
INFO[2023-11-20T06:51:02Z] Step baremetalds-ipi-ovn-ipv4-fips-f14-openshift-extended-upgrade-pre failed after 15m57s.
INFO[2023-11-20T06:51:02Z] Step phase test failed after 17m55s.
It seems to be a kube-apiserver operator issue.
/Users/rahulgangwar/Downloads/must-gather (1)/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-b24f0764071d6cf94215a006fd269c978a0a4c19353e6989702fc30c0fa675e3/namespaces/openshift-kube-apiserver/core
rahulgangwar@rgangwar-mac core % vi events.yaml
lastTimestamp: "2023-11-20T06:13:32Z"
message: 'Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_revision-pruner-6-master-1_openshift-kube-apiserver_8d923f91-27e7-4839-baa7-7e0816802775_0(877891bb5f90b9aceceee78015312ee40f3393b255a21e27abc85ba30c1bda48): error adding pod openshift-kube-apiserver_revision-pruner-6-master-1 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-apiserver/revision-pruner-6-master-1/8d923f91-27e7-4839-baa7-7e0816802775]: error setting the networks status, pod was already deleted: SetNetworkStatus: failed to query the pod revision-pruner-6-master-1 in out of cluster comm: Get "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s": dial tcp 192.168.111.5:6443: connect: connection refused'
kind: Event
lastTimestamp: "2023-11-20T06:13:33Z"
message: 'Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_revision-pruner-6-master-1_openshift-kube-apiserver_8d923f91-27e7-4839-baa7-7e0816802775_0(aba57ebd4ac0fe8e432f3a3a98e66de55b4cda3f212418d1621f117a0e78e065): error adding pod openshift-kube-apiserver_revision-pruner-6-master-1 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-apiserver/revision-pruner-6-master-1/8d923f91-27e7-4839-baa7-7e0816802775]: error waiting for pod: Get "https://[api-int.ostest.test.metalkube.org]:6443/api/v1/namespaces/openshift-kube-apiserver/pods/revision-pruner-6-master-1?timeout=1m0s": dial tcp 192.168.111.5:6443: connect: connection refused'
metadata:
/Users/rahulgangwar/Downloads/must-gather (1)/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-b24f0764071d6cf94215a006fd269c978a0a4c19353e6989702fc30c0fa675e3/namespaces/openshift-kube-apiserver-operator/core
rahulgangwar@rgangwar-mac core % vi events.yaml
kind: Event
lastTimestamp: "2023-11-20T06:09:59Z"
message: 'Status for clusteroperator/kube-apiserver changed: Degraded message changed from "ConfigObservationDegraded: configmaps openshift-etcd/etcd-endpoints: no etcd endpoint addresses found\nGuardControllerDegraded: [Missing operand on node master-1, Missing operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded: missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-serving-ca-1,sa-token-signing-certs-1, secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]\nNodeKubeconfigControllerDegraded: \"secret/node-kubeconfigs\": configmap \"kube-apiserver-server-ca\" not found\nRevisionControllerDegraded: Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again" to "ConfigObservationDegraded: error writing updated observed config: Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again\nGuardControllerDegraded: [Missing operand on node master-1, Missing operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded: missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-serving-ca-1,sa-token-signing-certs-1, secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]\nNodeKubeconfigControllerDegraded: \"secret/node-kubeconfigs\": configmap \"kube-apiserver-server-ca\" not found\nRevisionControllerDegraded: Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again"'
kind: Event
lastTimestamp: "2023-11-20T06:10:02Z"
message: 'Status for clusteroperator/kube-apiserver changed: Degraded message changed from "ConfigObservationDegraded: error writing updated observed config: Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again\nGuardControllerDegraded: [Missing operand on node master-1, Missing operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded: missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-serving-ca-1,sa-token-signing-certs-1, secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]\nRevisionControllerDegraded: Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again" to "ConfigObservationDegraded: error writing updated observed config: Operation cannot be fulfilled on kubeapiservers.operator.openshift.io \"cluster\": the object has been modified; please apply your changes to the latest version and try again\nGuardControllerDegraded: [Missing operand on node master-1, Missing operand on node master-0, Missing operand on node master-2]\nInstallerControllerDegraded: missing required resources: [configmaps: bound-sa-token-signing-certs-1,config-1,etcd-serving-ca-1,kube-apiserver-audit-policies-1,kube-apiserver-cert-syncer-kubeconfig-1,kube-apiserver-pod-1,kubelet-
Expected results:
The upgrade should complete successfully.
Additional info:
must-gather https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-o[...]-fips-f14/gather-must-gather/artifacts/
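For triage, the Degraded and Progressing conditions quoted under Actual results can be pulled out of the cluster operator object programmatically. The following is only a rough sketch: it assumes JSON shaped like the output of `oc get clusteroperator kube-apiserver -o json`, and the names `SAMPLE` and `notable_conditions` are hypothetical helpers introduced here, not part of any OpenShift tooling. Treating Progressing=True as notable is also an assumption, since that condition is expected for some time during a healthy upgrade.

```python
import json

# Sample input built from the two conditions quoted in this report.
# The surrounding object shape is an assumption about `oc ... -o json` output.
SAMPLE = json.dumps({
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2023-11-20T06:38:29Z",
                "message": 'MissingStaticPodControllerDegraded: static pod lifecycle failure - '
                           'static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" '
                           'for revision: 9 on node: "master-1" didn\'t show up, waited: 4m45s',
                "reason": "MissingStaticPodController_SyncError",
                "status": "True",
                "type": "Degraded",
            },
            {
                "lastTransitionTime": "2023-11-20T06:30:38Z",
                "message": "NodeInstallerProgressing: 3 nodes are at revision 8; "
                           "0 nodes have achieved new revision 9",
                "reason": "NodeInstaller",
                "status": "True",
                "type": "Progressing",
            },
        ]
    }
})


def notable_conditions(clusteroperator_json):
    """Return (type, reason, lastTransitionTime) tuples worth a closer look.

    Hypothetical helper: flags Degraded=True, Progressing=True and
    Available=False, in the order they appear in the input.
    """
    obj = json.loads(clusteroperator_json)
    conditions = obj.get("status", {}).get("conditions", [])
    flagged = []
    for cond in conditions:
        ctype = cond.get("type")
        status = cond.get("status")
        if (ctype in ("Degraded", "Progressing") and status == "True") or (
            ctype == "Available" and status == "False"
        ):
            flagged.append((ctype, cond.get("reason"), cond.get("lastTransitionTime")))
    return flagged


if __name__ == "__main__":
    for ctype, reason, since in notable_conditions(SAMPLE):
        print(f"{ctype} since {since}: {reason}")
```

In practice the JSON would come from the live cluster or from the must-gather linked above rather than from the embedded sample.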