Spike
Resolution: Unresolved
Critical
Improvement
Impact statement for the OCPBUGS-25821 series and the OCPBUGS-31384 series.
Which 4.y.z to 4.y'.z' updates increase vulnerability?
Updates from 4.14 to 4.15 for clusters born in 4.7 or earlier, until the issue is resolved.
Which types of clusters?
All clusters originally installed on OpenShift Container Platform (OCP) version 4.6 or earlier. Clusters installed on 4.7 or later and new 4.15 installs are unaffected.
Note: because the cluster-version operator (CVO) prunes its update history, we cannot reliably detect born-in versions earlier than 4.9. The conditional update rule will therefore be changed to withhold update recommendations from all clusters born in 4.8 or earlier.
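The born-in reasoning above can be sketched in Python. This is a minimal illustration that assumes the rule sees only the version strings from the ClusterVersion status.history list, ordered newest first. The function names are hypothetical, and the production rule is not written in Python.

```python
from typing import List, Optional, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Turn a version such as "4.14.17" into (4, 14)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def apparent_born_in(history_versions: List[str]) -> Optional[str]:
    """Return the oldest version still present in status.history.

    The history is ordered newest first, so the oldest entry is last.
    If older entries were pruned, this version is later than the true
    install version, never earlier.
    """
    return history_versions[-1] if history_versions else None


def withhold_update_to_4_15(history_versions: List[str]) -> bool:
    """Hypothetical risk rule for 4.14 -> 4.15 updates.

    The affected clusters were all born before 4.8, but born-in versions
    below 4.9 cannot be trusted, so the rule covers every cluster that
    appears to be born in 4.8 or earlier.
    """
    born = apparent_born_in(history_versions)
    if born is None:
        # Assumption: with no history at all, be conservative.
        return True
    return parse_minor(born) <= (4, 8)
```

Widening the cutoff from the affected range to everything up to 4.8 trades some false positives (clusters born in 4.8 that were never at risk) for not missing clusters whose oldest history entries are unreliable.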
What is the impact? Is it serious enough to warrant removing update recommendations?
The Kubernetes API server operator begins rotating a number of certificates, including the CA that signs the api-int serving certificate. Parts of the pipeline that should distribute the new CA to kubelets and other consumers are missing, so the cluster locks up when the Kubernetes API servers start serving incoming requests with the new cert/CA pair. For example, nodes may go NotReady because their kubelets cannot report status to an api-int endpoint whose certificate is signed by a new CA they do not yet trust.
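A toy Python model of the lock-up described above. CAs are plain strings rather than real certificates, and the names ca-old, ca-new, master-0, worker-0 and the helper functions are hypothetical. The model shows the ordering constraint a CA rotation has to respect: every consumer must trust the new CA before the server starts presenting a certificate signed by it.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Consumer:
    """Toy model of a client (e.g. a kubelet) and the CA bundle it trusts."""
    name: str
    trusted_cas: Set[str] = field(default_factory=set)


def can_reach_api_int(consumer: Consumer, serving_ca: str) -> bool:
    """Verification passes only if the serving cert's CA is in the bundle."""
    return serving_ca in consumer.trusted_cas


def unreachable(consumers: List[Consumer], serving_ca: str) -> List[str]:
    """Names of consumers that would fail TLS verification."""
    return [c.name for c in consumers if not can_reach_api_int(c, serving_ca)]


def safe_to_switch(consumers: List[Consumer], new_ca: str) -> bool:
    """Switching the serving cert is safe only once every consumer trusts new_ca."""
    return not unreachable(consumers, new_ca)


if __name__ == "__main__":
    kubelets = [Consumer("master-0", {"ca-old"}), Consumer("worker-0", {"ca-old"})]
    # Switching now would strand both kubelets (they would go NotReady).
    print(unreachable(kubelets, "ca-new"))  # ['master-0', 'worker-0']
    # Distribute the new CA first, keeping the old one; then switching is safe.
    for k in kubelets:
        k.trusted_cas.add("ca-new")
    print(safe_to_switch(kubelets, "ca-new"))  # True
```

Keeping the old CA in each bundle until the switch completes is what avoids a window in which some kubelet trusts neither the certificate being served nor the one about to be served.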
How involved is remediation?
We recovered an affected cluster by:
1. Manually injecting the new CA certificate from the kube-apiserver-server-ca Secret in openshift-config-managed into /var/lib/kubelet/kubeconfig on the control-plane nodes, then restarting their kubelets with systemctl restart kubelet. This got the kubelets talking to api-int again.
2. Deleting pods in openshift-machine-api, and possibly in openshift-machine-config-operator, that had been stuck in ContainerCreating. Their replacements came up smoothly.
3. Deleting all the compute Machines to force them to be reprovisioned with working certificates.
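Steps 1 and 2 can be sketched as Python helpers that operate on already-parsed data. The sketch assumes the kubeconfig embeds its CA bundle as base64 certificate-authority-data and that the pod list has the shape of oc get pods -o json output. Reading and writing files, restarting the kubelet, and deleting pods are left out, and the function names are hypothetical.

```python
import base64
from typing import Any, Dict, List


def add_ca_to_kubeconfig(kubeconfig: Dict[str, Any], new_ca_pem: str) -> Dict[str, Any]:
    """Append new_ca_pem to every cluster's certificate-authority-data.

    kubeconfig is the parsed document as a dict; YAML loading and saving
    are out of scope. Existing CAs are kept, so the kubelet trusts both
    the old and the new CA. Calling this twice does not duplicate the CA.
    """
    new_pem = new_ca_pem.strip()
    for entry in kubeconfig.get("clusters", []):
        cluster = entry["cluster"]
        existing = base64.b64decode(cluster.get("certificate-authority-data", "")).decode()
        if new_pem in existing:
            continue  # already trusted
        parts = [p for p in (existing.strip(), new_pem) if p]
        merged = "\n".join(parts) + "\n"
        cluster["certificate-authority-data"] = base64.b64encode(merged.encode()).decode()
    return kubeconfig


def stuck_in_container_creating(pod_list: Dict[str, Any]) -> List[str]:
    """Names of pods with any container waiting with reason ContainerCreating."""
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        if any(
            s.get("state", {}).get("waiting", {}).get("reason") == "ContainerCreating"
            for s in statuses
        ):
            names.append(pod["metadata"]["name"])
    return names
```

The helper appends the new CA rather than replacing the bundle, which keeps connections signed by either CA working. The steps above do not say whether the original recovery kept the old CA.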
Is this a regression?
Yes, in OCP 4.15.
- blocks
  - OCPBUGS-25821 cert issues during or after 4.14 to 4.15 upgrade (Closed)
- is related to
  - OCPBUGS-31384 api-int Certificate Authority rotation during 4.14.17 to 4.15.3 update (Closed)
  - OCPBUGS-27429 Handle kubeconfig changes like CA rotation (Closed)
  - OCPBUGS-31807 api-int Certificate Authority rotation during 4.14.17 to 4.15.3 update (Closed)
- relates to
  - API-1623 [spike] investigate client-go: doesn’t auto-reload (bounded)tokens and CAs (New)
  - API-1603 Fallback (Protocol) for Emergency Certificate Rotation (To Do)
  - OCPBUGS-18761 revert "force cert rotation every couple days for development" in 4.15 (Closed)
  - API-1688 Dev branch rotation should be enabled by a FeatureGate (New)