Bug
Resolution: Done-Errata
Major
4.14
Yes
False
Bug Fix
Done
Description of problem:
Following deletion of the signing-key secret, a service CA rotation process runs that may temporarily disrupt cluster operators, but eventually all of them should regenerate and recover. In recent 4.14 nightlies, however, this no longer happens. After deleting the signing key with oc delete secret/signing-key -n openshift-service-ca, the operators progress for a while, but eventually console and monitoring end up in Available=False and Degraded=True. The only way to recover from this state is to manually delete all the pods in the cluster.
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
console 4.14.0-0.nightly-2023-06-30-131338 False False True 159m RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.evakhoni-0412.qe.gcp.devcluster.openshift.com returns '503 Service Unavailable'
monitoring 4.14.0-0.nightly-2023-06-30-131338 False True True 161m reconciling Console Plugin failed: retrieving ConsolePlugin object failed: conversion webhook for console.openshift.io/v1alpha1, Kind=ConsolePlugin failed: Post "https://webhook.openshift-console-operator.svc:9443/crdconvert?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
The same deletion on 4.14-ec.2 or earlier does not have this issue: those versions recover eventually without any manual pod deletion. I believe this is a regression.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-30-131338 and other recent 4.14 nightlies
How reproducible:
100%
Steps to Reproduce:
1. oc delete secret/signing-key -n openshift-service-ca
2. Wait at least 30 minutes.
3. Observe the output of oc get co.
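Step 3 can be automated with a small helper. The following is a minimal sketch, not part of the original report, assuming bash, a logged-in oc session, and the default oc get co column order (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE). The function names unhealthy_operators and wait_for_recovery and the 45-minute default timeout are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking recovery after the signing-key deletion.
# Assumes the default `oc get co` column order:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
# Rows with an empty VERSION column would shift the fields; not handled here.

# Reads `oc get co --no-headers` output on stdin and prints the name of every
# operator whose AVAILABLE column is not "True" or whose DEGRADED column is "True".
unhealthy_operators() {
  awk '$3 != "True" || $5 == "True" { print $1 }'
}

# Polls every 60 seconds until no operator is unhealthy or the timeout
# (in minutes, default 45, an arbitrary assumption) expires.
# Returns 0 on recovery and 1 on timeout.
wait_for_recovery() {
  local timeout_min="${1:-45}"
  local deadline=$(( $(date +%s) + timeout_min * 60 ))
  local out bad=""
  while (( $(date +%s) < deadline )); do
    # Only trust a successful, non-empty listing; API errors are retried.
    if out="$(oc get co --no-headers 2>/dev/null)" && [[ -n "$out" ]]; then
      bad="$(printf '%s\n' "$out" | unhealthy_operators)"
      if [[ -z "$bad" ]]; then
        echo "all cluster operators recovered"
        return 0
      fi
    fi
    sleep 60
  done
  echo "still unhealthy after ${timeout_min}m:" >&2
  printf '%s\n' "$bad" >&2
  return 1
}
```

Usage, following the reproduction steps: run oc delete secret/signing-key -n openshift-service-ca, then wait_for_recovery 45. On the affected nightlies this would be expected to time out listing console and monitoring. Treating Degraded=True as unhealthy even when Available=True is deliberate, since the report expects both conditions to clear.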
Actual results:
The console and monitoring cluster operators become degraded and do not recover.
Expected results:
The cluster recovers eventually, as in previous versions.
Additional info:
The cluster can be recovered from this state by manually deleting all pods in every namespace:
for ns in $(oc get ns -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'); do
  oc delete pods --all -n "$ns"
  sleep 1
done
must-gather:
https://drive.google.com/file/d/1Y3RrYZlz0EncG-Iqt8USFPsTd-br36Zt/view?usp=sharing
is cloned by: OCPBUGS-26983 monitoring-plugin becomes unavailable after forcing the rotation of the service's certificate (Closed)
is duplicated by: OCPBUGS-21818 After deleting and recreating default CA certificate - route not able due to bad certificate (Closed)
links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update