Bug
Resolution: Done-Errata
Normal
4.15.0
Quality / Stability / Reliability
In Progress
Release Note Not Required
N/A: same version fix of API typing
Description of problem:
The machine-config operator reports Available=False during an upgrade (4.14.0-rc.7 to 4.15.0-0.nightly-2023-10-24-230302) with the error below:
status:
  conditions:
  - lastTransitionTime: "2023-10-25T02:05:07Z"
    message: Working towards 4.15.0-0.nightly-2023-10-24-230302
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-10-25T02:06:36Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-10-25T02:05:54Z"
    message: 'Cluster not available for [{operator 4.14.0-rc.7}]: ControllerConfig.machineconfiguration.openshift.io
      "machine-config-controller" is invalid: [status.controllerCertificates[0].notAfter:
      Required value, status.controllerCertificates[0].notBefore: Required value,
      status.controllerCertificates[1].notAfter: Required value, status.controllerCertificates[1].notBefore:
      Required value, status.controllerCertificates[2].notAfter: Required value, status.controllerCertificates[2].notBefore:
      Required value, status.controllerCertificates[3].notAfter: Required value, status.controllerCertificates[3].notBefore:
      Required value, status.controllerCertificates[4].notAfter: Required value, status.controllerCertificates[4].notBefore:
      Required value, status.controllerCertificates[5].notAfter: Required value, status.controllerCertificates[5].notBefore:
      Required value, status.controllerCertificates[6].notAfter: Required value, status.controllerCertificates[6].notBefore:
      Required value, status.controllerCertificates[7].notAfter: Required value, status.controllerCertificates[7].notBefore:
      Required value, <nil>: Invalid value: "null": some validation rules were not
      checked because the object was invalid; correct the existing errors to complete
      validation]'
    reason: MachineConfigControllerFailed
    status: "False"
    type: Available
  - lastTransitionTime: "2023-10-25T02:06:43Z"
    message: One or more machine config pools are degraded, please see `oc get mcp`
      for further details and resolve before upgrading
    reason: DegradedPool
    status: "False"
    type: Upgradeable
  extension:
    master: 'pool is degraded because nodes fail with "2 nodes are reporting degraded
      status on sync": "Node ip-10-0-68-187.us-east-2.compute.internal is reporting:
      \"open /etc/docker/certs.d: no such file or directory\", Node ip-10-0-51-97.us-east-2.compute.internal
      is reporting: \"open /etc/docker/certs.d: no such file or directory\""'
    worker: all 2 nodes are at latest configuration rendered-worker-f9ff345da4f6740868e7238d5f820b65
The upgrade eventually completes successfully.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-10-24-230302
How reproducible:
upgrade cluster from 4.14 to 4.15 nightly
Steps to Reproduce:
1. Set up a 4.14 cluster.
2. Upgrade the cluster to a 4.15 nightly.
3. During the upgrade, check the operator status.
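One way to make the status check in step 3 repeatable is a small Go sketch that looks for an Available=False condition in ClusterOperator JSON. This is only an illustration: the `condition` and `clusterOperator` types and the `unavailable` helper are hypothetical local definitions, not types from the operator code, and the embedded sample is an abbreviated stand-in for real output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition holds the condition fields that appear in this report.
// Hypothetical local type, not the upstream API type.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// clusterOperator models only status.conditions of the object.
type clusterOperator struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// unavailable returns the Available condition if its status is "False",
// or nil if the operator is available or the condition is absent.
func unavailable(raw []byte) (*condition, error) {
	var co clusterOperator
	if err := json.Unmarshal(raw, &co); err != nil {
		return nil, err
	}
	for i := range co.Status.Conditions {
		c := &co.Status.Conditions[i]
		if c.Type == "Available" && c.Status == "False" {
			return c, nil
		}
	}
	return nil, nil
}

func main() {
	// Abbreviated sample; the message text is shortened for illustration.
	sample := []byte(`{"status":{"conditions":[
		{"type":"Progressing","status":"True"},
		{"type":"Available","status":"False","reason":"MachineConfigControllerFailed","message":"validation error"}
	]}}`)
	c, err := unavailable(sample)
	if err != nil {
		panic(err)
	}
	if c != nil {
		fmt.Printf("Available=False reason=%s message=%s\n", c.Reason, c.Message)
	}
}
```

In practice the sample would be replaced by the output of `oc get clusteroperator machine-config -o json`, polled while the upgrade is in progress.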
Actual results:
The Available condition is False, with a validation error on controllerCertificates.
Expected results:
No validation error
Additional info:
ControllerCertificate struct in the 4.14 branch:
type ControllerCertificate struct {
	// subject is the cert subject
	Subject string `json:"subject"`
	// signer is the cert Issuer
	Signer string `json:"signer"`
	// bundleFile is the larger bundle a cert comes from
	BundleFile string `json:"bundleFile"`
}
ControllerCertificate struct in the master branch (4.15):
type ControllerCertificate struct {
	// subject is the cert subject
	// +kubebuilder:validation:Required
	Subject string `json:"subject"`
	// signer is the cert Issuer
	// +kubebuilder:validation:Required
	Signer string `json:"signer"`
	// notBefore is the lower boundary for validity
	// +kubebuilder:validation:Required
	NotBefore *metav1.Time `json:"notBefore"`
	// notAfter is the upper boundary for validity
	// +kubebuilder:validation:Required
	NotAfter *metav1.Time `json:"notAfter"`
	// bundleFile is the larger bundle a cert comes from
	// +kubebuilder:validation:Required
	BundleFile string `json:"bundleFile"`
}
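The difference between the two structs is a plausible explanation for the error: a status entry written with the 4.14 shape has no notBefore or notAfter, so it cannot satisfy the Required markers of the 4.15 shape. The following is a minimal sketch of that mismatch, not the API server's actual validation. It assumes a local `certStatus` type that uses `*time.Time` in place of `*metav1.Time` to stay stdlib-only, and `missingRequired` is a hypothetical helper; the certificate values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// certStatus is a local stand-in for the 4.15 ControllerCertificate.
// Assumption: *time.Time replaces *metav1.Time for a stdlib-only sketch.
type certStatus struct {
	Subject    string     `json:"subject"`
	Signer     string     `json:"signer"`
	NotBefore  *time.Time `json:"notBefore"`
	NotAfter   *time.Time `json:"notAfter"`
	BundleFile string     `json:"bundleFile"`
}

// missingRequired (hypothetical helper) decodes a list of certificate
// entries and returns the paths of nil time fields, formatted like the
// field paths in the reported error message.
func missingRequired(raw []byte) ([]string, error) {
	var certs []certStatus
	if err := json.Unmarshal(raw, &certs); err != nil {
		return nil, err
	}
	var missing []string
	for i, c := range certs {
		if c.NotAfter == nil {
			missing = append(missing, fmt.Sprintf("status.controllerCertificates[%d].notAfter", i))
		}
		if c.NotBefore == nil {
			missing = append(missing, fmt.Sprintf("status.controllerCertificates[%d].notBefore", i))
		}
	}
	return missing, nil
}

func main() {
	// Entry shaped like the 4.14 type: no notBefore/notAfter (placeholder values).
	old := []byte(`[{"subject":"CN=example","signer":"CN=example-signer","bundleFile":"example-bundle"}]`)
	missing, err := missingRequired(old)
	if err != nil {
		panic(err)
	}
	for _, m := range missing {
		fmt.Println(m + ": Required value")
	}
}
```

Under these assumptions, each 4.14-shaped entry yields one notAfter and one notBefore path, which matches the pattern of two "Required value" errors per certificate index in the condition message.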
related PR https://github.com/openshift/machine-config-operator/pull/3747
must-gather: https://drive.google.com/file/d/1L3pu8Jj3zGBx7m3elve5WzOvGRPYugnA/view?usp=drive_link
is related to:
- OCPBUGS-24228 machine-config ClusterOperator should not blip Available=False on brief missing HTTP content-type (Closed)
relates to:
- OTA-362 CI: fail update suite if any ClusterOperator go Available=False (Closed)
- TRT-1334 Machine Config Operator controllerCertificates Validation Error (Closed)
links to:
- RHSA-2023:7198 OpenShift Container Platform 4.15 security update