-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.14.0
-
Moderate
-
No
-
3
-
OTA 241, OTA 248
-
2
-
False
-
-
Release Note Not Required
-
In Progress
Description of problem:
Conditional risks have looser naming restrictions:
// +kubebuilder:validation:Required // +kubebuilder:validation:MinLength=1 // +required Name string `json:"name"`
...than condition Reason field:
// +required // +kubebuilder:validation:Required // +kubebuilder:validation:MaxLength=1024 // +kubebuilder:validation:MinLength=1 // +kubebuilder:validation:Pattern=`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$` Reason string `json:"reason" protobuf:"bytes,5,opt,name=reason"`
CVO can use a name risk as a reason so when a name of the applying risk is invalid, CVO will start trying to update the ClusterVersion resource with an invalid one.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
Make the cluster consume update graph data containing a conditional edge with a risk with a name that does not follow the Condition.Reason restriction, e.g. uses a - character. The risk needs to apply on the cluster. For example:
{ "nodes": [ {"version": "CLUSTER-BOT-VERSION", "payload": "CLUSTER-BOT-PAYLOAD"}, {"version": "4.12.22", "payload": "quay.io/openshift-release-dev/ocp-release@sha256:1111111111111111111111111111111111111111111111111111111111111111"} ], "conditionalEdges": [ { "edges": [{"from": "CLUSTER-BOT-VERSION", "to": "4.12.22"}], "risks": [ { "url": "https://github.com/openshift/api/blob/8891815aa476232109dccf6c11b8611d209445d9/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1515C4-L1520", "name": "OCPBUGS-9050", "message": "THere is no validation that risk name is a valid Condition.Reason so let's just use a - character in it.", "matchingRules": [{"type": "PromQL", "promql": { "promql": "group by (type) (cluster_infrastructure_provider)"}}] } ] } ] }
Then, observe the ClusterVersion status field after the cluster has a chance to evaluate the risk:
$ oc get clusterversion version -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing" or .type=="Available" or .type=="Failing")'
Actual results:
{ "lastTransitionTime": "2023-09-01T13:21:49Z", "status": "False", "type": "Available" } { "lastTransitionTime": "2023-09-01T13:21:49Z", "message": "ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates[0].conditions[0].reason: Invalid value: \"OCPBUGS-9050\": status.conditionalUpdates[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'", "status": "True", "type": "Failing" } { "lastTransitionTime": "2023-09-01T13:14:34Z", "message": "Error ensuring the cluster version is up to date: ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates[0].conditions[0].reason: Invalid value: \"OCPBUGS-9050\": status.conditionalUpdates[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'", "status": "False", "type": "Progressing" }
Expected results:
No errors, CVO continues to work and either uses some sanitized version of the name as the reason, or maybe uses something generic, like RiskApplies.
CVO does not get stuck after consuming data from external source
Additional info:
1. We should CI PRs to o/cincinnati-graph-data so we never create invalid data
2. We should sanitize the field in CVO code so that CVO never attempts to submit an invalid ClusterVersion.status.conditionalUpdates.condition.reason
3. We should further restrict the conditional update risk name in the CRD so it is guaranteed compatible with Condition.Reason
4. We should sanitize the field in CVO code after being read from OSUS so that CVO never attempts to submit an invalid (after we do 3) ClusterVersion.conditinalUpdates.name
- links to
-
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update