Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-18454

CVO chokes on ConditionalUpdateRisk with name that is not a valid condition reason

XMLWordPrintable

    • Moderate
    • No
    • 3
    • OTA 241, OTA 248
    • 2
    • False
    • Hide

      None

      Show
      None
    • Release Note Not Required
    • In Progress

      Description of problem:

      Conditional risks have looser naming restrictions:

      	// +kubebuilder:validation:Required
      	// +kubebuilder:validation:MinLength=1
      	// +required
      	Name string `json:"name"`
      

      ...than condition Reason field:

      	// +required
      	// +kubebuilder:validation:Required
      	// +kubebuilder:validation:MaxLength=1024
      	// +kubebuilder:validation:MinLength=1
      	// +kubebuilder:validation:Pattern=`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`
      	Reason string `json:"reason" protobuf:"bytes,5,opt,name=reason"`
      

      CVO can use a name risk as a reason so when a name of the applying risk is invalid, CVO will start trying to update the ClusterVersion resource with an invalid one.

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      always

      Steps to Reproduce:

      Make the cluster consume update graph data containing a conditional edge with a risk with a name that does not follow the Condition.Reason restriction, e.g. uses a - character. The risk needs to apply on the cluster. For example:

      {
        "nodes": [
          {"version": "CLUSTER-BOT-VERSION", "payload": "CLUSTER-BOT-PAYLOAD"},
          {"version": "4.12.22", "payload": "quay.io/openshift-release-dev/ocp-release@sha256:1111111111111111111111111111111111111111111111111111111111111111"}
        ],
        "conditionalEdges": [
          {
            "edges": [{"from": "CLUSTER-BOT-VERSION", "to": "4.12.22"}],
            "risks": [
              {
                "url": "https://github.com/openshift/api/blob/8891815aa476232109dccf6c11b8611d209445d9/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1515C4-L1520",
                "name": "OCPBUGS-9050",
                "message": "THere is no validation that risk name is a valid Condition.Reason so let's just use a - character in it.",
                "matchingRules": [{"type": "PromQL", "promql": { "promql": "group by (type) (cluster_infrastructure_provider)"}}]
              }
            ]
          }
        ]
      }
      

      Then, observe the ClusterVersion status field after the cluster has a chance to evaluate the risk:

      $ oc get clusterversion version -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing" or .type=="Available" or .type=="Failing")'
      

      Actual results:

      {
        "lastTransitionTime": "2023-09-01T13:21:49Z",
        "status": "False",
        "type": "Available"
      }
      {
        "lastTransitionTime": "2023-09-01T13:21:49Z",
        "message": "ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates[0].conditions[0].reason: Invalid value: \"OCPBUGS-9050\": status.conditionalUpdates[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'",
        "status": "True",
        "type": "Failing"
      }
      {
        "lastTransitionTime": "2023-09-01T13:14:34Z",
        "message": "Error ensuring the cluster version is up to date: ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates[0].conditions[0].reason: Invalid value: \"OCPBUGS-9050\": status.conditionalUpdates[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'",
        "status": "False",
        "type": "Progressing"
      }
      

      Expected results:

      No errors, CVO continues to work and either uses some sanitized version of the name as the reason, or maybe uses something generic, like RiskApplies.

      CVO does not get stuck after consuming data from external source

      Additional info:

      1. We should CI PRs to o/cincinnati-graph-data so we never create invalid data
      2. We should sanitize the field in CVO code so that CVO never attempts to submit an invalid ClusterVersion.status.conditionalUpdates.condition.reason
      3. We should further restrict the conditional update risk name in the CRD so it is guaranteed compatible with Condition.Reason
      4. We should sanitize the field in CVO code after being read from OSUS so that CVO never attempts to submit an invalid (after we do 3) ClusterVersion.conditinalUpdates.name

            afri@afri.cz Petr Muller
            afri@afri.cz Petr Muller
            Yang Yang Yang Yang
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: