  1. OpenShift Virtualization
  2. CNV-31135

Aggregate fatal progressing and degraded conditions in CDI CR


    • Story
    • Resolution: Unresolved
    • Normal
    • None
    • None
    • Storage Platform
    • Quality / Stability / Reliability
    • 5
    • False
    • False
      • The CDI CR aggregates Progressing and Degraded conditions from its operands
      • The upgrade flow is ready for the fail-forward mechanism
    • None

      Today, the CDI CR never reaches "Deployed" unless all of its operands are
      healthy.
      While this is a useful signal when debugging an install, it would be
      a great step forward to also aggregate fatal conditions from CDI operands onto the CDI CR.
      Consider a cdi-deployment that cannot create its pods:
      $ oc get deployments -n openshift-cnv cdi-deployment -o json | jq .status.conditions
      [
        { "lastTransitionTime": "2022-09-21T21:21:28Z", "lastUpdateTime": "2022-09-21T21:21:28Z", "message": "Deployment does not have minimum availability.", "reason": "MinimumReplicasUnavailable", "status": "False", "type": "Available" },
        { "lastTransitionTime": "2022-09-21T21:21:28Z", "lastUpdateTime": "2022-09-21T21:21:28Z", "message": "pods \"cdi-deployment-6f4888b5cb-r9f5h\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"cdi-controller\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container \"cdi-controller\" must set securityContext.capabilities.drop=[\"ALL\"]), seccompProfile (pod or container \"cdi-controller\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")", "reason": "FailedCreate", "status": "True", "type": "ReplicaFailure" },
        { "lastTransitionTime": "2022-09-21T21:31:29Z", "lastUpdateTime": "2022-09-21T21:31:29Z", "message": "ReplicaSet \"cdi-deployment-6f4888b5cb\" has timed out progressing.", "reason": "ProgressDeadlineExceeded", "status": "False", "type": "Progressing" }
      ]
      Today, this translates to the following conditions on the CDI CR, instead of Progressing=False, Degraded=True:

      - lastHeartbeatTime: "2022-09-21T21:21:25Z"
        lastTransitionTime: "2022-09-21T21:21:25Z"
        message: Started Deployment
        reason: DeployStarted
        status: "True"
        type: Progressing
      - lastHeartbeatTime: "2022-09-21T21:21:25Z"
        lastTransitionTime: "2022-09-21T21:21:25Z"
        status: "False"
        type: Degraded
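To make the requested behavior concrete, here is a minimal Python sketch of the aggregation, not the operator's actual code. It rests on several assumptions: which Deployment conditions count as fatal (ReplicaFailure=True, or Progressing=False with reason ProgressDeadlineExceeded) is a guess, since the ticket does not define the set; the reason string "OperandFailed" and the function names are hypothetical; and the healthy branch simply mirrors the conditions shown above rather than modelling the full lifecycle.

```python
from typing import Dict, List

Condition = Dict[str, str]


def is_fatal(cond: Condition) -> bool:
    """Return True for Deployment conditions this sketch treats as fatal.

    Assumption: the ticket does not list the exact set, so only two cases
    taken from the sample output are handled here.
    """
    if cond.get("type") == "ReplicaFailure" and cond.get("status") == "True":
        return True
    return (
        cond.get("type") == "Progressing"
        and cond.get("status") == "False"
        and cond.get("reason") == "ProgressDeadlineExceeded"
    )


def aggregate(operands: Dict[str, List[Condition]]) -> List[Condition]:
    """Build CR-level Progressing/Degraded conditions from operand conditions.

    `operands` maps an operand name to the conditions of its Deployment.
    """
    failures = [
        f"{name}: {cond.get('reason', '')}: {cond.get('message', '')}"
        for name, conds in operands.items()
        for cond in conds
        if is_fatal(cond)
    ]
    if not failures:
        # Nothing fatal found: keep reporting an in-progress, non-degraded deploy.
        return [
            {"type": "Progressing", "status": "True",
             "reason": "DeployStarted", "message": "Started Deployment"},
            {"type": "Degraded", "status": "False"},
        ]
    message = "; ".join(failures)
    # "OperandFailed" is a hypothetical reason chosen for illustration.
    return [
        {"type": "Progressing", "status": "False",
         "reason": "OperandFailed", "message": message},
        {"type": "Degraded", "status": "True",
         "reason": "OperandFailed", "message": message},
    ]


if __name__ == "__main__":
    # Trimmed version of the cdi-deployment conditions from the ticket.
    cdi_deployment = [
        {"type": "Available", "status": "False",
         "reason": "MinimumReplicasUnavailable",
         "message": "Deployment does not have minimum availability."},
        {"type": "Progressing", "status": "False",
         "reason": "ProgressDeadlineExceeded",
         "message": 'ReplicaSet "cdi-deployment-6f4888b5cb" has timed out progressing.'},
    ]
    for cond in aggregate({"cdi-deployment": cdi_deployment}):
        print(cond["type"], cond["status"])
```

Available=False is deliberately not treated as fatal in this sketch, because a Deployment can be briefly unavailable during a normal rollout; only conditions that indicate a stuck rollout flip the CR to Degraded.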

      Note that correctly reporting a failed install or upgrade is a prerequisite for OLM's unsafe fail-forward upgrades feature:
      https://olm.operatorframework.io/docs/advanced-tasks/unsafe-fail-forward-upgrades/

      We have not opted in yet, but that feature will eventually let customers try to recover from stuck upgrades.
      Source:
      https://bugzilla.redhat.com/show_bug.cgi?id=2128906

              alitke@redhat.com Adam Litke
              akalenyu Alex Kalenyuk
              Natalie Gavrielov