OpenShift Over the Air / OTA-680

Improve reporting when version pod fails on SCC injection


    • 3
    • False
    • None
    • False
    • OTA 228, OTA 233

      An update from 4.10.13 to 4.10.15 wedged recently when user mutating webhooks injected a SecurityContextConstraint that set readOnlyRootFilesystem on the version-... download pod:

      apiVersion: v1
      kind: Pod
      metadata:
        ...
        annotations:
          ...
          openshift.io/scc: stackrox-collector
        ...
        name: version-4.10.15-5jz2v-s22gh
        ...
      spec:
        containers:
        - ...
          securityContext:
            privileged: true
            readOnlyRootFilesystem: true
          ...
        ...
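One way to spot this condition directly is to inspect the pod itself. The following is a minimal Python sketch, assuming the pod has been fetched as JSON (for example with `oc get pod -o json`) and parsed into a dict; `read_only_root_findings` is a hypothetical helper, not part of the CVO. It only reads standard Pod fields: the `openshift.io/scc` annotation that SCC admission sets, and each container's `securityContext.readOnlyRootFilesystem`.

```python
from typing import Any, Dict, List


def read_only_root_findings(pod: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list containers whose root filesystem is
    read-only, noting which SCC admitted the pod."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    # OpenShift's SCC admission records the chosen SCC in this annotation.
    scc = annotations.get("openshift.io/scc", "<unknown>")
    findings = []
    spec = pod.get("spec", {})
    # Check init containers as well as regular containers; either kind
    # fails if it writes outside a mounted volume.
    for field in ("initContainers", "containers"):
        for container in spec.get(field) or []:
            context = container.get("securityContext") or {}
            if context.get("readOnlyRootFilesystem"):
                findings.append(
                    f"container {container['name']!r} has readOnlyRootFilesystem=true "
                    f"(pod admitted under SCC {scc!r})"
                )
    return findings


# Usage, with a pod shaped like the excerpt above. The container name
# "example" is a placeholder; the real name is elided in the excerpt.
pod = {
    "metadata": {
        "name": "version-4.10.15-5jz2v-s22gh",
        "annotations": {"openshift.io/scc": "stackrox-collector"},
    },
    "spec": {
        "containers": [
            {
                "name": "example",
                "securityContext": {"privileged": True, "readOnlyRootFilesystem": True},
            }
        ]
    },
}
for finding in read_only_root_findings(pod):
    print(finding)
```

Reporting the SCC name alongside the container makes it obvious that the read-only root came from admission rather than from the release payload's own pod template.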
      

      This caused failures like:

      mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_adminack_configmap.yaml': Read-only file system
      ...
      

      leading to CrashLoopBackOff and finally the CVO complaining with:

          Message:               Retrieving payload failed version="4.10.15" image="quay.io/openshift-release-dev/ocp-release@sha256:ddcb70ce04a01ce487c0f4ad769e9e36a10c8c832a34307c1b1eb8e03a5b7ddb" failure=Unable to download and prepare the update: deadline exceeded, reason: "DeadlineExceeded", message: "Job was active longer than specified deadline" 
      

      But it took a lot of digging to work back from that DeadlineExceeded to the SCC injection. We should bubble up lower-level errors, such as the download pod's CrashLoopBackOff and the failing container's output, at least into the CVO logs, so this sort of issue is easier to diagnose.
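The error bubbling proposed above could be sketched roughly as follows: combine the Job's failure condition with the reasons recorded in the pod's container statuses, so that a DeadlineExceeded report also carries the CrashLoopBackOff and the underlying error. This is a minimal Python sketch, not the CVO's implementation (the CVO is written in Go); `container_problems` and `summarize_download_failure` are hypothetical names, and the Job and Pod objects are assumed to be their JSON representations parsed into dicts. A terminated container's `message` is only populated when the container writes a termination message or uses `terminationMessagePolicy: FallbackToLogsOnError`, so the sample status below is illustrative rather than captured from the affected cluster.

```python
from typing import Any, Dict, List


def container_problems(pod: Dict[str, Any]) -> List[str]:
    """Hypothetical: collect waiting/terminated reasons from a Pod's statuses."""
    status = pod.get("status", {})
    problems = []
    for field in ("initContainerStatuses", "containerStatuses"):
        for cs in status.get(field) or []:
            # A crash-looping container is usually waiting with reason
            # CrashLoopBackOff, while the error that caused the crash sits
            # in lastState.terminated, so look at both.
            for key in ("state", "lastState"):
                state = cs.get(key) or {}
                for phase in ("waiting", "terminated"):
                    detail = state.get(phase)
                    if detail and detail.get("reason"):
                        text = f"{cs['name']} {phase}: {detail['reason']}"
                        if detail.get("message"):
                            text += f": {detail['message'].strip()}"
                        problems.append(text)
    return problems


def summarize_download_failure(job: Dict[str, Any], pods: List[Dict[str, Any]]) -> str:
    """Hypothetical: join the Job's Failed condition with per-container details."""
    parts = []
    for cond in job.get("status", {}).get("conditions") or []:
        if cond.get("type") == "Failed" and cond.get("status") == "True":
            parts.append(f"job failed: {cond.get('reason')}: {cond.get('message')}")
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unknown>")
        for problem in container_problems(pod):
            parts.append(f"pod {name}: {problem}")
    return "; ".join(parts)


# Illustrative objects; the container name "example" is a placeholder.
job = {
    "status": {
        "conditions": [
            {
                "type": "Failed",
                "status": "True",
                "reason": "DeadlineExceeded",
                "message": "Job was active longer than specified deadline",
            }
        ]
    }
}
pod = {
    "metadata": {"name": "version-4.10.15-5jz2v-s22gh"},
    "status": {
        "containerStatuses": [
            {
                "name": "example",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {
                    "terminated": {
                        "reason": "Error",
                        "message": "mv: cannot remove '/manifests/0000_00_cluster-version-operator_01_adminack_configmap.yaml': Read-only file system\n",
                    }
                },
            }
        ]
    },
}
print(summarize_download_failure(job, [pod]))
```

Keeping the Job condition first preserves the existing DeadlineExceeded summary, while the appended pod details point straight at the read-only filesystem instead of requiring a manual dig through pod status.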

            trking W. Trevor King
            Votes: 2
            Watchers: 8
