OCPBUGS-38894: Image registry unable to run due to permissions error

    • Critical
    • Yes
    • Approved
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-38842. The following is the description of the original issue:
      ---
      Component Readiness has found a potential regression in the following test:

      [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers for ns/openshift-image-registry

      Probability of significant regression: 98.02%

      Sample (being evaluated) Release: 4.17
      Start Time: 2024-08-15T00:00:00Z
      End Time: 2024-08-22T23:59:59Z
      Success Rate: 94.74%
      Successes: 180
      Failures: 10
      Flakes: 0

      Base (historical) Release: 4.16
      Start Time: 2024-05-31T00:00:00Z
      End Time: 2024-06-27T23:59:59Z
      Success Rate: 100.00%
      Successes: 89
      Failures: 0
      Flakes: 0
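      The regression probability above can be sanity-checked from the raw counts. The following is a minimal sketch, assuming a one-sided Fisher's exact test on the pass/fail counts of the two releases; the report does not state which test it uses, so the function names and the choice of test here are assumptions for illustration only.

```python
from math import comb


def success_rate(successes: int, failures: int) -> float:
    """Fraction of runs that passed."""
    return successes / (successes + failures)


def fisher_one_sided_p(sample_fail: int, sample_pass: int,
                       base_fail: int, base_pass: int) -> float:
    """Upper-tail hypergeometric p-value: the chance of seeing at least
    `sample_fail` failures in the sample if both releases shared one
    underlying failure rate (assumed test, not confirmed by the report)."""
    sample_n = sample_fail + sample_pass
    base_n = base_fail + base_pass
    total_fail = sample_fail + base_fail
    denominator = comb(sample_n + base_n, total_fail)
    max_k = min(total_fail, sample_n)
    tail = sum(
        comb(sample_n, k) * comb(base_n, total_fail - k)
        for k in range(sample_fail, max_k + 1)
    )
    return tail / denominator


# Counts copied from the report: 4.17 sample and 4.16 base.
sample_rate = success_rate(successes=180, failures=10)
p_value = fisher_one_sided_p(sample_fail=10, sample_pass=180,
                             base_fail=0, base_pass=89)
confidence = 1 - p_value

print(f"sample success rate: {sample_rate:.2%}")
print(f"1 - p (one-sided Fisher): {confidence:.2%}")
```

      With the counts from the report this gives a sample success rate of 94.74% and a value of about 98.02%. That agreement with the reported figure suggests, but does not establish, that the report uses a comparable test.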

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Architecture=amd64&Architecture=amd64&FeatureSet=default&FeatureSet=default&Installer=ipi&Installer=ipi&Network=ovn&Network=ovn&NetworkAccess=default&Platform=aws&Platform=aws&Scheduler=default&SecurityMode=default&Suite=unknown&Suite=unknown&Topology=ha&Topology=ha&Upgrade=micro&Upgrade=micro&baseEndTime=2024-06-27%2023%3A59%3A59&baseRelease=4.16&baseStartTime=2024-05-31%2000%3A00%3A00&capability=Other&columnGroupBy=Platform%2CArchitecture%2CNetwork&component=Image%20Registry&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20ipi%20ovn%20aws%20unknown%20ha%20micro&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&minFail=3&pity=5&sampleEndTime=2024-08-22%2023%3A59%3A59&sampleRelease=4.17&sampleStartTime=2024-08-15%2000%3A00%3A00&testId=openshift-tests-upgrade%3A10a9e2be27aa9ae799fde61bf8c992f6&testName=%5Bsig-cluster-lifecycle%5D%20pathological%20event%20should%20not%20see%20excessive%20Back-off%20restarting%20failed%20containers%20for%20ns%2Fopenshift-image-registry

      This also hits 4.17. I've aligned this bug to 4.18 so the backport process is cleaner.

      The problem appears to be a permissions error preventing the pods from starting:

      2024-08-22T06:14:14.743856620Z ln: failed to create symbolic link '/etc/pki/ca-trust/extracted/pem/directory-hash/ca-certificates.crt': Permission denied
      

      Originating from this code: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/resource/podtemplatespec.go#L489
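      For triage, the error line quoted above can be matched mechanically in pod logs. The snippet below is a hypothetical helper, not the operator code linked above: the regular expression and the function name are assumptions based only on the single log line shown in this report.

```python
import re
from typing import Optional

# Pattern modelled on the one log line quoted in this report (assumption:
# other occurrences share the same "<timestamp> ln: ..." layout).
SYMLINK_ERROR = re.compile(
    r"^(?P<timestamp>\S+) ln: failed to create symbolic link "
    r"'(?P<path>[^']+)': (?P<reason>.+)$"
)


def parse_symlink_error(line: str) -> Optional[dict]:
    """Return timestamp, path and reason if the line matches, else None."""
    match = SYMLINK_ERROR.match(line.strip())
    return match.groupdict() if match else None


log_line = (
    "2024-08-22T06:14:14.743856620Z ln: failed to create symbolic link "
    "'/etc/pki/ca-trust/extracted/pem/directory-hash/ca-certificates.crt': "
    "Permission denied"
)
parsed = parse_symlink_error(log_line)
print(parsed)
```

      The timestamp is kept as a string because it carries nanosecond precision, which the standard `datetime` parser does not accept directly.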

      Both the 4.17 and 4.18 nightlies bumped RHCOS, and the bump includes a package upgrade like this:

      container-selinux-3-2.231.0-1.rhaos4.16.el9-noarch container-selinux-3-2.231.0-2.rhaos4.17.el9-noarch

      The exact versions differ slightly in each stream, but both were on 3-2.231.
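      To see which part of the package identifier actually changed, the two strings can be split into fields and compared. This is a sketch under two assumptions: that the strings follow a `name-epoch-version-release-arch` layout, and that the first string is the "before" package and the second the "after" package. The `Package` type and helper names are hypothetical.

```python
from typing import List, NamedTuple


class Package(NamedTuple):
    name: str
    epoch: str
    version: str
    release: str
    arch: str


def parse_package(text: str) -> Package:
    """Split from the right so hyphens inside the package name survive.
    Assumed layout: name-epoch-version-release-arch."""
    name, epoch, version, release, arch = text.rsplit("-", 4)
    return Package(name, epoch, version, release, arch)


def changed_fields(old: Package, new: Package) -> List[str]:
    """Names of the fields whose values differ between the two packages."""
    return [f for f in Package._fields if getattr(old, f) != getattr(new, f)]


# Strings copied from the report; the before/after order is an assumption.
old = parse_package("container-selinux-3-2.231.0-1.rhaos4.16.el9-noarch")
new = parse_package("container-selinux-3-2.231.0-2.rhaos4.17.el9-noarch")

print(changed_fields(old, new))
print(old.release, "->", new.release)
```

      Under these assumptions only the release field differs, which is consistent with the note that both streams were on 3-2.231.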

      Hits other tests too:

      operator conditions image-registry
      Operator upgrade image-registry
      [sig-cluster-lifecycle] Cluster completes upgrade
      [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
      [sig-arch][Feature:ClusterUpgrade] Cluster should be upgradeable after finishing upgrade [Late][Suite:upgrade]
      


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.16.10 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:6004


            Flavian Missi added a comment - - edited

            We needed to update some 4.15/4.16-only code to fix the error Wen found: https://github.com/openshift/cluster-image-registry-operator/pull/1102; https://github.com/openshift/cluster-image-registry-operator/pull/1105.

            rhn-support-wewang, do you think you will be able to verify this bug today?

             

            EDIT: I see that https://github.com/openshift/cluster-image-registry-operator/pull/1102#issuecomment-2314779431 has the qe-approved label so I guess it will get moved to verified automatically eventually.


            Wen Wang added a comment - - edited

            I also used nightly build 4.16.0-0.nightly-2024-08-27-231141 to launch a cluster, and it hit this issue: "Cluster operator image-registry Degraded is True with AzurePathFixControllerFailed::ImagePrunerJobFailed: AzurePathFixControllerDegraded: Migration failed: ln: failed to create symbolic link '/etc/pki/ca-trust/extracted/pem/directory-hash/ca-certificates.crt': Permission denied"

            Wen Wang added a comment -

            Hi fmissi, the PR was merged on Aug 26, 2024, 9:55pm, but the latest job, which ran at 2024-08-27T04:16:54, still seems to have the issue.


            Wen Wang added a comment -

            No new job has run yet. I will wait for a new job and then check the result.


              fmissi Flavian Missi
              openshift-crt-jira-prow OpenShift Prow Bot
              XiuJuan Wang XiuJuan Wang