  OpenShift Bugs / OCPBUGS-5523

Catalog, fatal error: concurrent map read and map write

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • None
    • 4.10
    • OLM
    • Moderate
    • None
    • Voltron 230, Windu 231
    • 2
    • Rejected
    • False

      Description of problem:

      The catalog-operator pod restarts frequently, with the following stack trace appearing about once a day:

      ~~~
      $ omc logs catalog-operator-f7477865d-x6frl -p
      2023-01-04T13:05:15.175952229Z time="2023-01-04T13:05:15Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
      2023-01-04T13:05:15.175952229Z fatal error: concurrent map read and map write
      2023-01-04T13:05:15.178587884Z
      2023-01-04T13:05:15.178674833Z goroutine 669 [running]:
      2023-01-04T13:05:15.179284556Z runtime.throw({0x1efdc12, 0xc000580000})
      2023-01-04T13:05:15.179458107Z 	/usr/lib/golang/src/runtime/panic.go:1198 +0x71 fp=0xc00559d098 sp=0xc00559d068 pc=0x43bcd1
      2023-01-04T13:05:15.179707701Z runtime.mapaccess1_faststr(0x7f39283dd878, 0x10, {0xc000894c40, 0xf})
      2023-01-04T13:05:15.179932520Z 	/usr/lib/golang/src/runtime/map_faststr.go:21 +0x3a5 fp=0xc00559d100 sp=0xc00559d098 pc=0x418ca5
      2023-01-04T13:05:15.180181245Z github.com/operator-framework/operator-lifecycle-manager/pkg/metrics.UpdateSubsSyncCounterStorage(0xc00545cfc0)
      ~~~
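The crash comes from the Go runtime's built-in check for unsynchronized map access. When one goroutine writes a map while another reads it, the runtime may abort the whole process with `fatal error: concurrent map read and map write`. Unlike a panic, this cannot be recovered, which fits the pod restarts. The trace points into `metrics.UpdateSubsSyncCounterStorage`. This issue does not show the code of the actual fix (PR 429), so the sketch below only illustrates the general remedy of guarding a shared map with `sync.RWMutex`. The names `syncCounterStore`, `Inc` and `Get` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// syncCounterStore is a hypothetical stand-in for a per-subscription
// counter map that several goroutines read and write.
type syncCounterStore struct {
	mu       sync.RWMutex
	counters map[string]int
}

func newSyncCounterStore() *syncCounterStore {
	return &syncCounterStore{counters: make(map[string]int)}
}

// Inc takes the write lock because it mutates the map.
func (s *syncCounterStore) Inc(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counters[key]++
}

// Get takes the read lock: readers may run in parallel with each other,
// but never overlap with a writer.
func (s *syncCounterStore) Get(key string) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.counters[key]
}

func main() {
	store := newSyncCounterStore()
	var wg sync.WaitGroup

	// 50 goroutines update and read 5 shared keys concurrently. Without the
	// mutex, this access pattern can trip the runtime's concurrent map check.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("sub-%d", i%5)
			store.Inc(key)
			_ = store.Get(key)
		}(i)
	}
	wg.Wait()

	// Each of the 5 keys received 10 increments.
	fmt.Println(store.Get("sub-0")) // prints 10
}
```

Because the runtime check only fires when the accesses actually collide, the crash can be intermittent. Running such code under the race detector (`go run -race` or `go test -race`) can report the unsynchronized access even on runs where the fatal error does not occur.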

       

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

      Slack discussion: https://redhat-internal.slack.com/archives/C3VS0LV41/p1673120541153639
      MG link: https://attachments.access.redhat.com/hydra/rest/cases/03396604/attachments/25f23643-2447-442b-ba26-4338b679b8cc?usePresignedUrl=true

       


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:1326

            Alexander Greene (Inactive) added a comment -

            bandrade@redhat.com, rhn-support-jiazha: the associated PR has merged and bandrade@redhat.com has marked the bug as verified. Could we move this to the next stage?

            bruno andrade added a comment -

            Marking as VERIFIED

            Catalog Operator is not getting restarted

            oc exec catalog-operator-9d959f49d-lw24j -n openshift-operator-lifecycle-manager -- olm --version
            OLM version: 0.19.0
            git commit: 932e3a005ef1273a57af31baeb49aaf04fad1e22

            oc get pods -n openshift-marketplace
            NAME                                                              READY   STATUS      RESTARTS   AGE
            5ed5b8a25ac7f856a2b8d70cf2315c0bfe6a3f2b7bbe5b87fa150e13545gpcn   0/1     Completed   0          17m
            62cbb6ad41aa19f46278618e359bd8e9f5be16ef71f0d323e2c5ebea7cvvpmq   0/1     Completed   0          17m
            869950130143e25bd585467a97e2f7ce649dbad50447483af7bb490ee6k7zcn   0/1     Completed   0          17m
            certified-operators-kg5vs                                         1/1     Running     0          39m
            community-operators-h5fqz                                         1/1     Running     0          39m
            marketplace-operator-8547465866-cs94d                             1/1     Running     0          45m
            qe-app-registry-smmg9                                             1/1     Running     0          17m
            redhat-marketplace-vtnfx                                          1/1     Running     0          39m
            redhat-operators-ddpfm                                            1/1     Running     0          39m

            Jian Zhang added a comment -

            Hi bandrade@redhat.com, you can check the latest payload commit info below:

            MacBook-Pro:~ jianzhang$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.13.0-0.nightly-2023-01-29-185609 |grep olm
            Warning: the default reading order of registry auth file will be changed from "${HOME}/.docker/config.json" to podman registry config locations in the future version of oc. "${HOME}/.docker/config.json" is deprecated, but can still be used for storing credentials as a fallback. See https://github.com/containers/image/blob/main/docs/containers-auth.json.5.md for the order of podman registry config locations.
              olm-rukpak                                     https://github.com/openshift/operator-framework-rukpak                      f058b797747f009a6b33a5d42d4aa709fdd0c848
              operator-lifecycle-manager                     https://github.com/openshift/operator-framework-olm                         932e3a005ef1273a57af31baeb49aaf04fad1e22
              operator-registry                              https://github.com/openshift/operator-framework-olm                         932e3a005ef1273a57af31baeb49aaf04fad1e22 

            As you can see, the latest commit in https://github.com/openshift/operator-framework-olm/commits/master is 932e3a00, and it includes the fix from PR https://github.com/openshift/operator-framework-olm/pull/429, so this payload contains the fix.

            Please test it with high priority. Thanks!

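The check in this comment has two halves. First, read the `operator-lifecycle-manager` commit from the payload listing. Second, confirm that this commit contains the fix. Below is a minimal Go sketch of the first half. It assumes the three-column rows shown in the listing (component, repository URL, commit). `componentCommits` is a hypothetical helper, not part of `oc`.

```go
package main

import (
	"fmt"
	"strings"
)

// componentCommits extracts component -> commit pairs from text printed by
// `oc adm release info --commits`. It keeps only rows of the form
// "<component> <https repo URL> <commit>", so warning lines are skipped.
func componentCommits(output string) map[string]string {
	commits := make(map[string]string)
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		if len(f) == 3 && strings.HasPrefix(f[1], "https://") {
			commits[f[0]] = f[2]
		}
	}
	return commits
}

func main() {
	// Rows copied from the payload listing in this comment.
	out := `  operator-lifecycle-manager   https://github.com/openshift/operator-framework-olm   932e3a005ef1273a57af31baeb49aaf04fad1e22
  operator-registry            https://github.com/openshift/operator-framework-olm   932e3a005ef1273a57af31baeb49aaf04fad1e22`

	commits := componentCommits(out)
	want := "932e3a00" // short SHA quoted in the comment
	fmt.Println(strings.HasPrefix(commits["operator-lifecycle-manager"], want)) // prints true
}
```

The second half depends on repository history, not on the payload text. For example, in a clone of operator-framework-olm, `git merge-base --is-ancestor <fix-commit> 932e3a00` exits with status 0 when the fix commit is an ancestor of the payload commit.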

            bruno andrade added a comment -

            Hi rh-ee-dfranz, thanks for sharing the context. Please let us know when a payload with the fix is available: the openshift-art-jira-bot will change the status to ON_QA because this issue has an associated PR that has already merged.

            Daniel Franz added a comment -

            Hello bandrade@redhat.com,

            The proposed fix could not have been in that OCP version, because the code change has not yet merged downstream. I'm moving the status back to POST until the fix can be verified with an OCP build.

            For clarity, I was unable to reproduce this myself using the specified OCP version and the actions reported by the customer. We may need to spend more time finding a reliable way to reproduce the problem before we can conclusively show that it is gone. The issue may be random or heavily environment-dependent, so this might be difficult.

            bruno andrade added a comment -

            Marking as VERIFIED

            Catalog Operator is not getting restarted

            OCP version: 4.13.0-0.nightly-2023-01-17-152326
            OLM version: 0.19.0
            git commit: 2e2abf82e475789d6a3aa97ee7332647ae06ac8

            oc get pods -n openshift-operator-lifecycle-manager
            NAME                                      READY   STATUS      RESTARTS       AGE
            catalog-operator-bdb48b696-gvl55          1/1     Running     0              148m
            collect-profiles-27899835-pknb5           0/1     Completed   0              41m
            collect-profiles-27899850-4z4q8           0/1     Completed   0              26m
            collect-profiles-27899865-glv2f           0/1     Completed   0              11m
            olm-operator-6b984b87d4-fc2b6             1/1     Running     0              148m
            package-server-manager-57c9749769-wg9hz   1/1     Running     1 (136m ago)   148m
            packageserver-8598bd8599-6dz9c            1/1     Running     0              146m
            packageserver-8598bd8599-dnbr8            1/1     Running     0              146m
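The verification comments judge success by the RESTARTS column of `oc get pods`. That check can be scripted. Below is a minimal Go sketch that assumes the default column order NAME, READY, STATUS, RESTARTS, AGE. The helper `restartCounts` is hypothetical and not part of `oc` or OLM. Because it splits on any whitespace, it handles both aligned and single-spaced listings. The sample rows are taken from the listing above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// restartCounts parses `oc get pods` text output (columns NAME READY STATUS
// RESTARTS AGE) and returns the restart count per pod. A RESTARTS value such
// as "1 (136m ago)" splits into several fields; only the leading number is used.
func restartCounts(output string) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 5 || fields[0] == "NAME" {
			continue // skip the header row and blank or short lines
		}
		n, err := strconv.Atoi(fields[3])
		if err != nil {
			return nil, fmt.Errorf("pod %s: bad RESTARTS value %q: %w", fields[0], fields[3], err)
		}
		counts[fields[0]] = n
	}
	return counts, sc.Err()
}

func main() {
	out := `NAME                                      READY   STATUS    RESTARTS       AGE
catalog-operator-bdb48b696-gvl55          1/1     Running   0              148m
package-server-manager-57c9749769-wg9hz   1/1     Running   1 (136m ago)   148m`

	counts, err := restartCounts(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["catalog-operator-bdb48b696-gvl55"], counts["package-server-manager-57c9749769-wg9hz"]) // prints 0 1
}
```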

            bruno andrade added a comment -

            Currently waiting for the next nightly build to test it.

            Jian Zhang added a comment -

            Hi bandrade@redhat.com, could you help test this issue? Thanks!

              agreene1991 Alexander Greene (Inactive)
              rhn-support-bshaw Bikash Shaw
              bruno andrade
              Votes: 0
              Watchers: 9
