OpenShift Bugs / OCPBUGS-49860

OLMv1: Operator-controller fails to connect to catalogd


    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • 4.18.0, 4.19.0
    • OLM
    • Critical
    • Yes
    • Flareon OLM Sprint 266
    • 1
    • Approved
    • False
    • Fixes an issue in OLMv1 that sometimes causes failures in establishing HTTPS connections due to improperly mounted CA certificates.
    • Bug Fix
    • In Progress

      Description of problem:

      error walking catalogs: error getting package "does-not-exist" from catalog "openshift-certified-operators": error retrieving cache for catalog "openshift-certified-operators": error performing request: Get "https://catalogd-service.openshift-catalogd.svc/catalogs/openshift-certified-operators/api/v1/all": tls: failed to verify certificate: x509: certificate signed by unknown authority    
      
      
      This is not specific to "openshift-certified-operators" or "does-not-exist". The root issue is: "tls: failed to verify certificate: x509: certificate signed by unknown authority" when connecting to catalogd.
      
      Some variants seem to hit this error (and similar ones) more frequently than others.
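      For reference, this class of error can be reproduced outside the cluster. The following is a minimal sketch, not operator-controller code: a local `httptest` TLS server stands in for catalogd, and the helper names (`tryGet`, `isUnknownAuthority`, `demo`) are hypothetical. It assumes only that the client verifies the server certificate against an explicit CA pool; with an empty pool the request fails with the same x509 unknown-authority error, and with the server's CA in the pool it succeeds.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tryGet performs one HTTPS GET against url, trusting only the CAs in pool.
func tryGet(url string, pool *x509.CertPool) error {
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig:   &tls.Config{RootCAs: pool},
			DisableKeepAlives: true, // keep the demo free of lingering connections
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isUnknownAuthority reports whether err wraps x509.UnknownAuthorityError.
func isUnknownAuthority(err error) bool {
	var ua x509.UnknownAuthorityError
	return errors.As(err, &ua)
}

// demo returns whether an empty CA pool produced the unknown-authority error,
// and the error (if any) seen once the server's certificate is trusted.
func demo() (unknownWithEmptyPool bool, errWithCA error) {
	// Stand-in for catalogd: a local server with a self-signed certificate.
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	// Empty (non-nil) pool: no CA is trusted, so verification must fail.
	unknownWithEmptyPool = isUnknownAuthority(tryGet(srv.URL, x509.NewCertPool()))

	// Pool containing the server's certificate: verification should pass.
	trusted := x509.NewCertPool()
	trusted.AddCert(srv.Certificate())
	errWithCA = tryGet(srv.URL, trusted)
	return
}

func main() {
	unknown, err := demo()
	fmt.Println("empty pool gives unknown authority:", unknown)
	fmt.Println("error with CA in pool:", err)
}
```

      The sketch only shows that a missing CA in the client's pool is sufficient to produce this error. It does not show why the CA is missing in the affected runs.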

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Intermittent. It seems to happen only about 5% of the time overall.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      error walking catalogs: error getting package "does-not-exist" from catalog "openshift-certified-operators": error retrieving cache for catalog "openshift-certified-operators": error performing request: Get "https://catalogd-service.openshift-catalogd.svc/catalogs/openshift-certified-operators/api/v1/all": tls: failed to verify certificate: x509: certificate signed by unknown authority      

      Expected results:

      Connections from operator-controller to catalogd succeed.

      Additional info:

      This is causing component readiness failures. See https://sippy.dptools.openshift.org/sippy-ng/component_readiness/capabilities?baseEndTime=2024-10-01%2023%3A59%3A59&baseRelease=4.17&baseStartTime=2024-09-01%2000%3A00%3A00&columnGroupBy=Architecture%2CNetwork%2CPlatform%2CTopology&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&flakeAsFailure=false&ignoreDisruption=true&ignoreMissing=false&includeMultiReleaseAnalysis=false&includeVariant=Architecture%3Aamd64&includeVariant=CGroupMode%3Av2&includeVariant=ContainerRuntime%3Arunc&includeVariant=ContainerRuntime%3Acrun&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Network%3Aovn&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&includeVariant=Topology%3Amicroshift&minFail=3&passRateAllTests=0&passRateNewTests=95&pity=5&sampleEndTime=2025-02-04%2023%3A59%3A59&sampleRelease=4.18&sampleStartTime=2025-01-28%2000%3A00%3A00&component=OLM    

              tshort@redhat.com Todd Short
              jlanford@redhat.com Joe Lanford
              Jian Zhang