  1. OpenShift Bugs
  2. OCPBUGS-459

Default catalogs fail liveness/readiness probes


    • Bug
    • Resolution: Done
    • Major
    • None
    • 4.11
    • OLM
    • None
    • Important
    • None
    • False

      Description of problem:
      All default catalogsources in 4.11 are built as file-based catalogs. Those catalogsources fail to deploy successfully on 4.11 OCP clusters. Multiple CI runs on nightly builds have failed for this reason.

      The main culprit is the longer processing time for YAML/JSON unmarshalling in the registry pod. The proposal to address this issue is to add a startupProbe to the registry pod. The startupProbe will check gRPC health before the liveness/readiness probes are activated.
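The gating behaviour proposed above can be sketched as a small simulation. This is only an illustration, not the OLM implementation: `wait_for_startup` and `make_slow_server` are hypothetical names, and the threshold and period values are made-up numbers rather than the values used in the fix. It assumes the usual Kubernetes semantics, where liveness/readiness probes stay disabled until the startup probe succeeds once, and the container is restarted if the startup probe fails `failureThreshold` times.

```python
import time
from typing import Callable, Optional


def wait_for_startup(
    check: Callable[[], bool],
    failure_threshold: int,
    period_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll `check` the way a startup probe would.

    Returns the 1-based attempt on which `check` first succeeded, or
    None if it failed `failure_threshold` times in a row (the point at
    which the kubelet would restart the container).
    """
    for attempt in range(1, failure_threshold + 1):
        if check():
            return attempt
        if attempt < failure_threshold:
            sleep(period_seconds)
    return None


def make_slow_server(ready_after: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a registry pod whose gRPC health check
    only starts passing on the `ready_after`-th poll, e.g. because it is
    still unmarshalling the catalog."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        return calls["n"] >= ready_after

    return check


if __name__ == "__main__":
    no_sleep = lambda _seconds: None  # skip real waiting in the demo

    # Pod becomes healthy on the 3rd poll: startup succeeds, and only
    # then would liveness/readiness probing begin.
    print(wait_for_startup(make_slow_server(3), 5, 10, sleep=no_sleep))

    # Pod needs longer than the startup budget allows: probe gives up.
    print(wait_for_startup(make_slow_server(9), 5, 10, sleep=no_sleep))
```

With these assumed values the startup budget is roughly `failure_threshold * period_seconds`, so the budget has to be sized to cover the slowest expected catalog load.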
      Version-Release number of selected component (if applicable):

      4.11

      How reproducible:

      Steps to Reproduce:
      1. Deploy a 4.11 OpenShift cluster
      2. Check registry pods for default catalogsources such as redhat-operators
      Actual results:
      The pods fail due to liveness/readiness probe failure: openshift-marketplace pod/redhat-operators-h22ms node/ci-op-s04xckx3-de73b-7fxs4-master-1 - reason/Unhealthy Readiness probe failed: timeout: failed to connect service ":50051" within 1s
      Expected results:
      The registry pods for default catalogsources should be up and running.
      Additional info:
      See Slack thread for more information:
      https://coreos.slack.com/archives/C01CQA76KMX/p1654190057669689

      Note: This bug is for the backporting process. The 4.10.z BZ is https://bugzilla.redhat.com/show_bug.cgi?id=2115874

              vdinh@redhat.com Vu Dinh
              vdinh@redhat.com Vu Dinh
              Jian Zhang