OCMUI - OpenShift Cluster Manager UI / OCMUI-3043

[spike] investigate staging deployment first-run failures

    • Type: Story
    • Resolution: Done
    • Sprint: OCMUI Core Sprint 268

      issue

      the openshift-SaaS deploy always fails on the first run, and has to be re-triggered manually.

      example notification from #ocm-ui-deploys:

      SaaS file uhc-portal deployment to environment insights-stage: Failure (Open). Reason: https://github.com/RedHatInsights/uhc-portal/commit/527aba78226ba35661a5a0f8e5bcb662f0043b38 triggered by openshift-saas-deploy-trigger-moving-commits

      discovery

      my guesswork

      the failures point to a missing image in quay.io. this is probably because the staging deploy is triggered by moving-commits on the master branch, and the jenkins build job for master hasn't finished building and pushing the container image to quay by the time the deploy runs.

      the equivalent "production" deploy doesn't seem to fail the same way. it's triggered differently (via a manual change to the revision ref), so the image is already available by then.
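
      if the guess above is right, the first run loses a race against the image build. a minimal sketch of gating on image availability, to illustrate the idea only: image_exists is a hypothetical caller-supplied check (e.g. a registry lookup), and the attempt count and delay are made-up defaults, not anything taken from app-interface.

```python
import time
from typing import Callable


def wait_for_image(
    image_exists: Callable[[str], bool],  # hypothetical check, supplied by the caller
    tag: str,
    attempts: int = 10,                   # assumed default, tune to the build duration
    delay_seconds: float = 30.0,          # assumed default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the image tag is available, or give up after `attempts` checks."""
    for attempt in range(1, attempts + 1):
        if image_exists(tag):
            return True
        # Only wait when another check is still going to happen.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

      a deploy step could call wait_for_image(check, tag) and proceed only on True. which tag the build job actually pushes for a given commit is not confirmed here, so the tag value is left to the caller.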

      several suspects found in the source:

      • the prod-frontends namespace includes operator-lifecycle-manager, but prod-staging doesn't. it might provide the missing trigger dependency on quay builds.
      • the stage-frontends namespace instantiates akamai_cache_bust_url with the value https://console.stage.redhat.com. this might need an override, as our app is hosted at console.dev.redhat.com.

      others' attempts / solutions

      the deployment "upstream" fix

      petra cihalova stated on slack that their team hit a similar (potentially identical) issue, and got a fix for it (see this message).

      their fix appeared in this MR (but the source project no longer exists), which was merged to app-interface with this revision.

      this solution involved adding an upstream field to the offending (staging) deployment definition.
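
      to find which of our targets would need the same change, something like the sketch below could scan a deployment definition that has been loaded into a dict. the ref and upstream field names come from the notes above; the resourceTemplates / targets / name layout is an assumption about the file structure and should be checked against the real definition. treating any ref that isn't a full 40-character sha as a moving branch is also an assumption.

```python
import re
from typing import Any

# A pinned target is assumed to use a full 40-character commit sha as its ref.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def targets_missing_upstream(saas_file: dict[str, Any]) -> list[str]:
    """List branch-tracking targets that have no `upstream` field set."""
    missing = []
    # `resourceTemplates` and `targets` are assumed key names, not confirmed.
    for template in saas_file.get("resourceTemplates", []):
        for target in template.get("targets", []):
            ref = str(target.get("ref", ""))
            tracks_branch = not FULL_SHA.match(ref)
            if tracks_branch and not target.get("upstream"):
                missing.append(f"{template.get('name', '?')}:{ref}")
    return missing
```

      a staging target with ref master and no upstream would be reported, while a target pinned to a commit sha would not.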

      another instance of the problem

      looks like justin sherrill also ran into this, and brought it up with adam drew (see this thread), but no solution was offered AFAICT.

      this instance surfaced while migrating to containerized builds, so it may be less relevant to ours (or it may turn out to be similar).

      conclusions

      so far, the deployment "upstream" fix seems most promising, and we should chase that first.

      if that doesn't work, we should probably reach out to other teams for further assistance.

              emalka@redhat.com Lyn M