Type: Story
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- CI/CD
- deployment
- no_qe

Blocked:
False
Blocked Reason: None
Ready:
False
Intelligence Requested:
Market:

Sprint:
OCMUI Core Sprint 268

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

issue

the openshift-SaaS deploy always fails on the first run, and has to be re-triggered manually.

example notification from #ocm-ui-deploys:

SaaS file uhc-portal deployment to environment insights-stage: Failure (Open). Reason: https://github.com/RedHatInsights/uhc-portal/commit/527aba78226ba35661a5a0f8e5bcb662f0043b38 triggered by openshift-saas-deploy-trigger-moving-commits

discovery

my guesswork

the failures point to a missing image in quay.io. this is probably because the staging deploy is triggered by moving-commits on the master branch, and the jenkins build-job on master hasn't finished pushing the container image to quay by the time the deploy runs.

it seems that the equivalent "production" deploy doesn't fail the same way, since it's triggered differently (via a manual change to the revision ref), so the image is already available by then.
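to make the suspected race concrete, here's a minimal python sketch of a pre-deploy check that waits for the image to show up in the registry before going ahead. it's purely illustrative: the short-sha tag convention, the function names, and the injected `image_exists` lookup are all assumptions, and none of this is taken from app-interface or from our jenkins job.

```python
import time
from typing import Callable


def image_tag_for(commit_sha: str, length: int = 7) -> str:
    """Derive an image tag from a commit sha (assumed convention: short sha)."""
    return commit_sha[:length]


def wait_for_image(
    tag: str,
    image_exists: Callable[[str], bool],  # hypothetical registry lookup, injected by the caller
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the registry reports the tag, or give up after `attempts` checks."""
    for attempt in range(attempts):
        if image_exists(tag):
            return True
        # Don't sleep after the final check; there is nothing left to wait for.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

if the guess is right, a deploy triggered right after a commit lands on master would see `image_exists` return false on its first check, which would match the "fails on the first run, works on re-trigger" behaviour.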

several suspects found in the source:

- the prod-frontends namespace includes operator-lifecycle-manager, but prod-staging doesn't. it might provide the missing trigger-dependency on quay builds.
- the stage-frontends namespace instantiates akamai_cache_bust_url with the value https://console.stage.redhat.com. this might need an override, as our app is hosted at console.dev.redhat.com.

others' attempts / solutions

the deployment "upstream" fix

petra cihalova stated on slack that they're having a similar (potentially identical) issue, and got a fix for it (see this message).

their fix appeared in this MR (but the source project no longer exists), which was merged to app-interface with this revision.

this solution involved adding an upstream field to the offending (staging) deployment definition.
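as a rough mental model of what the upstream field would change, here's a small python sketch of the trigger decision. this is my reading, not confirmed against the reconcile code: the `instance` / `name` field names, the example values, and the `should_trigger` helper are all hypothetical and only illustrate the idea that a target with an upstream job waits for that job's build instead of reacting to a new commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Upstream:
    instance: str  # assumed: which CI instance runs the build job
    name: str      # assumed: name of the build job that pushes the image


@dataclass
class Target:
    namespace: str
    ref: str
    upstream: Optional[Upstream] = None


def should_trigger(target: Target, new_commit: bool, upstream_build_succeeded: bool) -> bool:
    """Decide whether to deploy a target (illustrative model only)."""
    if target.upstream is not None:
        # With an upstream job defined, wait for its successful build,
        # so the image should already be pushed when the deploy starts.
        return upstream_build_succeeded
    # Without one, a new commit on the tracked ref is enough to trigger.
    return new_commit


# Hypothetical values, for illustration only.
stage_today = Target(namespace="stage-frontends", ref="master")
stage_fixed = Target(
    namespace="stage-frontends",
    ref="master",
    upstream=Upstream(instance="example-ci", name="example-build-job"),
)
```

under this model, `stage_today` deploys as soon as a commit lands (possibly before the image exists), while `stage_fixed` holds off until the build job reports success.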

another instance of the problem

looks like justin sherrill also ran into this, and brought it up with adam drew (see this thread), but no solution was offered AFAICT.

this instance surfaced while migrating to containerized builds, so it may be less relevant to our case (or it may turn out to be similar).

conclusions

so far, the deployment "upstream" fix seems the most promising, so we should chase that first.

if that doesn't work, we should probably reach out to other teams for further assistance.

is triggering

OCMUI-3214 staging deployment always fails on first-runs

Closed

Assignee: Lyn M

Reporter: Lyn M

Votes: 0

Watchers: 1

Created: 2025/02/10 11:14 AM

Updated: 2025/03/17 10:13 PM

Resolved: 2025/03/13 7:22 PM
