OpenShift Bugs / OCPBUGS-61289

Cluster-version operator should always attempt retrieval soon after an 'upstream' config change

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.20
    • Severity: Low

      Description of problem

      The cluster-version operator can be slow to update its RetrievedUpdates condition. For example, this tech-preview CI run failed on:

      : [Serial][sig-cli] oc adm upgrade recommend When the update service has no recommendations runs successfully [Suite:openshift/conformance/serial]	19s
      {  fail [github.com/openshift/origin/test/extended/cli/adm_upgrade/recommend.go:107]: Unexpected error:
          <*errors.errorString | 0xc007fc9920>: 
          expected:
            warning: Cannot refresh available updates:
              Reason: NoChannel
              Message: The update channel has not been configured.
            
            Upstream update service: http://172.30.47.137:8000/graph
            Channel: test-channel
            No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
          to match regular expression:
      ...
      

      But simultaneously claiming 'Channel: test-channel' and 'The update channel has not been configured' doesn't make sense.
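
      For anyone poking at a live cluster, the contradictory pieces of state can be pulled side by side with a jsonpath query along these lines (just one way to slice the ClusterVersion object):

      $ oc get clusterversion version -o jsonpath='{.spec.channel}{"\n"}{.status.conditions[?(@.type=="RetrievedUpdates")].message}{"\n"}'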

      Version-Release number of selected component

      Seen in 4.20 CI, but the test-case that's turning it up didn't exist in 4.19, so the behavior could be older.

      How reproducible

      Sippy shows that test-case succeeding over 99% of the time, so whatever is going on seems rare.

      Steps to Reproduce

      1. Set up a custom update service (OTA-520), but don't point ClusterVersion upstream at it yet.
      2. Clear the cluster's channel with oc adm upgrade channel
      3. Get an appropriate NoChannel reason in ClusterVersion's RetrievedUpdates condition
      4. Set the cluster's channel again with oc adm upgrade channel $ACTUAL_CHANNEL
      5. Patch upstream to point at the custom update service from (1). This is likely the racy bit, and you'll probably need to land this patch within milliseconds of the channel bump in order to trigger this issue.
      6. Give the cluster at least 16s to form opinions about the new channel
      7. Check ClusterVersion's RetrievedUpdates condition again

      For (3) and (7), you can use:

      $ oc get -o jsonpath='{.status.conditions[?(.type=="RetrievedUpdates")]}{"\n"}' clusterversion version
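
      Stringing the steps together, a rough, untested sketch of the sequence looks like this, with $UPDATE_SERVICE_URI standing in for the custom update service from (1) and $ACTUAL_CHANNEL for the channel (both are placeholders, not values from the report); the back-to-back channel and upstream changes in the middle are the part that seems to matter:

      $ oc adm upgrade channel                    # (2) clear the channel
      $ oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="RetrievedUpdates")]}{"\n"}'  # (3) expect reason NoChannel
      $ oc adm upgrade channel "$ACTUAL_CHANNEL"  # (4) set the channel again
      $ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/upstream", "value": "'"$UPDATE_SERVICE_URI"'"}]'  # (5) immediately point upstream at the custom service
      $ sleep 16                                  # (6) give the CVO time to react
      $ oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="RetrievedUpdates")]}{"\n"}'  # (7) check the condition again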
      
      Actual results
      {"lastTransitionTime":"...","message":"The update channel has not been configured","reason":"NoChannel","status":"False","type":"RetrievedUpdates"}
      
      Expected results
      {"lastTransitionTime":"...","status":"True","type":"RetrievedUpdates"}
      
      Additional info

      From the test-case stdout in the job I opened this bug with:

      I0818 01:20:42.406557 52322 client.go:1022] Running 'oc --namespace=e2e-oc-adm-upgrade-recommend-2867 --kubeconfig=/tmp/kubeconfig-2727234894 adm upgrade channel test-channel'
      warning: No channels known to be compatible with the current version "4.20.0-0.nightly-2025-08-17-232035"; unable to validate "test-channel". Setting the update channel to "test-channel" anyway.
      I0818 01:20:42.536347 52322 client.go:1022] Running 'oc --namespace=e2e-oc-adm-upgrade-recommend-2867 --kubeconfig=/tmp/kubeconfig-2727234894 patch clusterversions.config.openshift.io version --type json -p [{"op": "add", "path": "/spec/upstream", "value": "http://172.30.47.137:8000/graph"}]'
      clusterversion.config.openshift.io/version patched
      I0818 01:20:58.722682 52322 client.go:1022] Running 'oc --namespace=e2e-oc-adm-upgrade-recommend-2867 --kubeconfig=/tmp/kubeconfig-2727234894 adm upgrade recommend'
        [FAILED] in [It] - github.com/openshift/origin/test/extended/cli/adm_upgrade/recommend.go:107 @ 08/18/25 01:20:58.857
      I0818 01:20:58.858116 52322 client.go:1022] Running 'oc --namespace=e2e-oc-adm-upgrade-recommend-2867 --kubeconfig=/tmp/kubeconfig-2727234894 adm upgrade channel '
      warning: Clearing channel "test-channel"; cluster will no longer request available update recommendations.
      

      So on the test-suite side, the timeline is:

      • 1:20:42.406, set channel to test-channel.
      • 1:20:42.536, set upstream to point to a local Pod serving a dummy update service.
      • Waited 16s for the CVO to process those changes.
      • 1:20:58.722, ran recommend and saw ClusterVersion still complaining about NoChannel.

      During that time, the CVO logs (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview-serial/1957221994252472320/artifacts/e2e-vsphere-ovn-techpreview-serial/gather-extra/artifacts/pods/) have:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview-serial/1957221994252472320/artifacts/e2e-vsphere-ovn-techpreview-serial/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-86b5f6885b-mzm6l_cluster-version-operator.log | grep '0818 01:2[01]:.*\(cincinnati\|availableupdates\)'
      I0818 01:20:18.894857       1 availableupdates.go:98] Available updates were recently retrieved, with less than 3m42.992944812s elapsed since 2025-08-18T01:16:36Z, will try later.
      I0818 01:20:42.526149       1 availableupdates.go:77] Retrieving available updates again, because the channel has changed from "" to "test-channel"
      I0818 01:20:42.529936       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=test-channel&id=1b8e4fd0-ab6d-4e19-8393-ec99ea639b0e&version=4.20.0-0.nightly-2025-08-17-232035
      I0818 01:21:12.805171       1 availableupdates.go:398] Update service https://api.openshift.com/api/upgrades_info/v1/graph could not return available updates: VersionNotFound: currently reconciling cluster version 4.20.0-0.nightly-2025-08-17-232035 not found in the "test-channel" channel
      I0818 01:21:12.805240       1 availableupdates.go:77] Retrieving available updates again, because the channel has changed from "test-channel" to ""
      I0818 01:21:12.819094       1 availableupdates.go:98] Available updates were recently retrieved, with less than 3m42.992944812s elapsed since 2025-08-18T01:21:12Z, will try later.
      

      So there's a 1:20:42.529 test-channel retrieval attempt, but it's using the default api.openshift.com upstream, and not our custom local Pod. And there doesn't seem to be a retry when the local Pod's upstream is set.
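
      As a sanity check, grepping the same CVO log for the test's update-service address (172.30.47.137, from the failure output above) shows whether that Pod was queried at all during the 01:20 window:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview-serial/1957221994252472320/artifacts/e2e-vsphere-ovn-techpreview-serial/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-86b5f6885b-mzm6l_cluster-version-operator.log | grep '172.30.47.137'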

      Way out at 01:32, I do see the CVO triggering a new fetch on an upstream change, although it's a different IP address for a different test-case:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-techpreview-serial/1957221994252472320/artifacts/e2e-vsphere-ovn-techpreview-serial/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-86b5f6885b-mzm6l_cluster-version-operator.log | grep upstream
      I0818 01:32:03.778657       1 availableupdates.go:103] Retrieving available updates again, because the update service has changed from "" to "http://172.30.151.226:8000/graph" from ClusterVersion spec.upstream
      

      The bug here is that this test-case run failed to trigger a retrieval after the upstream bump, likely because of some kind of race between upstream-change detection and the channel-bump-induced retrieval attempt.
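
      For anyone reproducing this or verifying a fix, the CVO log line to look for after an upstream change is the one from the 01:32 entry above; something along these lines (the command is illustrative, not from the original report):

      $ oc -n openshift-cluster-version logs deploy/cluster-version-operator | grep 'update service has changed'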

              Assignee: W. Trevor King (trking)
              Reporter: W. Trevor King (trking)
              QA Contact: Jia Liu