- Bug
- Resolution: Unresolved
- Undefined
- 4.19, 4.20
- Quality / Stability / Reliability
- False
- Moderate
Description of problem
The ImageStream controller lacks alerting for the ImportSuccess=False condition. The absence of this alerting could leave cluster admins and workload admins blind to the fact that Pods running from ImageStreamTags might be using older releases, even long after the ImageStream had been asked to import a newer pullspec. For example, after a ClusterVersion update from vA to vB, a Pod launched from the openshift/tools:latest ImageStreamTag could be running the vA tools image, even long after ClusterVersion claimed the update to vB had completed.
Version-Release number of selected component
Seen live in 4.19.0-rc.2. Reproduced in 4.20.0-0.nightly-2025-05-27-133818. Likely predates those releases, but I haven't checked.
How reproducible
Every time.
Steps to Reproduce
Install a cluster, e.g. with Cluster Bot launch 4.20 aws
Create a working ImageStream, e.g. using the tools pullspec from the release image that the cluster is running:
$ oc new-project test-image-stream
$ oc import-image test --confirm --from "$(oc adm release info --image-for tools)"
Confirm the successful import:
$ oc get -o wide imagestreamtags test:latest
NAME          IMAGE REFERENCE                                                                                                          UPDATED              IMAGE NAME
test:latest   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3902300abe2dcf3f6babfd819f78e9261dcbbbd6ac56d7172eceac493fd457ef   About a minute ago   sha256:3902300abe2dcf3f6babfd819f78e9261dcbbbd6ac56d7172eceac493fd457ef
Patch the ImageStream to request an unpullable image:
$ oc patch imagestream test --type json -p '[{"op": "add", "path": "/spec/tags/0/from/name", "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0000000000000000000000000000000000000000000000000000000000000000"}]'
Confirm that the import is failing:
$ oc get -o json imagestream test | jq .status
{
  "dockerImageRepository": "image-registry.openshift-image-registry.svc:5000/test-image-stream/test",
  "tags": [
    {
      "conditions": [
        {
          "generation": 3,
          "lastTransitionTime": "2025-05-28T17:57:28Z",
          "message": "dockerimage.image.openshift.io \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0000000000000000000000000000000000000000000000000000000000000000\" not found",
          "reason": "NotFound",
          "status": "False",
          "type": "ImportSuccess"
        }
      ],
      "items": [
        {
          "created": "2025-05-28T17:52:06Z",
          "dockerImageReference": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3902300abe2dcf3f6babfd819f78e9261dcbbbd6ac56d7172eceac493fd457ef",
          "generation": 1,
          "image": "sha256:3902300abe2dcf3f6babfd819f78e9261dcbbbd6ac56d7172eceac493fd457ef"
        }
      ],
      "tag": "latest"
    }
  ]
}
Check to see if there are any firing alerts to let the cluster admins know about this failing import:
$ TOKEN="$(oc -n openshift-monitoring create token prometheus-k8s)"  # the Cluster Bot / installer user has a cert-based kubeconfig, but alert retrieval requires a token.
$ oc -n openshift-monitoring get route thanos-querier
NAME             HOST/PORT                                                                            PATH   SERVICES         PORT   TERMINATION          WILDCARD
thanos-querier   thanos-querier-openshift-monitoring.apps.ci-ln-r183v0b-76ef8.aws-2.ci.openshift.org   /api   thanos-querier   web    reencrypt/Redirect   None
$ curl -s -k -H "Authorization: Bearer ${TOKEN}" https://thanos-querier-openshift-monitoring.apps.ci-ln-r183v0b-76ef8.aws-2.ci.openshift.org/api/v1/alerts | jq -r '.data.alerts[] | .state + " " + .labels.alertname' | sort
Actual results
Nothing about the failed import, even as a pending alert:
firing AlertmanagerReceiversNotConfigured
firing TargetDown
firing Watchdog
Expected results
Some kind of alert to draw the cluster admin or workload admin's attention to the failing import, so they could investigate and address the issue.
Additional info
Alternatively, in the absence of an ImageStream-controller-scoped alerting mechanism, the samples operator and/or CVO could grow more careful oversight of the ImportSuccess condition when reconciling these ImageStreams, to at least draw the admin's attention to import issues with those OCP-release-backed ImageStreams.
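The ImportSuccess=False condition shown in the reproduction can be checked for from outside the controller. The following is an added example, not part of the original report or of any OpenShift component: a Python script that reads `oc get imagestreams --all-namespaces -o json` on stdin and lists the tags whose import is failing, along with the older image each tag still resolves to. The function name, the output format and the embedded sample (constructed from the status output in the reproduction steps) are assumptions of this example.

```python
import json
import sys

# Constructed sample: the failing ImageStream from the report, trimmed to the fields used here.
SAMPLE = {
    "kind": "ImageStream",
    "metadata": {"namespace": "test-image-stream", "name": "test"},
    "status": {"tags": [{
        "tag": "latest",
        "conditions": [{"type": "ImportSuccess", "status": "False", "reason": "NotFound",
                        "lastTransitionTime": "2025-05-28T17:57:28Z", "generation": 3}],
        "items": [{"generation": 1, "dockerImageReference":
                   "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:"
                   "3902300abe2dcf3f6babfd819f78e9261dcbbbd6ac56d7172eceac493fd457ef"}],
    }]},
}


def failed_imports(doc):
    """Yield one dict per ImageStream tag whose ImportSuccess condition is False."""
    # Accept a List of ImageStreams (top-level "items") or a single ImageStream.
    streams = doc["items"] if "items" in doc else [doc]
    for stream in streams:
        meta = stream.get("metadata") or {}
        for tag in (stream.get("status") or {}).get("tags") or []:
            items = tag.get("items") or []
            for cond in tag.get("conditions") or []:
                if cond.get("type") == "ImportSuccess" and cond.get("status") == "False":
                    yield {
                        "namespace": meta.get("namespace"),
                        "name": meta.get("name"),
                        "tag": tag.get("tag"),
                        "reason": cond.get("reason"),
                        "failing_since": cond.get("lastTransitionTime"),
                        # Assumption: the first history entry is the image the tag resolves to.
                        "still_serving": items[0].get("dockerImageReference") if items else None,
                    }


def main():
    # With no piped input, fall back to the constructed sample so the script runs offline.
    doc = SAMPLE if sys.stdin.isatty() else json.load(sys.stdin)
    failures = list(failed_imports(doc))
    for f in failures:
        print("{namespace}/{name}:{tag} ImportSuccess=False ({reason}) since {failing_since}; "
              "tag still resolves to {still_serving}".format(**f))
    return 1 if failures else 0  # non-zero exit lets a cron job or CI step flag the failure


if __name__ == "__main__":
    sys.exit(main())
```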