[OCPBUGS-52848] samples operator failing installs frequently on gcp

Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.19.0
Component/s: Cloud Compute / Cluster API Providers
Labels:
- component-regression
- trt

Severity:
Important
Regression:
Yes
Blocked:
False
Blocked Reason:

None
Target Version:

4.19.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a regression in the following test:

install should succeed: overall

Extreme regression detected.
Fisher's Exact probability of a regression: 100.00%.
Test pass rate dropped from 98.88% to 57.89%.
Overrode base stats using release 4.17

Sample (being evaluated) Release: 4.19
Start Time: 2025-03-03T00:00:00Z
End Time: 2025-03-10T08:00:00Z
Success Rate: 57.89%
Successes: 22
Failures: 16
Flakes: 0

Base (historical) Release: 4.17
Start Time: 2024-09-01T00:00:00Z
End Time: 2024-10-01T00:00:00Z
Success Rate: 98.88%
Successes: 88
Failures: 1
Flakes: 0
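As a rough cross-check of the numbers above, the pass rates and a one-sided Fisher's exact test can be recomputed from the raw success/failure counts. This is only a sketch: how Component Readiness derives its reported probability is not shown in this report, so reading it as one minus a one-sided p-value is an assumption, and the function names below are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// logChoose returns ln(C(n, k)) using the log-gamma function.
func logChoose(n, k int) float64 {
	a, _ := math.Lgamma(float64(n + 1))
	b, _ := math.Lgamma(float64(k + 1))
	c, _ := math.Lgamma(float64(n - k + 1))
	return a - b - c
}

// passRate returns the pass percentage for the given counts.
func passRate(pass, fail int) float64 {
	return 100 * float64(pass) / float64(pass+fail)
}

// fisherOneSided returns the probability, under the null hypothesis that
// sample and base share one failure rate, of seeing at least sampleFail
// failures in the sample (hypergeometric upper tail).
func fisherOneSided(sampleFail, samplePass, baseFail, basePass int) float64 {
	total := sampleFail + samplePass + baseFail + basePass
	fails := sampleFail + baseFail
	n := sampleFail + samplePass
	maxX := fails
	if n < maxX {
		maxX = n
	}
	denom := logChoose(total, n)
	p := 0.0
	for x := sampleFail; x <= maxX; x++ {
		if n-x > total-fails {
			continue // not enough passing runs to fill the rest of the sample
		}
		p += math.Exp(logChoose(fails, x) + logChoose(total-fails, n-x) - denom)
	}
	return p
}

func main() {
	fmt.Printf("sample pass rate: %.2f%%\n", passRate(22, 16))
	fmt.Printf("base pass rate:   %.2f%%\n", passRate(88, 1))
	fmt.Printf("one-sided p-value: %.3g\n", fisherOneSided(16, 22, 1, 88))
}
```

With these counts the p-value comes out well below 0.01%, which is consistent with a regression probability that rounds to 100.00%.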

View the test details report for additional context.

gcp installs seem to be failing frequently with the error:

These cluster operators were not stable: [openshift-samples]

From: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-techpreview/1898814955482779648

The samples operator reports:

status:
  conditions:
    - lastTransitionTime: "2025-03-09T19:56:05Z"
      status: "False"
      type: Degraded
    - lastTransitionTime: "2025-03-09T19:56:17Z"
      message: Samples installation successful at 4.19.0-0.nightly-2025-03-09-190956
      status: "True"
      type: Available
    - lastTransitionTime: "2025-03-09T20:43:02Z"
      message: "Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: java,kube-root-ca.crt,openshift-service-ca.crt,nodejs; last import attempt 2025-03-09 19:57:39 +0000 UTC"
      reason: FailedImageImports
      status: "False"
      type: Progressing

I'm confused about how this is failing the install given Available=True and Degraded=False, although the Progressing message does report a problem. This artifact may have been collected a few minutes after the install failed; could the operator have stabilized (ignoring these errors) in that time? Note that not all installs fail this way, but a good chunk do.

The problem appears limited to 4.19 GCP, though I do see one hit for vSphere.

https://search.dptools.openshift.org/?search=These+cluster+operators+were+not+stable%3A.*openshift-samples&maxAge=48h&context=1&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

account is impacted by

TRT-2048 GCP techpreview jobs failed install with sample operators

Closed

is duplicated by

OCPBUGS-53166 Component Readiness: [Installer / openshift-installer] [Other] test regressed: 4.19-e2e-gcp-ovn-techpreview tests often got openshift-samples CO unstable

Closed

relates to

OCPBUGS-53267 Samples Operator capable of spamming updates to etcd

Release Pending

links to

openshift/cluster-samples-operator#606: OCPBUGS-52848: Revert "OCPBUGS-52346: bump x/oauth2 to version 0.27.0"

openshift/cluster-samples-operator#607: OCPBUGS-52848: Revert "OCPBUGS-45049: Adding mutex to func createSamples on handler.go"

openshift/cluster-samples-operator#608: OCPBUGS-52848: Unrevert the revert "OCPBUGS-52346: bump x/oauth2 to version 0.27.0"

openshift/cluster-samples-operator#609: OCPBUGS-52848: Unrevert the revert "OCPBUGS-45049: Adding mutex to func createSamples on handler.go"

RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update


Devan Goodwin added a comment - 2025/04/02 1:34 PM

This regression is cleared. Moving verified.


Devan Goodwin added a comment - 2025/03/18 6:03 PM

I have confirmed the sync loop theory. The samples operator, while riddled with bugs, is not what changed here. Those bugs are being exposed because it has an overly broad watch on ANY ClusterOperator update in the entire cluster, not just its own. Any update triggers the samples operator sync loop, which has bugs where it writes to etcd when nothing has actually changed.
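The two behaviours described here (reacting only to the operator's own ClusterOperator, and skipping writes when nothing changed) can be sketched in isolation. This is a minimal illustration, not the samples operator's actual code: the `event` and `status` types and the `shouldSync` and `needsWrite` helpers are hypothetical names, and a real operator would work with client-go informers and the ClusterOperator API types instead.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a watch notification on a ClusterOperator.
type event struct {
	Name string
}

// status is a hypothetical, trimmed-down view of what the operator reports.
type status struct {
	Progressing string
	Reason      string
	Message     string
}

const ownName = "openshift-samples"

// shouldSync narrows the watch: only updates to the operator's own
// ClusterOperator trigger a sync.
func shouldSync(ev event) bool {
	return ev.Name == ownName
}

// needsWrite reports whether the desired status differs from what is already
// stored, so that an unchanged status never results in a write.
func needsWrite(current, desired status) bool {
	return current != desired
}

func main() {
	stored := status{Progressing: "False", Reason: "FailedImageImports", Message: "java,nodejs"}
	for _, ev := range []event{{Name: "cluster-api"}, {Name: "openshift-samples"}} {
		if !shouldSync(ev) {
			fmt.Printf("%s: ignored\n", ev.Name)
			continue
		}
		// The recomputed status is identical to the stored one, so no write is needed.
		desired := stored
		fmt.Printf("%s: sync, write needed: %v\n", ev.Name, needsWrite(stored, desired))
	}
}
```

Either guard on its own would have reduced the churn; together they keep unrelated ClusterOperator updates from reaching etcd at all.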

Our suspected payload contains a change to how the cluster-api ClusterOperator is updated, which would only take effect on techpreview clusters.

I checked the resource watch tarball for a 4.19 job after this change and examined the git repo history for the cluster-api operator and found over 500 updates. A 4.18 comparison has 8.

The capi operator appears to be doing lots of unnecessary updates:

diff --git a/cluster-scoped-resources/config.openshift.io/clusteroperators/cluster-api.yaml b/cluster-scoped-resources/config.openshift.io/clusteroperators/cluster-api.yaml
index e1e94c335..49e42d807 100644
--- a/cluster-scoped-resources/config.openshift.io/clusteroperators/cluster-api.yaml
+++ b/cluster-scoped-resources/config.openshift.io/clusteroperators/cluster-api.yaml
@@ -103,14 +103,14 @@ metadata:
     fieldsV1:
       f:status:
         f:conditions:
-          k:{"type":"CapiInstallerControllerAvailable"}:
+          k:{"type":"InfraClusterControllerAvailable"}:
             .: {}
             f:lastTransitionTime: {}
             f:message: {}
             f:reason: {}
             f:status: {}
             f:type: {}
-          k:{"type":"CapiInstallerControllerDegraded"}:
+          k:{"type":"InfraClusterControllerDegraded"}:
             .: {}
             f:lastTransitionTime: {}
             f:message: {}
@@ -119,7 +119,7 @@ metadata:
             f:type: {}
         f:relatedObjects: {}
         f:versions: {}
-    manager: CapiInstallerController
+    manager: InfraClusterController
     operation: Apply
     subresource: status
     time: "2025-03-09T20:05:07Z"
@@ -128,14 +128,14 @@ metadata:
     fieldsV1:
       f:status:
         f:conditions:
-          k:{"type":"InfraClusterControllerAvailable"}:
+          k:{"type":"CapiInstallerControllerAvailable"}:
             .: {}
             f:lastTransitionTime: {}
             f:message: {}
             f:reason: {}
             f:status: {}
             f:type: {}
-          k:{"type":"InfraClusterControllerDegraded"}:
+          k:{"type":"CapiInstallerControllerDegraded"}:
             .: {}
             f:lastTransitionTime: {}
             f:message: {}
@@ -144,10 +144,10 @@ metadata:
             f:type: {}
         f:relatedObjects: {}
         f:versions: {}
-    manager: InfraClusterController
+    manager: CapiInstallerController
     operation: Apply
     subresource: status
-    time: "2025-03-09T20:05:07Z"
+    time: "2025-03-09T20:05:09Z"
   - apiVersion: config.openshift.io/v1
     fieldsType: FieldsV1
     fieldsV1:
@@ -182,7 +182,7 @@ metadata:
     kind: ClusterVersion
     name: version
     uid: 32b7e068-2029-46ba-8ef3-6582b2b9069c
-  resourceVersion: "34767"
+  resourceVersion: "34830"
   uid: e1b62a8b-0b94-4382-9a00-5f8128365571
 spec: {}
 status:
@@ -217,12 +217,12 @@ status:
     reason: AsExpected
     status: "False"
     type: SecretSyncControllerDegraded
-  - lastTransitionTime: "2025-03-09T20:05:07Z"
+  - lastTransitionTime: "2025-03-09T20:05:09Z"
     message: CAPI Installer Controller works as expected
     reason: AsExpected
     status: "True"
     type: CapiInstallerControllerAvailable
-  - lastTransitionTime: "2025-03-09T20:05:07Z"
+  - lastTransitionTime: "2025-03-09T20:05:09Z"
     message: CAPI Installer Controller works as expected
     reason: AsExpected
     status: "False"

Likely as a result of this PR in the precise payload we targeted as the start point originally: https://github.com/openshift/cluster-capi-operator/pull/273

The samples operator undoubtedly has multiple long-standing issues, some of which should be fixed regardless of its maintenance deprecated status because they’re hitting the control plane etcd unnecessarily hard, and the fixes would likely be fairly simple. I will file a new bug for these. (See linked issues.)

But this bug goes to capi, and a revert is in progress. The fix looks simple and was identified by Joel in the Slack thread.


Devan Goodwin added a comment - 2025/03/18 2:50 PM - edited

I had hoped to provide a patch, but the fix looks more complicated than I expected, so I will have to let the team handle it.

Firstly, your operator should not be updating LastTransitionTime on the Progressing condition unless the condition actually transitions its state from false to true or vice versa. I suspect this originates here: https://github.com/openshift/cluster-samples-operator/blob/9ae232f60d73cfa81cfe3155fed3f5bbdff3bfcf/pkg/util/util.go#L96
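The rule stated above (only move LastTransitionTime when the status itself flips) can be sketched as follows. This is an illustration under assumptions, not the code at the linked location: the `condition` type is a simplified stand-in for the real ClusterOperator condition type, and `setCondition` and `timestampMoved` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a simplified stand-in for a ClusterOperator status condition.
type condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition returns the condition to store. Reason and Message are always
// taken from desired, but LastTransitionTime only moves when Status flips.
func setCondition(old, desired condition, now time.Time) condition {
	if old.Status == desired.Status {
		desired.LastTransitionTime = old.LastTransitionTime
	} else {
		desired.LastTransitionTime = now
	}
	return desired
}

// timestampMoved applies an update one minute after the old transition and
// reports whether LastTransitionTime changed as a result.
func timestampMoved(oldStatus, newStatus, newMessage string) bool {
	t0 := time.Date(2025, 3, 9, 20, 5, 9, 0, time.UTC)
	old := condition{Type: "Progressing", Status: oldStatus, Message: "java,nodejs", LastTransitionTime: t0}
	desired := condition{Type: "Progressing", Status: newStatus, Message: newMessage}
	got := setCondition(old, desired, t0.Add(time.Minute))
	return !got.LastTransitionTime.Equal(t0)
}

func main() {
	// Same status, reordered message: the timestamp must stay put.
	fmt.Println("message-only change moved timestamp:", timestampMoved("False", "False", "nodejs,java"))
	// Status flip: the timestamp is expected to move.
	fmt.Println("status flip moved timestamp:", timestampMoved("False", "True", "java,nodejs"))
}
```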

Then looking at the update code: https://github.com/openshift/cluster-samples-operator/blob/be1110623ee84a371f739bb91dea87dfb853aa07/pkg/operatorstatus/operatorstatus.go#L239 This looks like it's intended to preserve the old condition unless status/message/reason has changed, but somehow that does not seem to be working.

The overall Progressing condition is set here: https://github.com/openshift/cluster-samples-operator/blob/9ae232f60d73cfa81cfe3155fed3f5bbdff3bfcf/pkg/util/util.go#L244 At the very least, you should sort the activeStreams there so you get a predictable message.
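A minimal sketch of the sorting suggestion, assuming the failing imagestream names are collected from a Go map (whose iteration order is deliberately randomized); whether activeStreams is actually a map is an assumption here. The `failedImportsMessage` helper and its message prefix are hypothetical and only loosely modelled on the message seen in this bug.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// failedImportsMessage builds a deterministic message from a set of
// imagestream names by sorting the names before joining them.
func failedImportsMessage(streams map[string]bool) string {
	names := make([]string, 0, len(streams))
	for name := range streams {
		names = append(names, name)
	}
	sort.Strings(names) // without this, the order can differ on every call
	return "image import failures for these imagestreams: " + strings.Join(names, ",")
}

func main() {
	streams := map[string]bool{"php": true, "java": true, "nodejs": true}
	// Repeated calls yield the same string, so the message no longer changes
	// between syncs unless the set of failing imagestreams changes.
	fmt.Println(failedImportsMessage(streams))
	fmt.Println(failedImportsMessage(streams))
}
```

A stable message also makes a "has anything changed?" comparison meaningful, since reordering alone would otherwise look like a new status.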

I can't tell exactly what's going on, and I especially cannot tell why it's only techpreview jobs, but the fact remains that if you fix this operator behaviour, and Progressing LastTransitionTime stops getting updated for no reason, installs will likely succeed. It feels to me like your update loop was always bugged, but for some reason was not constantly getting called until now, and I can't tell why.

You may find answers in the question of why 4.19 constantly logs:

time="2025-03-09T20:45:50Z" level=info msg="no global imagestream configuration will block imagestream creation using "
time="2025-03-09T20:45:50Z" level=info msg="At steady state: config the same and exists is true, in progress false, and version correct"
time="2025-03-09T20:45:50Z" level=info msg="no global imagestream configuration will block imagestream creation using "
time="2025-03-09T20:45:50Z" level=info msg="At steady state: config the same and exists is true, in progress false, and version correct"
time="2025-03-09T20:45:50Z" level=info msg="no global imagestream configuration will block imagestream creation using "
time="2025-03-09T20:45:50Z" level=info msg="At steady state: config the same and exists is true, in progress false, and version correct"
time="2025-03-09T20:45:50Z" level=info msg="no global imagestream configuration will block imagestream creation using "
time="2025-03-09T20:45:50Z" level=info msg="At steady state: config the same and exists is true, in progress false, and version correct"

The 4.18 log, by contrast, looks very different. It feels like 4.19 is in some kind of sync loop.

Checking the resourcewatch observer artifacts, which include a git repo containing a commit for each update to the ClusterOperator YAML, we see over 2k updates in a 4.19 job and only about 500 in a 4.18 job.

In the resourcewatch tar we also see that among the many updates to the ClusterOperator there are two patterns, one where just the ordering of the imagestreams in the message changes:

diff --git a/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml b/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
index b31a6b908..8f52667a8 100644
--- a/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
+++ b/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
@@ -71,7 +71,7 @@ metadata:
     kind: ClusterVersion
     name: version
     uid: 32b7e068-2029-46ba-8ef3-6582b2b9069c
-  resourceVersion: "34832"
+  resourceVersion: "34834"
   uid: c31b35bb-b007-4466-b23d-4f8486075c05
 spec: {}
 status:
@@ -85,7 +85,7 @@ status:
     type: Available
   - lastTransitionTime: "2025-03-09T20:05:09Z"
     message: 'Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image
-      import failures for these imagestreams: openshift-service-ca.crt,nodejs,httpd,java,kube-root-ca.crt,mariadb,php,python;
+      import failures for these imagestreams: python,openshift-service-ca.crt,nodejs,httpd,java,kube-root-ca.crt,mariadb,php;
       last import attempt 2025-03-09 19:57:39 +0000 UTC'
     reason: FailedImageImports
     status: "False"

But these updates do not bump the lastTransitionTime. That always happens in conjunction with an update to the managed fields:

diff --git a/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml b/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
index d919b50a4..9bca6ee6a 100644
--- a/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
+++ b/cluster-scoped-resources/config.openshift.io/clusteroperators/openshift-samples.yaml
@@ -63,7 +63,7 @@ metadata:
     manager: cluster-samples-operator
     operation: Update
     subresource: status
-    time: "2025-03-09T20:05:09Z"
+    time: "2025-03-09T20:05:10Z"
   name: openshift-samples
   ownerReferences:
   - apiVersion: config.openshift.io/v1
@@ -71,7 +71,7 @@ metadata:
     kind: ClusterVersion
     name: version
     uid: 32b7e068-2029-46ba-8ef3-6582b2b9069c
-  resourceVersion: "34848"
+  resourceVersion: "34849"
   uid: c31b35bb-b007-4466-b23d-4f8486075c05
 spec: {}
 status:
@@ -83,9 +83,9 @@ status:
     message: Samples installation successful at 4.19.0-0.nightly-2025-03-09-190956
     status: "True"
     type: Available
-  - lastTransitionTime: "2025-03-09T20:05:09Z"
+  - lastTransitionTime: "2025-03-09T20:05:10Z"
     message: 'Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image
-      import failures for these imagestreams: php,python,nodejs,openshift-service-ca.crt,httpd,java,kube-root-ca.crt,mariadb;
+      import failures for these imagestreams: openshift-service-ca.crt,nodejs,httpd,java,kube-root-ca.crt,mariadb,php,python;
       last import attempt 2025-03-09 19:57:39 +0000 UTC'
     reason: FailedImageImports
     status: "False"

We are unclear on what is triggering this.

If you are stuck, I would suggest adding debug logging to get a feel for how your operator's sync loop is getting called and flowing through the methods linked above. Then we can dig into more runs.


Devan Goodwin added a comment - 2025/03/17 5:18 PM

I may have found something important.

It didn't make sense that we're failing an install for an operator that is progressing=false available=true degraded=false, so I went looking for the code that fails this install with the message "level=error msg=These cluster operators were not stable: [openshift-samples]" and found this module in the installer. The link points directly to the line of code that returns ok or not: https://github.com/openshift/installer/blob/74c0c30aa1e7926302b07564480a8eb2f605be20/cmd/openshift-install/create.go#L868

func meetsStabilityThreshold(progressing *configv1.ClusterOperatorStatusCondition) bool {
	return progressing.Status == configv1.ConditionFalse && time.Since(progressing.LastTransitionTime.Time).Seconds() > coStabilityThreshold
}
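To see why this check can never pass when LastTransitionTime keeps being rewritten, here is a small simulation. It is a sketch under assumptions: the value of coStabilityThreshold is not given in this bug, so 30 seconds is assumed, and `progressingCondition`, `meetsStabilityThresholdAt`, and `everStable` are hypothetical stand-ins that take an explicit clock instead of calling time.Since.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed value; the threshold used by the installer is not stated in this bug.
const coStabilityThreshold float64 = 30

// progressingCondition is a simplified stand-in for the Progressing condition.
type progressingCondition struct {
	Status             string
	LastTransitionTime time.Time
}

// meetsStabilityThresholdAt mirrors the installer check above, with "now"
// passed in so the behaviour can be simulated.
func meetsStabilityThresholdAt(now time.Time, c progressingCondition) bool {
	return c.Status == "False" && now.Sub(c.LastTransitionTime).Seconds() > coStabilityThreshold
}

// everStable polls once per second for windowSec seconds and reports whether
// the condition is ever considered stable when LastTransitionTime is rewritten
// every resetEverySec seconds (0 means it is never rewritten).
func everStable(resetEverySec, windowSec int) bool {
	start := time.Date(2025, 3, 9, 20, 39, 18, 0, time.UTC)
	c := progressingCondition{Status: "False", LastTransitionTime: start}
	for s := 0; s <= windowSec; s++ {
		now := start.Add(time.Duration(s) * time.Second)
		if resetEverySec > 0 && s > 0 && s%resetEverySec == 0 {
			c.LastTransitionTime = now // the spurious bump
		}
		if meetsStabilityThresholdAt(now, c) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("bumped every 5s, stable within 10m:", everStable(5, 600))
	fmt.Println("never bumped, stable within 10m:   ", everStable(0, 600))
}
```

Under these assumptions, any rewrite interval shorter than the threshold keeps the operator permanently "not stable", even though Progressing stays False the whole time.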

Looking at the installer debug logging for the 4.19 run in the original description of this bug we see this:

time="2025-03-09T20:39:20Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:18 +0000 UTC DurationSinceTransition=3s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:21Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:18 +0000 UTC DurationSinceTransition=4s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:22Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:21 +0000 UTC DurationSinceTransition=2s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:23Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:21 +0000 UTC DurationSinceTransition=3s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:24Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:23 +0000 UTC DurationSinceTransition=2s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:25Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:25 +0000 UTC DurationSinceTransition=1s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:26Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:25 +0000 UTC DurationSinceTransition=2s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:27Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:25 +0000 UTC DurationSinceTransition=3s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:28Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:25 +0000 UTC DurationSinceTransition=4s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:29Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:25 +0000 UTC DurationSinceTransition=5s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: openshift-service-ca.crt,nodejs,java,kube-root-ca.crt; last import attempt 2025-03-09 19:57:39 +0000 UTC"
time="2025-03-09T20:39:30Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:30 +0000 UTC DurationSinceTransition=1s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: java,kube-root-ca.crt,openshift-service-ca.crt,nodejs; last import attempt 2025-03-09 19:57:39 +0000 UTC"

Note that the last transition time is changing frequently despite the operator not changing state. I believe lastTransitionTime should only change if the actual Progressing status changes, and it definitely shouldn't change when only the message has changed. If you look closer, you'll also notice the message is changing slightly: the list of imagestreams jumps around in ordering, so it is seemingly not sorted.

Now I'm wondering what it's like in 4.18, where our problem is not happening. The first run I opened has the following in its installer log:

time="2025-03-14T20:47:34Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=15s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"
time="2025-03-14T20:47:35Z" level=debug msg="Cluster Operator kube-apiserver is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=16s Reason=AsExpected Message=NodeInstallerProgressing: 3 nodes are at revision 8"
time="2025-03-14T20:47:35Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=16s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"
time="2025-03-14T20:47:36Z" level=debug msg="Cluster Operator kube-apiserver is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=17s Reason=AsExpected Message=NodeInstallerProgressing: 3 nodes are at revision 8"
time="2025-03-14T20:47:36Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=17s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"
time="2025-03-14T20:47:37Z" level=debug msg="Cluster Operator kube-apiserver is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=18s Reason=AsExpected Message=NodeInstallerProgressing: 3 nodes are at revision 8"
time="2025-03-14T20:47:37Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=18s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"
time="2025-03-14T20:47:38Z" level=debug msg="Cluster Operator kube-apiserver is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=19s Reason=AsExpected Message=NodeInstallerProgressing: 3 nodes are at revision 8"
time="2025-03-14T20:47:38Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=19s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"
time="2025-03-14T20:47:39Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=20s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"
time="2025-03-14T20:47:39Z" level=debug msg="Cluster Operator kube-apiserver is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=20s Reason=AsExpected Message=NodeInstallerProgressing: 3 nodes are at revision 8"
time="2025-03-14T20:47:40Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 20:47:19 +0000 UTC DurationSinceTransition=21s Reason=FailedImageImports Message=Samples installed at 4.18.0-0.nightly-2025-03-14-195326, with image import failures for these imagestreams: mysql,jenkins-agent-base,openshift-service-ca.crt,jenkins,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,golang,redis,nodejs,jboss-eap-xp4-openjdk11-openshift,nginx,kube-root-ca.crt,perl; last import attempt 2025-03-14 20:34:56 +0000 UTC"

Here the lastTransitionTime is NOT changing as much. It does change sometimes, but it looks like that happens when the underlying condition reason changes, whereas in 4.19 the lastTransitionTime seems to change regardless of whether the reason changed.

Note also that our symptom (image import failures) is present here in 4.18, but it is NOT failing the installs because lastTransitionTime is not updating.
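One way to compare runs without eyeballing the logs is to count how often LastTransitionTime changes per operator. The sketch below assumes the installer debug line format shown above; the regular expression and the transitionChanges helper are my own, and the sample lines are 4.19 lines from earlier in this comment, trimmed after the Reason field.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRE extracts the operator name, Progressing status and LastTransitionTime
// from an installer debug line. The pattern is an assumption based on the
// log lines pasted in this bug.
var lineRE = regexp.MustCompile(`Cluster Operator (\S+) is Progressing=(\S+) LastTransitionTime=(.+? UTC) DurationSinceTransition=`)

// transitionChanges counts, per operator, how often LastTransitionTime
// differs from the previous line seen for that operator.
func transitionChanges(lines []string) map[string]int {
	last := map[string]string{}
	changes := map[string]int{}
	for _, l := range lines {
		m := lineRE.FindStringSubmatch(l)
		if m == nil {
			continue // not a cluster operator status line
		}
		op, ts := m[1], m[3]
		if prev, ok := last[op]; ok && prev != ts {
			changes[op]++
		}
		last[op] = ts
	}
	return changes
}

func main() {
	// Trimmed copies of three 4.19 lines (message portion removed).
	lines := []string{
		`time="2025-03-09T20:39:20Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:18 +0000 UTC DurationSinceTransition=3s Reason=FailedImageImports"`,
		`time="2025-03-09T20:39:22Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:21 +0000 UTC DurationSinceTransition=2s Reason=FailedImageImports"`,
		`time="2025-03-09T20:39:24Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-09 20:39:23 +0000 UTC DurationSinceTransition=2s Reason=FailedImageImports"`,
	}
	fmt.Println(transitionChanges(lines))
}
```

Running the same count over a 4.18 log should give a much lower number for openshift-samples if the observation above holds.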

So now what is happening on non-techpreview gcp jobs in 4.19?

time="2025-03-14T23:44:45Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=12s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:46Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=13s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:47Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=14s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:48Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=15s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:49Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=16s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:50Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=17s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:51Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=18s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:52Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=19s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"
time="2025-03-14T23:44:53Z" level=debug msg="Cluster Operator openshift-samples is Progressing=False LastTransitionTime=2025-03-14 23:44:33 +0000 UTC DurationSinceTransition=20s Reason=FailedImageImports Message=Samples installed at 4.19.0-0.nightly-2025-03-14-225018, with image import failures for these imagestreams: openshift-service-ca.crt,jboss-eap-xp3-openjdk11-runtime-openshift,fuse7-karaf-openshift,postgresql,redis,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,jenkins-agent-base,java,nodejs,php,kube-root-ca.crt,nginx,fuse7-java-openshift,mysql,mariadb,python,golang; last import attempt 2025-03-14 23:34:47 +0000 UTC"

Conclusion: we're not chasing an image pull problem; the image pull problem is everywhere and likely very common. The actual difference in behaviour seems to be that the operator is now updating its lastTransitionTime every sync loop, perhaps because the ordering of the image streams is changing. Because of this, when the registry pulls are having problems, the installer thinks the Progressing condition is not stable. Somehow, this is happening in 4.19 techpreview, and seemingly not in regular non-techpreview gcp jobs.


Jamo Luhrsen added a comment - 2025/03/14 3:10 PM

rhn-engineering-dgoodwin, I think this is still happening, but I noticed this got moved to ON_QA. Should we re-open it?

OpenShift Jira Bot added a comment - 2025/03/13 1:17 AM

Hi aroyo@redhat.com,

Bugs should not be moved to Verified without first providing a Release Note Type("Bug Fix" or "No Doc Update") and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

Devan Goodwin added a comment - 2025/03/11 5:53 PM

After Slack discussion we're quite stumped. This seems to affect only gcp techpreview, with virtually no hits on regular gcp jobs or any other cloud. Neither the installer team nor the cloud team knows of anything that went in around March 5th to cause this.

Still need to rule out RHCOS (a stretch given techpreview) and maybe networking?

Antonio Carlos Royo added a comment - 2025/03/11 3:54 PM

Two PRs were created to revert last week's code changes:
https://github.com/openshift/cluster-samples-operator/pull/604

https://github.com/openshift/cluster-samples-operator/pull/605

However, even with the changes reverted, the test periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-techpreview-serial is still failing.

This looks to me like an infra issue, so here is the analysis so far:

Checking the build logs, I see the tests are failing with:

level=error msg=Error checking cluster operator Progressing status: "context deadline exceeded" 273
level=error msg=These cluster operators were not stable: [openshift-samples]

For https://pr-payload-tests.ci.openshift.org/runs/ci/bc350a00-fde7-11ef-9d12-f04faa71fca2-0 I see that samples operator reported:

- lastTransitionTime: "2025-03-10T21:08:06Z"
  message: 'Samples installed at 4.19.0-0.ci.test-2025-03-10-194949-ci-op-nk70lkjv-latest,
    with image import failures for these imagestreams: fuse7-karaf-openshift,jboss-datagrid73-openshift,jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,kube-root-ca.crt,fuse7-java-openshift,java,openshift-service-ca.crt;
    last import attempt 2025-03-10 20:23:42 +0000 UTC'
  reason: FailedImageImports
  status: "False"
  type: Progressing
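
To compare failed imagestreams across runs, the list can be pulled out of the condition message. This is a minimal sketch, assuming the message keeps the "for these imagestreams: a,b,c;" shape seen above. The function name is hypothetical and the pattern is inferred from this one message, not from the operator's source.

```python
import re

# Message text from the Progressing condition above, with the YAML line
# folding collapsed to single spaces.
MESSAGE = (
    "Samples installed at 4.19.0-0.ci.test-2025-03-10-194949-ci-op-nk70lkjv-latest, "
    "with image import failures for these imagestreams: "
    "fuse7-karaf-openshift,jboss-datagrid73-openshift,"
    "jboss-webserver57-openjdk11-tomcat9-openshift-ubi8,"
    "jboss-webserver57-openjdk8-tomcat9-openshift-ubi8,"
    "kube-root-ca.crt,fuse7-java-openshift,java,openshift-service-ca.crt; "
    "last import attempt 2025-03-10 20:23:42 +0000 UTC"
)


def failed_imagestreams(message):
    """Return the comma-separated names listed after 'for these imagestreams:'."""
    match = re.search(r"for these imagestreams:\s*([^;]+);", message)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",")]


if __name__ == "__main__":
    for name in failed_imagestreams(MESSAGE):
        print(name)
```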

The pod's own logs show it timing out when trying to pull images from registry.redhat.io:

fuse7-karaf-openshift

2025-03-10T20:24:09.798770348Z time="2025-03-10T20:24:09Z" level=warning msg="Image import for imagestream fuse7-karaf-openshift tag 1.0 generation 2 failed with detailed message Internal error occurred: registry.redhat.io/fuse7/fuse-karaf-openshift:1.0: Get \"https://registry.redhat.io/v2/\": dial tcp 23.210.147.13:443: i/o timeout"

2025-03-10T20:41:11.304600235Z time="2025-03-10T20:41:11Z" level=warning msg="Image import for imagestream fuse7-karaf-openshift tag 1.8 generation 2 failed with detailed message Internal error occurred: registry.redhat.io/fuse7/fuse-karaf-openshift:1.8: Get \"https://registry.redhat.io/v2/\": dial tcp 23.210.147.13:443: i/o timeout"

jboss-datagrid73-openshift

2025-03-10T20:23:43.048559896Z time="2025-03-10T20:23:43Z" level=warning msg="Image import for imagestream jboss-datagrid73-openshift tag 1.0 generation 2 failed with detailed message Internal error occurred: registry.redhat.io/jboss-datagrid-7/datagrid73-openshift:1.0: Get \"https://registry.redhat.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
  
2025-03-10T20:23:44.223658699Z time="2025-03-10T20:23:44Z" level=warning msg="Image import for imagestream jboss-datagrid73-openshift tag 1.5 generation 2 failed with detailed message Internal error occurred: registry.redhat.io/jboss-datagrid-7/datagrid73-openshift:1.5: Get \"https://registry.redhat.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

The same thing happened with https://pr-payload-tests.ci.openshift.org/runs/ci/70ff11f0-fde4-11ef-83ce-fa21c44c4bdc-0:

2025-03-10T20:01:13.427712574Z time="2025-03-10T20:01:13Z" level=warning msg="Image import for imagestream postgresql tag 10 generation 2 failed with detailed message Internal error occurred: registry.redhat.io/rhscl/postgresql-10-rhel7:latest: Get \"https://registry.redhat.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
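
To see whether the failures cluster on one error type, the warnings above can be tallied per imagestream and error kind. This is a rough sketch: the regular expression is inferred only from the log lines quoted in this comment, and the names `classify`, `summarize` and the bucket labels are hypothetical.

```python
import re
from collections import Counter

# Three of the warning lines quoted above, split across string pieces for width.
SAMPLE_LINES = [
    r'2025-03-10T20:24:09.798770348Z time="2025-03-10T20:24:09Z" level=warning '
    r'msg="Image import for imagestream fuse7-karaf-openshift tag 1.0 generation 2 '
    r'failed with detailed message Internal error occurred: '
    r'registry.redhat.io/fuse7/fuse-karaf-openshift:1.0: '
    r'Get \"https://registry.redhat.io/v2/\": dial tcp 23.210.147.13:443: i/o timeout"',
    r'2025-03-10T20:23:43.048559896Z time="2025-03-10T20:23:43Z" level=warning '
    r'msg="Image import for imagestream jboss-datagrid73-openshift tag 1.0 generation 2 '
    r'failed with detailed message Internal error occurred: '
    r'registry.redhat.io/jboss-datagrid-7/datagrid73-openshift:1.0: '
    r'Get \"https://registry.redhat.io/v2/\": net/http: request canceled while '
    r'waiting for connection (Client.Timeout exceeded while awaiting headers)"',
    r'2025-03-10T20:01:13.427712574Z time="2025-03-10T20:01:13Z" level=warning '
    r'msg="Image import for imagestream postgresql tag 10 generation 2 '
    r'failed with detailed message Internal error occurred: '
    r'registry.redhat.io/rhscl/postgresql-10-rhel7:latest: '
    r'Get \"https://registry.redhat.io/v2/\": net/http: request canceled while '
    r'waiting for connection (Client.Timeout exceeded while awaiting headers)"',
]

# Field layout assumed from the quoted lines only.
IMPORT_FAILURE = re.compile(
    r"Image import for imagestream (?P<stream>\S+) tag (?P<tag>\S+) "
    r"generation \d+ failed with detailed message (?P<detail>.*)"
)


def classify(detail):
    """Bucket the failure detail into a coarse, hypothetical error kind."""
    if "i/o timeout" in detail:
        return "dial-timeout"
    if "Client.Timeout exceeded" in detail:
        return "client-timeout"
    return "other"


def summarize(lines):
    """Count import failures per (imagestream, error kind)."""
    counts = Counter()
    for line in lines:
        match = IMPORT_FAILURE.search(line)
        if match:
            counts[(match.group("stream"), classify(match.group("detail")))] += 1
    return counts


if __name__ == "__main__":
    for (stream, kind), n in sorted(summarize(SAMPLE_LINES).items()):
        print(f"{stream}\t{kind}\t{n}")
```

Running the same tally over a full pod log from a passing and a failing run would show whether the timeouts are limited to registry.redhat.io connections or spread more widely.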

It seems to me that there are infra issues with GCP: pulling the samples operator images takes too long because connections to registry.redhat.io time out. No additional changes have been made to the samples operator that could slow this process down.


Assignee: Nolan Brubaker

Reporter: Devan Goodwin

QA Contact: Jitendar Singh

Votes: 0

Watchers: 11

Created: 2025/03/10 12:13 PM

Updated: 2025/04/22 7:05 PM
