-
Story
-
Resolution: Unresolved
-
Normal
-
Future Sustainability
For TRT-1576, we gave the image-registry operator an exception because the test that checks whether that operator goes Available=False outside of an upgrade fails in serial prow jobs. Other tests in those jobs do destructive things like tainting 2 nodes, and when there are only 2 image-registry replicas that can leave no replica available. The plan was to merge https://github.com/openshift/origin/pull/28851 with the exception and address the exception as part of this Jira.
This Jira is about coming up with a solution for being able to run the test reliably. A few solutions were mentioned in our last discussion in a team sync on June 12, 2024.
1) Tweak the image-registry-operator to run 3 replicas instead of 2. The idea was that if 2 nodes were tainted, there would still be one replica available. We can't do this for all jobs as it causes issues with vsphere and upgrades. We want to see if we can update the manifest to specify 3 replicas and anti-affinity rules, but do so selectively based on the presence of an environment variable, so that we target just the serial jobs that are having issues.
2) Set the image-registry deployment to 3 replicas directly. This requires a change at the image-registry operator first (i.e., set it to Unmanaged), because otherwise the operator will change the replicas back to the original setting.
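To make option 1 concrete, here is a minimal sketch of gating the replica count on an environment variable. The variable name `IMAGE_REGISTRY_SERIAL_REPLICAS` and both helper functions are hypothetical and do not exist in the operator today. `survivingReplicas` only models the worst case, assuming anti-affinity places one replica per node and every tainted node hosted a replica.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// serialReplicasEnv is a hypothetical variable name used for illustration;
// it is not an existing image-registry operator setting.
const serialReplicasEnv = "IMAGE_REGISTRY_SERIAL_REPLICAS"

// desiredReplicas returns the override from the environment when it is set to
// a valid positive integer, and the default otherwise.
func desiredReplicas(defaultReplicas int32, lookup func(string) (string, bool)) int32 {
	raw, ok := lookup(serialReplicasEnv)
	if !ok {
		return defaultReplicas
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil || n < 1 {
		return defaultReplicas
	}
	return int32(n)
}

// survivingReplicas models the worst case: one replica per node (anti-affinity)
// and every tainted node was hosting a replica.
func survivingReplicas(replicas, taintedNodes int32) int32 {
	if taintedNodes >= replicas {
		return 0
	}
	return replicas - taintedNodes
}

func main() {
	replicas := desiredReplicas(2, os.LookupEnv)
	fmt.Printf("replicas=%d, surviving after 2 tainted nodes=%d\n",
		replicas, survivingReplicas(replicas, 2))
}
```

With the default of 2 replicas the worst case leaves 0 replicas after two nodes are tainted. With the override set to 3, one replica survives, which is the behavior option 1 is after.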
These PRs were relevant to the investigation:
https://github.com/openshift/cluster-image-registry-operator/pull/1055 tried to set 3 replicas in the operator, but Flavian mentions that image-registry replicas had scheduling issues during upgrades.
https://github.com/openshift/release/pull/52731 was used to try to set the image-registry replicas to 3, add the anti-affinity rules, and run jobs against that scenario.
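As a rough illustration of what the two options change, the sketch below builds JSON merge patches for the image registry config resource (`configs.imageregistry.operator.openshift.io/cluster`). It assumes the `spec.replicas` and `spec.managementState` fields of that resource. The Go types and function names are hypothetical, this is not the content of either PR, and the anti-affinity rules are not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registrySpecPatch holds only the fields this sketch touches. The type is
// illustrative and is not taken from the operator's API package.
type registrySpecPatch struct {
	ManagementState string `json:"managementState,omitempty"`
	Replicas        *int32 `json:"replicas,omitempty"`
}

type registryConfigPatch struct {
	Spec registrySpecPatch `json:"spec"`
}

// replicasPatch builds a merge patch that asks the operator for n replicas (option 1).
func replicasPatch(n int32) ([]byte, error) {
	return json.Marshal(registryConfigPatch{Spec: registrySpecPatch{Replicas: &n}})
}

// unmanagedPatch builds a merge patch that stops the operator from reconciling
// the deployment, so a manual replica change is not reverted (option 2).
func unmanagedPatch() ([]byte, error) {
	return json.Marshal(registryConfigPatch{Spec: registrySpecPatch{ManagementState: "Unmanaged"}})
}

func main() {
	p1, err := replicasPatch(3)
	if err != nil {
		panic(err)
	}
	p2, err := unmanagedPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p1)) // {"spec":{"replicas":3}}
	fmt.Println(string(p2)) // {"spec":{"managementState":"Unmanaged"}}
}
```

Either patch body could be applied with a merge-type patch against the config resource. With the second one, the deployment itself would still need to be scaled separately.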
- relates to
-
TRT-2455 Filter image-registry disruption during NoExecuteTaintManager test
-
- New
-