Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16.0
Component/s: Bare Metal Hardware Provisioning
Labels:
- component-regression
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:
4.16.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

If you look at all the failures on this page, you will notice a problem with the "extra worker" required for the virtualmedia tests. One way to see this is to click the camgi link in the prowjob (see the attached screenshot if you aren't familiar with this tool).

When the MCO team investigated, they could tell that the machine had indeed successfully joined the cluster for some period of time. If you click on the extraworker node in camgi, you can see "message: Kubelet stopped posting node status". As best they can tell, this is an infrastructure problem.
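
For anyone who prefers to check this outside camgi, here is a minimal sketch that lists nodes whose Ready condition is not "True". It assumes a node list in the standard Kubernetes NodeList JSON shape (for example, saved from `oc get nodes -o json`). The function name, the node names, and the sample data are hypothetical and are not taken from the failing jobs.

```python
def not_ready_nodes(node_list):
    """Return (name, reason, message) for each node whose Ready condition is not "True"."""
    flagged = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") == "Ready" and cond.get("status") != "True":
                flagged.append((name, cond.get("reason", ""), cond.get("message", "")))
    return flagged


# Hypothetical, hand-written input; real data would come from the job artifacts.
example = {
    "items": [
        {
            "metadata": {"name": "worker-0"},
            "status": {"conditions": [
                {"type": "Ready", "status": "True",
                 "reason": "KubeletReady", "message": "kubelet is posting ready status"},
            ]},
        },
        {
            "metadata": {"name": "extraworker-0"},
            "status": {"conditions": [
                {"type": "Ready", "status": "Unknown",
                 "reason": "NodeStatusUnknown",
                 "message": "Kubelet stopped posting node status."},
            ]},
        },
    ]
}

for name, reason, message in not_ready_nodes(example):
    print(f"{name}: {reason}: {message}")
```

A node that joined and later went silent shows up here with status "Unknown" rather than "False", which matches the "joined for some period of time" observation.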

This problem is causing quite a bit of toil in understanding the CI signal for autoscaling on the metal platform. We need assistance from the metal team to improve this, or help finding other teams to involve in the debugging process.

Everything below this line is the details from Component Readiness:
-----------------------
Component Readiness has found a potential regression in [sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial].

Probability of significant regression: 99.62%

Sample (being evaluated) Release: 4.15
Start Time: 2024-02-22T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 80.00%
Successes: 16
Failures: 4
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 98.35%
Successes: 119
Failures: 2
Flakes: 0
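
As a sanity check on the numbers above, the 99.62% figure is consistent with a one-sided Fisher's exact test on the sample and base pass/fail counts. That Component Readiness computes it this way is an assumption here, checked only against this one data point. The function name below is hypothetical.

```python
from math import comb


def regression_confidence(sample_pass, sample_fail, base_pass, base_fail):
    """Return 1 - p for a one-sided Fisher's exact test.

    p is the hypergeometric probability that at least `sample_fail` of the
    combined failures would land in the sample purely by chance.
    """
    total_fail = sample_fail + base_fail
    total_pass = sample_pass + base_pass
    sample_n = sample_pass + sample_fail
    denom = comb(total_pass + total_fail, sample_n)
    tail = sum(
        comb(total_fail, k) * comb(total_pass, sample_n - k)
        for k in range(sample_fail, min(total_fail, sample_n) + 1)
    )
    return 1.0 - tail / denom


# Counts from the report: sample 16 successes / 4 failures, base 119 / 2.
print(f"sample success rate: {16 / 20:.2%}")
print(f"base success rate:   {119 / 121:.2%}")
print(f"confidence:          {regression_confidence(16, 4, 119, 2):.2%}")
```

With only 20 sample runs, a single additional pass or failure moves this value noticeably, so it is worth re-running the report once more jobs have completed.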

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Machines&component=Cloud%20Compute%20%2F%20Cluster%20Autoscaler&confidence=95&environment=ovn%20no-upgrade%20amd64%20metal-ipi%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-02-28%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-02-22%2000%3A00%3A00&testId=openshift-tests%3A9f3fb60052539c29ab66564689f616ce&testName=%5Bsig-cluster-lifecycle%5D%5BFeature%3AMachines%5D%5BSerial%5D%20Managed%20cluster%20should%20grow%20and%20decrease%20when%20scaling%20different%20machineSets%20simultaneously%20%5BTimeout%3A30m%5D%5Bapigroup%3Amachine.openshift.io%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial