Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.16
Component/s: Networking / cluster-network-operator
Labels:
- CNO
- SDN:Backport
- bug
- metric
- update

Severity:
Important
Regression:
None
Sprint:
SDN Sprint 263, SDN Sprint 264, SDN Sprint 265, SDN Sprint 266
sprint_count:
4
Blocked:
False
Blocked Reason:

None
Release Note Type:
Release Note Not Required
Release Note Status:
In Progress
Target Version:

4.17.z
Target Backport Versions:

4.16.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

This is a clone of issue OCPBUGS-42189. The following is the description of the original issue:
—
Description of problem:

Starting with OpenShift Container Platform 4.16, cluster-network-operator can get stuck in CrashLoopBackOff state because of the error below.

2024-09-17T16:32:46.503056041Z I0917 16:32:46.503016       1 controller.go:242] "All workers finished" controller="pod-watcher"
2024-09-17T16:32:46.503056041Z I0917 16:32:46.503045       1 internal.go:526] "Stopping and waiting for caches"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503189       1 internal.go:530] "Stopping and waiting for webhooks"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503206       1 internal.go:533] "Stopping and waiting for HTTP servers"
2024-09-17T16:32:46.503217413Z I0917 16:32:46.503212       1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
2024-09-17T16:32:46.503231142Z F0917 16:32:46.503221       1 operator.go:130] Failed to start controller-runtime manager: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use

The problem seems to be related to the change made in https://github.com/openshift/cluster-network-operator/pull/2274/commits/acd67b432be4ef2efb470710aebba2e3551bc00d#diff-99c0290799daf9abc6240df64063e20bfaf67b371577b67ac7eec6f4725622ff, which no longer sets BindAddress to 0 (see https://github.com/openshift/cluster-network-operator/blob/master/vendor/sigs.k8s.io/controller-runtime/pkg/metrics/server/server.go#L70) and therefore does not preserve the previous behavior.
With the current code, cluster-network-operator exposes a metrics server on port 8080, which was not previously the case and can conflict with custom applications.

This is especially likely in environments running compact OpenShift Container Platform 4 clusters (three-node clusters).

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.16

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4.15 (three-node cluster) and create a service that listens on port 8080 using HostNetwork
2. Update to OpenShift Container Platform 4.16
3. Observe cluster-network-operator stuck in CrashLoopBackOff state because port 8080 is already bound

Actual results:

2024-09-17T16:32:46.503056041Z I0917 16:32:46.503016       1 controller.go:242] "All workers finished" controller="pod-watcher"
2024-09-17T16:32:46.503056041Z I0917 16:32:46.503045       1 internal.go:526] "Stopping and waiting for caches"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503189       1 internal.go:530] "Stopping and waiting for webhooks"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503206       1 internal.go:533] "Stopping and waiting for HTTP servers"
2024-09-17T16:32:46.503217413Z I0917 16:32:46.503212       1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
2024-09-17T16:32:46.503231142Z F0917 16:32:46.503221       1 operator.go:130] Failed to start controller-runtime manager: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use

Expected results:

In previous versions, BindAddress was set to 0 for the metrics server, so the server did not start and nothing was exposed on port 8080. The same should be done in OpenShift Container Platform 4.16 to keep backward compatibility and prevent port conflicts.
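
The expected behavior can be sketched with the standard library alone. This is an illustration of the semantics described above, not the controller-runtime implementation: newMetricsListener and defaultMetricsAddr are hypothetical names, and the ":8080" default is taken from the log output in this report. A bind address of "0" is treated as "do not start a metrics listener at all".

```go
package main

import (
	"fmt"
	"net"
)

// defaultMetricsAddr mirrors the ":8080" address seen in the fatal log line.
// Assumption for illustration only.
const defaultMetricsAddr = ":8080"

// newMetricsListener is a hypothetical helper that mimics the described
// semantics: "0" disables the listener, an empty value falls back to the
// default address, and anything else is bound as given.
func newMetricsListener(bindAddress string) (net.Listener, error) {
	if bindAddress == "0" {
		// Disabled: no socket is opened, so no port conflict is possible.
		return nil, nil
	}
	if bindAddress == "" {
		bindAddress = defaultMetricsAddr
	}
	return net.Listen("tcp", bindAddress)
}

func main() {
	ln, err := newMetricsListener("0")
	fmt.Println("listener created:", ln != nil, "error:", err)
}
```

With the value "0" no socket is opened, so a service already bound to port 8080 on the host network cannot cause the operator to exit at startup.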

Additional info:

clones

OCPBUGS-42189 cluster-network-operator failing to start metrics server on port 8080

Verified

is blocked by

OCPBUGS-42189 cluster-network-operator failing to start metrics server on port 8080

Verified

links to

openshift/cluster-network-operator#2579: [release-4.17] OCPBUGS-45086: Re-disable metrics server

Assignee: Patryk Diak

Reporter: OpenShift Prow Bot

QA Contact: Qiong Wang

Votes: 0

Watchers: 3

Created: 2024/11/26 8:16 PM

Updated: 2025/01/27 9:13 AM
