-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.16
This is a clone of issue OCPBUGS-42189. The following is the description of the original issue:
—
Description of problem:
Starting with OpenShift Container Platform 4.16, cluster-network-operator can get stuck in CrashLoopBackOff state with the error below:
2024-09-17T16:32:46.503056041Z I0917 16:32:46.503016 1 controller.go:242] "All workers finished" controller="pod-watcher"
2024-09-17T16:32:46.503056041Z I0917 16:32:46.503045 1 internal.go:526] "Stopping and waiting for caches"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503189 1 internal.go:530] "Stopping and waiting for webhooks"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503206 1 internal.go:533] "Stopping and waiting for HTTP servers"
2024-09-17T16:32:46.503217413Z I0917 16:32:46.503212 1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
2024-09-17T16:32:46.503231142Z F0917 16:32:46.503221 1 operator.go:130] Failed to start controller-runtime manager: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use
The problem appears to be related to the change made in https://github.com/openshift/cluster-network-operator/pull/2274/commits/acd67b432be4ef2efb470710aebba2e3551bc00d#diff-99c0290799daf9abc6240df64063e20bfaf67b371577b67ac7eec6f4725622ff, which did not set BindAddress to 0 (see https://github.com/openshift/cluster-network-operator/blob/master/vendor/sigs.k8s.io/controller-runtime/pkg/metrics/server/server.go#L70) to keep the previous behavior. With the current code, cluster-network-operator exposes a metrics server on port 8080, which it did not do before and which can conflict with custom applications. This is especially likely in environments running compact OpenShift Container Platform 4 clusters (three-node clusters).
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.16
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.15 (three-node cluster) and create a service that is listening on HostNetwork with port 8080
2. Update to OpenShift Container Platform 4.16
3. Watch cluster-network-operator being stuck in CrashLoopBackOff state because port 8080 is already bound
Actual results:
2024-09-17T16:32:46.503056041Z I0917 16:32:46.503016 1 controller.go:242] "All workers finished" controller="pod-watcher"
2024-09-17T16:32:46.503056041Z I0917 16:32:46.503045 1 internal.go:526] "Stopping and waiting for caches"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503189 1 internal.go:530] "Stopping and waiting for webhooks"
2024-09-17T16:32:46.503209536Z I0917 16:32:46.503206 1 internal.go:533] "Stopping and waiting for HTTP servers"
2024-09-17T16:32:46.503217413Z I0917 16:32:46.503212 1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
2024-09-17T16:32:46.503231142Z F0917 16:32:46.503221 1 operator.go:130] Failed to start controller-runtime manager: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use
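For reference, the "bind: address already in use" part of the fatal line can be reproduced outside the cluster. The snippet below is only a rough sketch, not operator code: it uses an ephemeral loopback port rather than 8080, the helper names bindTwice and isAddrInUse are made up for illustration, and the EADDRINUSE check assumes a Linux host.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// bindTwice opens a TCP listener on an ephemeral loopback port and then tries
// to bind the same address a second time while the first listener is still
// open. It returns the error from the second attempt (nil if it succeeded).
func bindTwice() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("unexpected failure on first bind: %w", err)
	}
	defer first.Close()

	// The first listener plays the role of the HostNetwork service that
	// already owns the port; this second call plays the metrics server.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return nil
	}
	return err
}

// isAddrInUse reports whether err wraps the EADDRINUSE errno.
func isAddrInUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE)
}

func main() {
	err := bindTwice()
	fmt.Println("second bind error:", err)
	fmt.Println("address already in use:", isAddrInUse(err))
}
```

With hostNetwork the operator pod shares the node's port space, so the same conflict would be expected whenever any other host-level process already listens on 8080.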
Expected results:
In previous versions, BindAddress was set to 0 for the metrics server, so the server did not start and nothing was exposed on port 8080. OpenShift Container Platform 4.16 should do the same to keep backward compatibility and prevent port conflicts.
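To make the expected behavior concrete, here is a simplified model of the BindAddress handling as described in this report. This is not the controller-runtime implementation: resolveBindAddress and openMetricsListener are hypothetical helpers, and the assumption that an unset address falls back to :8080 is taken from the log output above. In controller-runtime itself the value would presumably be set through the metrics server options passed to the manager.

```go
package main

import (
	"fmt"
	"net"
)

// defaultBindAddress matches the port seen in the failing log line. Using it
// as the fallback for an unset address is an assumption of this sketch.
const defaultBindAddress = ":8080"

// resolveBindAddress models the behavior described in the report:
// "0" disables the metrics listener, an empty value falls back to the
// default, and any other value is used as given.
func resolveBindAddress(bindAddress string) (addr string, enabled bool) {
	switch bindAddress {
	case "0":
		return "", false
	case "":
		return defaultBindAddress, true
	default:
		return bindAddress, true
	}
}

// openMetricsListener returns a nil listener and nil error when metrics are
// disabled, so no port is bound at all in that case.
func openMetricsListener(bindAddress string) (net.Listener, error) {
	addr, enabled := resolveBindAddress(bindAddress)
	if !enabled {
		return nil, nil
	}
	return net.Listen("tcp", addr)
}

func main() {
	for _, configured := range []string{"0", "", "127.0.0.1:9090"} {
		addr, enabled := resolveBindAddress(configured)
		fmt.Printf("BindAddress=%q enabled=%v listen=%q\n", configured, enabled, addr)
	}

	// With "0" nothing is bound, so a host-level service on 8080 is unaffected.
	ln, err := openMetricsListener("0")
	fmt.Println("listener is nil:", ln == nil, "error is nil:", err == nil)
}
```

Under this model, the regression corresponds to moving from the "0" case to the empty case, which is why the operator suddenly competes for port 8080.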
Additional info:
- clones: OCPBUGS-42189 cluster-network-operator failing to start metrics server on port 8080 (Verified)
- is blocked by: OCPBUGS-42189 cluster-network-operator failing to start metrics server on port 8080 (Verified)