OCPBUGS-45086 (OpenShift Bugs)

cluster-network-operator failing to start metrics server on port 8080


    • Important
    • None
    • SDN Sprint 263, SDN Sprint 264
    • 2
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-42189. The following is the description of the original issue:

      Description of problem:

      Starting with OpenShift Container Platform 4.16, cluster-network-operator has been observed stuck in CrashLoopBackOff state because of the following error:
      
      2024-09-17T16:32:46.503056041Z I0917 16:32:46.503016       1 controller.go:242] "All workers finished" controller="pod-watcher"
      2024-09-17T16:32:46.503056041Z I0917 16:32:46.503045       1 internal.go:526] "Stopping and waiting for caches"
      2024-09-17T16:32:46.503209536Z I0917 16:32:46.503189       1 internal.go:530] "Stopping and waiting for webhooks"
      2024-09-17T16:32:46.503209536Z I0917 16:32:46.503206       1 internal.go:533] "Stopping and waiting for HTTP servers"
      2024-09-17T16:32:46.503217413Z I0917 16:32:46.503212       1 internal.go:537] "Wait completed, proceeding to shutdown the manager"
      2024-09-17T16:32:46.503231142Z F0917 16:32:46.503221       1 operator.go:130] Failed to start controller-runtime manager: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use
      
      The problem appears to be related to the change in https://github.com/openshift/cluster-network-operator/pull/2274/commits/acd67b432be4ef2efb470710aebba2e3551bc00d#diff-99c0290799daf9abc6240df64063e20bfaf67b371577b67ac7eec6f4725622ff, which did not set BindAddress to "0" (see https://github.com/openshift/cluster-network-operator/blob/master/vendor/sigs.k8s.io/controller-runtime/pkg/metrics/server/server.go#L70) to keep the previous behavior. With BindAddress left unset, the vendored controller-runtime starts the metrics server on :8080, while "0" disables it.
      With the current code, cluster-network-operator exposes a metrics server on port 8080, which it did not do before, and this can conflict with custom applications that use the same port.
      
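      This is especially likely in compact OpenShift Container Platform 4 clusters (three-node clusters), where the control-plane nodes also run user workloads, so a custom application on the host network can share a node with the operator.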
      
      

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.16
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Install OpenShift Container Platform 4.15 (three-node cluster) and create a service that listens on port 8080 on the host network (hostNetwork: true).
      2. Update to OpenShift Container Platform 4.16.
      3. Observe that cluster-network-operator is stuck in CrashLoopBackOff state because port 8080 is already bound.
      

      Actual results:

      2024-09-17T16:32:46.503231142Z F0917 16:32:46.503221       1 operator.go:130] Failed to start controller-runtime manager: failed to start metrics server: failed to create listener: listen tcp :8080: bind: address already in use
      

      Expected results:

      In previous versions, BindAddress was set to "0" for the metrics server, which disabled it, so nothing was exposed on port 8080. The same should be done in OpenShift Container Platform 4.16 to keep backward compatibility and prevent port conflicts.
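The expected behavior can be modeled with the standard library. In the vendored controller-runtime, the relevant setting is `BindAddress` in `metricsserver.Options` (package `sigs.k8s.io/controller-runtime/pkg/metrics/server`), which is passed to the manager through its `Metrics` option; the requested fix amounts to setting it to `"0"`. The sketch below does not import controller-runtime. `resolveMetricsBindAddress` and `startMetricsListener` are hypothetical helpers that mimic the semantics described in this issue: an unset value falls back to `:8080` (as the log shows), `"0"` disables the server, and any other value is used as-is.

```go
package main

import (
	"fmt"
	"net"
)

// defaultMetricsAddr reflects the behavior observed in this issue's log when
// BindAddress is left unset (an assumption about the vendored version).
const defaultMetricsAddr = ":8080"

// resolveMetricsBindAddress (hypothetical) returns the address to listen on
// and whether the metrics server is enabled at all.
func resolveMetricsBindAddress(bindAddress string) (string, bool) {
	switch bindAddress {
	case "0":
		return "", false // "0" means: do not start a metrics server
	case "":
		return defaultMetricsAddr, true // unset: fall back to the default port
	default:
		return bindAddress, true // explicit address: use it unchanged
	}
}

// startMetricsListener (hypothetical) only calls net.Listen when metrics are
// enabled. A nil listener with a nil error means nothing was bound.
func startMetricsListener(bindAddress string) (net.Listener, error) {
	addr, enabled := resolveMetricsBindAddress(bindAddress)
	if !enabled {
		return nil, nil
	}
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		// Same wording as the operator log, for comparison.
		return nil, fmt.Errorf("failed to create listener: %w", err)
	}
	return ln, nil
}

func main() {
	for _, in := range []string{"", "0", "127.0.0.1:9090"} {
		addr, enabled := resolveMetricsBindAddress(in)
		fmt.Printf("BindAddress=%q -> addr=%q enabled=%v\n", in, addr, enabled)
	}
}
```

Returning no listener for `"0"` is the point of the requested fix: when metrics are disabled, the operator never calls `Listen`, so it cannot fail with `address already in use` regardless of what else runs on the node.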
      

      Additional info:

      
      

              Patryk Diak (pdiak@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Anurag Saxena
              Votes: 0
              Watchers: 2
