OpenShift Bugs / OCPBUGS-21803

Ingress stuck in progressing when maxConnections increased to 2000000


Details

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • Fix Version/s: 4.15.0
    • Affects Version/s: 4.14, 4.14.z, 4.15
    • Component/s: Networking / router
    • Labels: None
    • Yes
    • Sprint: Sprint 243, Sprint 244
    • 2
    • Approved
    • False
      * The transition to HAProxy 2.6 enforced the `strict-limits` configuration, which resulted in fatal errors when the `maxConnections` requirement could not be met. This release adjusts the HAProxy configuration to address this issue.
      +
      With this update, the HAProxy configuration switches to using `no strict-limits`. As a result, HAProxy no longer exits fatally when the `maxConnections` setting cannot be satisfied. Instead, it emits warnings and continues running. When the `maxConnections` limit cannot be met, warnings like the following might be returned:
      +
      `[WARNING] (50) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 4000237, limit is 1048576.
      [ALERT] (50) : [/usr/sbin/haproxy.main()] FD limit (1048576) too low for maxconn=2000000/maxsock=4000237. Please raise 'ulimit-n' to 4000237 or more to avoid any trouble.`
      +
      To resolve these warnings, we recommend specifying `-1`, or `auto`, for the `maxConnections` field when tuning an IngressController, as shown in the example after this note. This choice allows HAProxy to dynamically calculate the maximum value based on the available resource limits in the running container, which eliminates these warnings.
      +
      [IMPORTANT]
      ====
      The `strict-limits` setting is not configurable by end users and remains under the control of the HAProxy template.
      ====
      +
      (https://issues.redhat.com/browse/OCPBUGS-21803[*OCPBUGS-21803*])
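
      A minimal sketch of the recommended tuning, assuming the default IngressController in the openshift-ingress-operator namespace (the same object edited in the reproduction steps below); setting `maxConnections: -1` lets HAProxy derive its connection limit from the file-descriptor limits available in the router container instead of a fixed value:

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        tuningOptions:
          # -1 (the "auto" behavior described in the note above) makes HAProxy
          # compute maxconn from the available FD limit rather than a fixed
          # value such as 2000000.
          maxConnections: -1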

    • Bug Fix
    • Done

    Description

      Description of problem:

      The test case https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926 was created for the NE-577 epic. When we increase `spec.tuningOptions.maxConnections` to 2000000, the default ingress controller gets stuck in Progressing.
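
      For reference, a non-interactive equivalent of the change (a sketch, assuming cluster-admin access; the actual reproduction below uses `oc edit`):

      # Set the tuningOptions.maxConnections value on the default IngressController
      oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge -p '{"spec":{"tuningOptions":{"maxConnections":2000000}}}'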

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926

      Steps to Reproduce:

      1. Edit the default ingress controller and set the max value to 2000000:
         oc -n openshift-ingress-operator edit ingresscontroller default
           tuningOptions:
             maxConnections: 2000000
      2.melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml | grep  -A1 tuningOptions
        tuningOptions:
          maxConnections: 2000000
      3. melvinjoseph@mjoseph-mac openshift-tests-private % oc get co/ingress 
      NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      ingress   4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h42m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......
      

      Actual results:

      The default ingress controller is stuck in Progressing.

      Expected results:

      The ingress controller should complete its rollout and work as normal.

      Additional info:

      melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po
      NAME                              READY   STATUS        RESTARTS   AGE
      router-default-7cf67f448-gb7mr    0/1     Running       0          38s
      router-default-7cf67f448-qmvks    0/1     Running       0          38s
      router-default-7dcd556587-kvk8d   0/1     Terminating   0          3h53m
      router-default-7dcd556587-vppk4   1/1     Running       0          3h53m
      melvinjoseph@mjoseph-mac openshift-tests-private % 
      
      melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po
      NAME                              READY   STATUS    RESTARTS   AGE
      router-default-7cf67f448-gb7mr    0/1     Running   0          111s
      router-default-7cf67f448-qmvks    0/1     Running   0          111s
      router-default-7dcd556587-vppk4   1/1     Running   0          3h55m
      
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h28m   
      baremetal                                  4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
      cloud-controller-manager                   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h58m   
      cloud-credential                           4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h59m   
      cluster-autoscaler                         4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
      config-operator                            4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
      console                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h34m   
      control-plane-machine-set                  4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
      csi-snapshot-controller                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
      dns                                        4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
      etcd                                       4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h47m   
      image-registry                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      176m    
      ingress                                    4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h39m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......
      insights                                   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h49m   
      kube-apiserver                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h45m   
      kube-controller-manager                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h46m   
      kube-scheduler                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h46m   
      kube-storage-version-migrator              4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
      machine-api                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h45m   
      machine-approver                           4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
      machine-config                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h53m   
      marketplace                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
      monitoring                                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h35m   
      network                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h57m   
      node-tuning                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
      openshift-apiserver                        4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
      openshift-controller-manager               4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
      openshift-samples                          4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
      operator-lifecycle-manager                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
      operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
      operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
      service-ca                                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
      storage                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h36m   
      melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get po
      NAME                               READY   STATUS    RESTARTS        AGE
      ingress-operator-c6fd989fd-jsrzv   2/2     Running   4 (3h45m ago)   3h58m
      melvinjoseph@mjoseph-mac openshift-tests-private % 
      
      
      melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator logs ingress-operator-c6fd989fd-jsrzv -c ingress-operator --tail=20
      2023-10-17T11:34:54.327Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
      2023-10-17T11:34:54.348Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
      2023-10-17T11:34:54.348Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
      2023-10-17T11:34:54.394Z    INFO    operator.ingressclass_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.394Z    INFO    operator.route_metrics_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.394Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.397Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.429Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.446Z    INFO    operator.certificate_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.553Z    INFO    operator.ingressclass_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.553Z    INFO    operator.route_metrics_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.553Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.557Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "59m59.9999758s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
      2023-10-17T11:34:54.558Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.583Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:34:54.657Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "59m59.345629987s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
      2023-10-17T11:34:54.794Z    INFO    operator.certificate_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:36:11.151Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
      2023-10-17T11:36:11.151Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
      2023-10-17T11:36:11.248Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "58m42.755479533s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
      melvinjoseph@mjoseph-mac openshift-tests-private % 
      
       
      melvinjoseph@mjoseph-mac openshift-tests-private % oc get po -n openshift-ingress
      NAME                              READY   STATUS    RESTARTS      AGE
      router-default-7cf67f448-gb7mr    0/1     Running   1 (71s ago)   3m57s
      router-default-7cf67f448-qmvks    0/1     Running   1 (70s ago)   3m57s
      router-default-7dcd556587-vppk4   1/1     Running   0             3h57m
      
      melvinjoseph@mjoseph-mac openshift-tests-private %   oc -n openshift-ingress logs router-default-7cf67f448-gb7mr --tail=20 
      I1017 11:39:22.623928       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:23.623924       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:24.623373       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:25.627359       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:26.623337       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:27.623603       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:28.623866       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:29.623183       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:30.623475       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:31.623949       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      melvinjoseph@mjoseph-mac openshift-tests-private % 
      melvinjoseph@mjoseph-mac openshift-tests-private % 
      melvinjoseph@mjoseph-mac openshift-tests-private % 
      melvinjoseph@mjoseph-mac openshift-tests-private %   oc -n openshift-ingress logs router-default-7cf67f448-qmvks --tail=20
      I1017 11:39:34.553475       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:35.551412       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:36.551421       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      E1017 11:39:37.052068       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1017 11:39:37.551648       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:38.551632       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:39.551410       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:40.552620       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:41.552050       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:42.551076       1 healthz.go:261] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1017 11:39:42.564293       1 template.go:828] router "msg"="Shutdown requested, waiting 45s for new connections to cease" 
      
      melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller 
      NAME      AGE
      default   3h59m
      melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml
      apiVersion: operator.openshift.io/v1
      <-----snip---->
      status:
        availableReplicas: 1
        conditions:
        - lastTransitionTime: "2023-10-17T07:41:42Z"
          reason: Valid
          status: "True"
          type: Admitted
        - lastTransitionTime: "2023-10-17T07:57:01Z"
          message: The deployment has Available status condition set to True
          reason: DeploymentAvailable
          status: "True"
          type: DeploymentAvailable
        - lastTransitionTime: "2023-10-17T07:57:01Z"
          message: Minimum replicas requirement is met
          reason: DeploymentMinimumReplicasMet
          status: "True"
          type: DeploymentReplicasMinAvailable
        - lastTransitionTime: "2023-10-17T11:34:54Z"
          message: 1/2 of replicas are available
          reason: DeploymentReplicasNotAvailable
          status: "False"
          type: DeploymentReplicasAllAvailable
        - lastTransitionTime: "2023-10-17T11:34:54Z"
          message: |
            Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
          reason: DeploymentRollingOut
          status: "True"
          type: DeploymentRollingOut
        - lastTransitionTime: "2023-10-17T07:41:43Z"
          message: The endpoint publishing strategy supports a managed load balancer
          reason: WantedByEndpointPublishingStrategy
          status: "True"
          type: LoadBalancerManaged
        - lastTransitionTime: "2023-10-17T07:57:24Z"
          message: The LoadBalancer service is provisioned
          reason: LoadBalancerProvisioned
          status: "True"
          type: LoadBalancerReady
        - lastTransitionTime: "2023-10-17T07:41:43Z"
          message: LoadBalancer is not progressing
          reason: LoadBalancerNotProgressing
          status: "False"
          type: LoadBalancerProgressing
        - lastTransitionTime: "2023-10-17T07:41:43Z"
          message: DNS management is supported and zones are specified in the cluster DNS
            config.
          reason: Normal
          status: "True"
          type: DNSManaged
        - lastTransitionTime: "2023-10-17T07:57:26Z"
          message: The record is provisioned in all reported zones.
          reason: NoFailedZones
          status: "True"
          type: DNSReady
        - lastTransitionTime: "2023-10-17T07:57:26Z"
          status: "True"
          type: Available
        - lastTransitionTime: "2023-10-17T11:34:54Z"
          message: |-
            One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
            )
          reason: IngressControllerProgressing
          status: "True"
          type: Progressing
        - lastTransitionTime: "2023-10-17T07:57:28Z"
          status: "False"
          type: Degraded
        - lastTransitionTime: "2023-10-17T07:41:43Z"
      <-----snip---->

       

            People

              Assignee: ANDREW MCDERMOTT (amcdermo@redhat.com)
              Reporter: Melvin Joseph (rhn-support-mjoseph)
              QA Contact: Shudi Li
              Votes: 0
              Watchers: 11
