OpenShift Bugs / OCPBUGS-49908

[DCM] Service idling fails due to non disabled health check on server slot with clusterIP address


    • Important
    • None
    • Rejected
    • False

      Description of problem:

      DCM cannot disable health checks dynamically. The Runtime API command exists (https://www.haproxy.com/documentation/haproxy-runtime-api/reference/disable-health/) but is not used. This breaks idled services, whose server slot is set to a single endpoint: the service's clusterIP. A health check sent to that clusterIP unidles the service. In the non-DCM case this is mitigated by omitting the "check" keyword when the config template is rendered (https://github.com/openshift/router/blob/b447c4d27d38a6c5f6ce2d5ceda88dbc9b90c661/images/router/haproxy/conf/haproxy-config.template#L755). In the DCM case, however, only the server slot is updated with the clusterIP, and the health check stays enabled.
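
      For illustration only, here is a minimal sketch of what disabling the health check over the Runtime API could look like. It is not the router's code. The helper names (idle_commands, send_command), the socket path, and the backend and server names are hypothetical. The "disable health" and "set server ... addr ... port" commands are taken from the HAProxy Runtime API. Issuing "disable health" before the address change is an assumption, made so that no probe reaches the clusterIP.

```python
import socket


def idle_commands(backend: str, server: str, cluster_ip: str, port: int) -> list[str]:
    """Build the Runtime API commands for pointing a server slot at an idled service.

    Hypothetical helper: the ordering (disable checks, then change the address)
    is an assumption, not the router's confirmed behaviour.
    """
    target = f"{backend}/{server}"
    return [
        # Stop health checks so that no probe is sent to the clusterIP.
        f"disable health {target}",
        # Point the server slot at the service's clusterIP.
        f"set server {target} addr {cluster_ip} port {port}",
    ]


def send_command(socket_path: str, command: str) -> str:
    """Send one command to the HAProxy stats socket and return the reply.

    In the default non-interactive mode HAProxy answers a single command
    and then closes the connection, so we read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode()
```

      A caller would loop over idle_commands(...) and pass each command to send_command with the path of the router's HAProxy socket. The matching "enable health" command would be needed once the service is unidled and real endpoints are back.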

      Version-Release number of selected component (if applicable):

      4.18+ (when TechPreview featureset is used)  

      How reproducible:

      Not always but often

      Steps to Reproduce:

      Run Origin's test "Idling with a single service and ReplicationController should idle the service and ReplicationController properly" (https://github.com/openshift/origin/blob/c22aad19d02153b206ec2efbd8e18c24b70a191a/test/extended/idling/idling.go#L230-L238).
          

      Actual results:

        [sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController [It] should idle the service and ReplicationController properly [Suite:openshift/conformance/parallel]
        github.com/openshift/origin/test/extended/idling/idling.go:235
      
          [FAILED] Failed after 1.209s.
          Expected
              <string>: "2"
          to contain substring
              <string>: 0
          In [It] at: github.com/openshift/origin/test/extended/idling/idling.go:129 @ 02/05/25 22:53:12.733
        ------------------------------
      
        Summarizing 1 Failure:
          [FAIL] [sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController [It] should idle the service and ReplicationController properly [Suite:openshift/conformance/parallel]
          github.com/openshift/origin/test/extended/idling/idling.go:129
      
        Ran 1 of 1 Specs in 13.544 seconds
        FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
      fail [github.com/openshift/origin/test/extended/idling/idling.go:129]: Failed after 1.209s.
      Expected
          <string>: "2"
      to contain substring
          <string>: 0
      Ginkgo exit error 1: exit with code 
      

      Expected results:

      The test passes.

      Additional info:

      Potential fix: fall back to a router reload on the route update event. Example: https://github.com/alebedev87/router/commit/d10e732d2a87cf22fb7e7bbfb14fb0b48d000068

              nid-team-bot NID Team Bot
              alebedev@redhat.com Andrey Lebedev
              Shudi Li Shudi Li