OCPBUGS-13181

metric for ingresswithoutclassname does not decrease when classless ingresses cease to exist

    • Low
    • No
    • 2
    • Sprint 249, Sprint 250, Sprint 251, Sprint 252, Sprint 254, NE Sprint 255, NE Sprint 256, NE Sprint 257, NE Sprint 259, NE Sprint 261, NE Sprint 262, NE Sprint 263
    • 12
    • Rejected
    • False
    • None
    • Previously, for an Ingress resource with an `IngressWithoutClassName` alert, the Ingress Controller did not delete the alert along with deletion of the resource. The alert continued to show on the {product-title} web console. With this release, the Ingress Controller resets the `openshift_ingress_to_route_controller_ingress_without_class_name` metric to `0` before the controller deletes the Ingress resource, so that the alert is deleted and no longer shows on the web console. (link:https://issues.redhat.com/browse/OCPBUGS-13181[*OCPBUGS-13181*])
    • Bug Fix
    • Done
    • This is a legit bug in current versions. Customers have noticed it and care about it. It needs a priority re-evaluation. If it is truly minor, close it and end the false sense of hope that open bugs give.

      Description of problem:

      We have an OKD 4.12 cluster with persistent and increasing
      IngressWithoutClassName alerts, even though no ingresses are normally
      present in the cluster. I believe the ingresses without a class name that
      are being counted are created as part of the ACME validation process
      managed by the cert-manager operator with its OpenShift route add-on, and
      that they are torn down once the ACME validation is complete.

      Version-Release number of selected component (if applicable):

       4.12.0-0.okd-2023-04-16-041331

      How reproducible:

      Seems very consistent. It went away during an update but came back shortly after, and it continues to increase.

      Steps to Reproduce:

      1. Create an ingress without a class name.
      2. See the counter increase.
      3. Delete the classless ingress.
      4. The counter does not decrease.
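      The reproduction steps above, and the "increasing" count in the description, can be modeled with a minimal Python sketch. It assumes, hypothetically, that the controller keeps one gauge series per Ingress name, sets it to 1 when `ingressClassName` is unset, and does nothing on delete. The class and method names are illustrative and not taken from route-controller-manager; the solver names reuse the `cm-acme-http-solver-*` labels from the metrics output quoted later in this issue.

```python
# Hypothetical model of the reported bug (not the real controller code):
# a gauge keyed only by Ingress name that is set on add/update and ignored
# on delete.
from typing import Dict, Optional


class BuggyClasslessIngressGauge:
    def __init__(self) -> None:
        # name label -> gauge value (1 = no ingressClassName, 0 = class set)
        self.series: Dict[str, int] = {}

    def on_add_or_update(self, name: str, ingress_class_name: Optional[str]) -> None:
        self.series[name] = 0 if ingress_class_name else 1

    def on_delete(self, name: str) -> None:
        # The bug being modeled: deletion leaves the last value (1) in place.
        pass

    def total(self) -> int:
        return sum(self.series.values())


gauge = BuggyClasslessIngressGauge()

# Steps 1-4: create a classless ingress, then delete it.
gauge.on_add_or_update("ingress-reen1", None)
print(gauge.total())  # 1
gauge.on_delete("ingress-reen1")
print(gauge.total())  # 1: the series outlives the ingress

# Solver ingresses get a new generated suffix each time, so every
# create/delete cycle leaves one more stale series behind.
for suffix in ("2j8vk", "bkxsg"):
    name = f"cm-acme-http-solver-{suffix}"
    gauge.on_add_or_update(name, None)
    gauge.on_delete(name)
print(gauge.total())  # 3
```

      Because each ACME solver Ingress gets a fresh name, a per-name series that is never reset explains why the count keeps growing rather than staying flat.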
      

      Additional info:

      https://github.com/openshift/cluster-ingress-operator/issues/912

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.18.1 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:6122

            Shudi Li added a comment -

            Marked it verified with 4.18.0-0.nightly-2024-11-22-231049, thanks.

            Shudi Li added a comment -

            verified it with the pre-merged test, please refer to https://github.com/openshift/route-controller-manager/pull/49#issuecomment-2431568891 for more detail, thanks.

            Shudi Li added a comment - - edited

            The issue could be reproduced with 4.18.0-0.nightly-2024-09-21-014704

            1.
            % oc get clusterversion                                           
            NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.18.0-0.nightly-2024-09-21-014704   True        False         6h45m   Cluster version is 4.18.0-0.nightly-2024-09-21-014704
            
            2. Create a pod and services in namespace test
            % oc -n test get all
            Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
            NAME                READY   STATUS    RESTARTS   AGE
            pod/appach-server   1/1     Running   0          5h1m
            
            
            NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
            service/sec-apach2    ClusterIP   172.30.246.65   <none>        27443/TCP   5h1m
            service/unsec-apach   ClusterIP   172.30.157.72   <none>        28080/TCP   5h1m
            
            3. Create ingresses ingress-reen1, ingress-pass1, and ingress-with-class, then delete ingress ingress-reen1
            % oc -n test get ingress
            NAME                 CLASS    HOSTS                                                   ADDRESS   PORTS     AGE
            ingress-pass1        <none>   reen1-test.sec.shudi-8a23.qe.devcluster.openshift.com             80, 443   4h54m
            ingress-with-class   mytest   foo.bar.com                                                       80        4h53m
            
            4. In the web console, navigate to Observe > Metrics and query openshift_ingress_to_route_controller_ingress_without_class_name
            The value for "ingress-reen1" was still 1, even though ingress-reen1 had been deleted.

            Yang Wei added a comment -

            Hi team, any update on this issue?

            Robert Bohne added a comment -

            I have the same issue on a 4.14.3 cluster:

            ```
            oc get ingress -A
            No resources found
            ```

            It's a Red Hat internal cluster; I can provide access or a must-gather.

            Miciah Masters added a comment -

            We also observed that the alerts do not differentiate between namespaces. If you have same-named Ingress resources in different namespaces, the alert triggers for just one of them, and it also resolves after you correct one of them, potentially leaving you with uncorrected Ingress resources.

            openshift_ingress_to_route_controller_ingress_without_class_name does not contain a namespace label for the target ingress, just a name. This might be the problem.

            That issue is being tracked as OCPBUGS-15253, which is currently in progress with a proposed fix linked.
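            A minimal Python sketch of why a name-only label conflates namespaces. The namespaces `team-a` and `team-b` and the Ingress name `web` are hypothetical, and keying by `(namespace, name)` is only an assumption about what a namespace-aware metric could look like, not the OCPBUGS-15253 fix itself.

```python
# Hypothetical illustration: series keyed by Ingress name alone conflate
# same-named Ingresses in different namespaces.
from typing import Dict, Optional, Tuple


def value_for(ingress_class_name: Optional[str]) -> int:
    # 1 = Ingress without spec.ingressClassName, 0 = class set.
    return 0 if ingress_class_name else 1


by_name: Dict[str, int] = {}  # name label only, as reported in this issue
by_ns_name: Dict[Tuple[str, str], int] = {}  # assumed (namespace, name) labels

events = [
    ("team-a", "web", None),      # classless Ingress in team-a
    ("team-b", "web", None),      # classless Ingress, same name, in team-b
    ("team-a", "web", "mytest"),  # team-a adds an ingressClassName
]
for namespace, name, cls in events:
    by_name[name] = value_for(cls)
    by_ns_name[(namespace, name)] = value_for(cls)

print(sum(by_name.values()))     # 0: team-b's classless Ingress is hidden
print(sum(by_ns_name.values()))  # 1: team-b is still flagged
```

            With only a `name` label, fixing one Ingress overwrites the single shared series, which matches the "resolves after you corrected one of them" behavior described above.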

            Michael Riedmann (Inactive) added a comment -

            We are also experiencing this on an OCP 4.12.34 cluster, and are happy to help with any investigation.

            For the original problem: on our cluster, the alert is only resolved if the ingressClassName is added. Removing the Ingress resource results in a "ghost" alert that does not close because "openshift_ingress_to_route_controller_ingress_without_class_name" stays at 1.

            We also observed that the alerts do not differentiate between namespaces. If you have same-named Ingress resources in different namespaces, the alert triggers for just one of them, and it also resolves after you correct one of them, potentially leaving you with uncorrected Ingress resources.

            openshift_ingress_to_route_controller_ingress_without_class_name does not contain a namespace label for the target ingress, just a name. This might be the problem.

            Swarup Ghosh added a comment -

            I did an initial round of analysis on this; it seems the metric doesn't get updated even when the Ingress is deleted.

            » oc get ingress -A
            No resources found

            The metrics endpoint still returns the following, even though these ingresses have already been removed from the cluster:

            # HELP openshift_ingress_to_route_controller_ingress_without_class_name Report the number of ingresses that do not specify ingressClassName.
            # TYPE openshift_ingress_to_route_controller_ingress_without_class_name gauge
            openshift_ingress_to_route_controller_ingress_without_class_name{name="cm-acme-http-solver-2j8vk"} 1
            openshift_ingress_to_route_controller_ingress_without_class_name{name="cm-acme-http-solver-bkxsg"} 1
            openshift_ingress_to_route_controller_ingress_without_class_name{name="ingress-le-prod"} 1

            The expected behaviour would be, at a minimum, to set the gauge series with the name labels of these ingresses to 0 once they are removed.
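            To check a cluster for this symptom, a small Python sketch can compare the scraped series against the Ingresses that still exist. It assumes `name` is the only label on the metric, as in the output above, and that the set of live Ingress names is gathered separately (for example from `oc get ingress -A`); `parse_series` and `stale_series` are hypothetical helper names.

```python
# Hypothetical diagnostic: list series that still report a classless Ingress
# although no Ingress with that name exists any more.
import re
from typing import Dict, List, Set

METRIC = "openshift_ingress_to_route_controller_ingress_without_class_name"

# Assumes `name` is the only label, e.g.  METRIC{name="foo"} 1
SAMPLE_RE = re.compile(re.escape(METRIC) + r'\{name=\s*"([^"]*)"\s*\}\s+(\S+)')


def parse_series(exposition: str) -> Dict[str, float]:
    """Return {name label: value}; skips blank lines and # HELP / # TYPE lines."""
    series: Dict[str, float] = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            series[match.group(1)] = float(match.group(2))
    return series


def stale_series(series: Dict[str, float], live_names: Set[str]) -> List[str]:
    """Names still reported as classless (value > 0) with no live Ingress."""
    return sorted(n for n, v in series.items() if v > 0 and n not in live_names)


# Scrape copied from the comment above.
scrape = f"""\
# HELP {METRIC} Report the number of ingresses that do not specify ingressClassName.
# TYPE {METRIC} gauge
{METRIC}{{name="cm-acme-http-solver-2j8vk"}} 1
{METRIC}{{name="cm-acme-http-solver-bkxsg"}} 1
{METRIC}{{name="ingress-le-prod"}} 1
"""

live: Set[str] = set()  # `oc get ingress -A` reported "No resources found"
print(stale_series(parse_series(scrape), live))
# ['cm-acme-http-solver-2j8vk', 'cm-acme-http-solver-bkxsg', 'ingress-le-prod']
```

            Since the metric carries no namespace label, this check can only compare names; a same-named Ingress in any namespace would mask a stale series.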

            Candace Holman added a comment -

            The metric is created in route-controller-manager, and the alert is created in cluster-ingress-operator. We need to determine where the metric can be deleted to make this work.
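            A minimal Python sketch of the direction the eventual fix took, per the release note: when an Ingress is deleted, reset that Ingress's series to `0`. The class is a hypothetical model, not route-controller-manager code; the `drop_on_delete` option illustrates an alternative design (removing the series entirely), which is an assumption and not what the release note describes.

```python
# Hypothetical model of the fix direction in the release note:
# reset an Ingress's series to 0 when the Ingress is deleted.
from typing import Dict, Optional


class ClasslessIngressGauge:
    def __init__(self, drop_on_delete: bool = False) -> None:
        # name label -> gauge value (1 = no ingressClassName, 0 = class set)
        self.series: Dict[str, int] = {}
        # False: keep the series at 0 (release-note behavior).
        # True: remove the series entirely (alternative, assumed design).
        self.drop_on_delete = drop_on_delete

    def on_add_or_update(self, name: str, ingress_class_name: Optional[str]) -> None:
        self.series[name] = 0 if ingress_class_name else 1

    def on_delete(self, name: str) -> None:
        if self.drop_on_delete:
            self.series.pop(name, None)
        elif name in self.series:
            # Only reset series we have seen; never create new ones on delete.
            self.series[name] = 0

    def total(self) -> int:
        return sum(self.series.values())


reset = ClasslessIngressGauge()
reset.on_add_or_update("ingress-reen1", None)
reset.on_delete("ingress-reen1")
print(reset.series)  # {'ingress-reen1': 0}

dropped = ClasslessIngressGauge(drop_on_delete=True)
dropped.on_add_or_update("ingress-reen1", None)
dropped.on_delete("ingress-reen1")
print(dropped.series)  # {}
```

            Resetting to `0` still keeps one series for every name ever seen, which matters with generated `cm-acme-http-solver-*` names; removing the series instead bounds that growth. Prometheus client libraries support removal, for example `DeleteLabelValues` on a client_golang `GaugeVec`.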

              mmasters1@redhat.com Miciah Masters
              oit-nate.childers Nate Childers (Inactive)
              Shudi Li Shudi Li
              Darragh Fitzmaurice Darragh Fitzmaurice
              Votes: 1
              Watchers: 18

                  Original Estimate: Not Specified
                  Remaining Estimate: 0 minutes
                  Time Spent: 5 hours