OpenShift Bugs / OCPBUGS-2775

After adding/removing a label from a namespace, the stats of "route_metrics_controller_routes_per_shard" on the Observe >> Metrics page aren't correct

    • Bug
    • Resolution: Done
    • Undefined
    • 4.12.0
    • 4.12
    • Networking / router
    • None
    • Moderate
    • None
    • Sprint 226, Sprint 227
    • 2
    • Rejected
    • False
    • NA

      Description of problem:

      There were 4 ingress-controllers and 15 routes in total. On the web console, query "route_metrics_controller_routes_per_shard" on the Observe >> Metrics page. The stats for 3 of the ingress-controllers are 15, and it is 1 for the last ingress-controller.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-10-23-154914

      How reproducible:

      Create pods, services, ingress-controllers, and routes, then check "route_metrics_controller_routes_per_shard" on the web console

      Steps to Reproduce:

      1. get cluster's base domain
      % oc get dnses.config/cluster -oyaml | grep -i domain
        baseDomain: shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      
      2. create 3 custom ingress-controllers
      % oc -n openshift-ingress-operator get ingresscontroller
      NAME         AGE
      default      7h5m
      extertest3   120m
      internal1    120m
      internal2    120m
      % 
      
      3. check the spec of the 4 ingress-controllers
      a, default
      
      b, extertest3
      spec:
        domain: extertest3.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
        endpointPublishingStrategy:
          loadBalancer:
            dnsManagementPolicy: Managed
            scope: External
          type: LoadBalancerService
      c, internal1
      spec:
        domain: internal1.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
        endpointPublishingStrategy:
          loadBalancer:
            dnsManagementPolicy: Managed
            scope: Internal
          type: LoadBalancerService
      d, internal2
      spec:
        domain: internal2.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
        endpointPublishingStrategy:
          loadBalancer:
            dnsManagementPolicy: Managed
            scope: Internal
          type: LoadBalancerService
        routeSelector:
          matchLabels:
            shard: alpha
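
      The routeSelector above uses standard Kubernetes matchLabels semantics: internal2 admits a route only if every key/value pair in the selector appears among the route's labels. A minimal sketch of that subset check (the function name and sample labels are illustrative, not from the operator's code):

```python
def match_labels(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector key/value must be present."""
    return all(labels.get(key) == value for key, value in selector.items())

# internal2's routeSelector; the other three ingress-controllers have no
# selector, so they admit every route.
route_selector = {"shard": "alpha"}

print(match_labels(route_selector, {"name": "unsvc5", "shard": "alpha"}))  # True
print(match_labels(route_selector, {"name": "edge1-test"}))                # False
```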
      
      4. check the route, there are 15 routes
      % oc get route -A | awk '{print $3}'
      HOST/PORT
      oauth-openshift.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      console-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      downloads-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      canary-openshift-ingress-canary.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      alertmanager-main-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      prometheus-k8s-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      prometheus-k8s-federate-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      thanos-querier-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      edge1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      int1reen2-test.internal1.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      pass1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      reen1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      service-unsecure-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      int1edge2-test.internal1.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      test.shudi.com
      %
      
      % oc get route -A | awk '{print $3}' | grep apps.shudi
      oauth-openshift.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      console-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      downloads-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      canary-openshift-ingress-canary.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      alertmanager-main-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      prometheus-k8s-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      prometheus-k8s-federate-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      thanos-querier-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      edge1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      pass1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      reen1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      service-unsecure-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
      %
      
      % oc get route -A | awk '{print $3}' | grep apps.shudi | wc -l
            12
      % oc get route -A | awk '{print $3}' | grep internal1 | wc -l 
             2
      % oc get route -A | awk '{print $3}' | grep shudi.com | wc -l
             1
      %
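
      The grep | wc -l tallies above can be cross-checked by bucketing the 15 hosts from step 4 by domain suffix (a quick sketch; the bucket function is a plain substring check used only for this tally, not how the router itself classifies routes):

```python
from collections import Counter

hosts = """\
oauth-openshift.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
console-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
downloads-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
canary-openshift-ingress-canary.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
alertmanager-main-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
prometheus-k8s-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
prometheus-k8s-federate-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
thanos-querier-openshift-monitoring.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
edge1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
int1reen2-test.internal1.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
pass1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
reen1-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
service-unsecure-test.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
int1edge2-test.internal1.shudi-412gcpop36.qe.gcp.devcluster.openshift.com
test.shudi.com""".splitlines()

def bucket(host: str) -> str:
    if ".apps.shudi" in host:
        return "apps (default)"
    if ".internal1." in host:
        return "internal1"
    return "other"

print(Counter(bucket(h) for h in hosts))
# Counter({'apps (default)': 12, 'internal1': 2, 'other': 1})
```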
      
      5. only route unsvc5 had the shard=alpha label
       % oc get route unsvc5  -oyaml | grep labels: -A2
        labels:
          name: unsvc5
          shard: alpha
       % oc get route unsvc5 -oyaml | grep spec: -A1
        spec:
          host: test.shudi.com
      
      6. log in to the web console (https://console-openshift-console.apps.shudi-412gcpop36.qe.gcp.devcluster.openshift.com/monitoring/query-browser), then navigate to Observe >> Metrics
      
      7. input "route_metrics_controller_routes_per_shard", then click the "Run queries" button. As the attached picture shows:
      name           value
      default        15
      extertest3     15
      internal1      15
      internal2      1
      
      8. There was also a minor issue: as the attached picture shows, there were two "name" columns in the header row
      
      Name                                           name      value                              
      route_metrics_controller_routes_per_shard     default    15
      route_metrics_controller_routes_per_shard     extertest3 15
      route_metrics_controller_routes_per_shard     internal1  15
      route_metrics_controller_routes_per_shard     internal2  1
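
      The doubled header comes about because the console's first column is always the metric's name, while this metric's shard label happens to be called `name` as well. A sketch parsing one sample in Prometheus exposition format makes the two "names" visible (the sample line illustrates the format and is not captured from this cluster):

```python
# Illustrative sample in Prometheus exposition format (assumed shape).
sample = 'route_metrics_controller_routes_per_shard{name="default"} 15'

metric, rest = sample.split("{", 1)
label_part, value = rest.rsplit("} ", 1)
labels = dict(pair.split("=", 1) for pair in label_part.split(","))

print(metric)   # route_metrics_controller_routes_per_shard  <- "Name" column
print(labels)   # {'name': '"default"'}                      <- label also called "name"
print(value)    # 15
```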

      Actual results:

      name           value
      default        15
      extertest3     15
      internal1      15
      internal2      1

      Expected results:

      name           value
      default        12
      extertest3     0
      internal1      2
      internal2      1

      Additional info:

       


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory, and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2022:7399


            Miciah Masters added a comment -

            This bug fix does not require a release note as the bug was in a new feature that has not shipped.

            Shudi Li added a comment -

            rh-ee-arsen Yes, I also noticed the route Ingress status wasn't changed after the namespace label was changed. Let's wait for the known issue to be fixed; this metric issue will then go away automatically. Thanks.


            Shudi Li added a comment -

            rh-ee-arsen yes, in step 8 the two names are for two different things. It would be great to change "name" to "shard", thanks.


            Arkadeep Sen added a comment -

            shudili@redhat.com the issue with the change of namespace label is a known one. This is due to the fact that the route Ingress status also does not get updated when the namespace label is changed. This is tracked in https://issues.redhat.com/browse/OCPBUGS-1689. The fix for this bug will also fix the issue of updating the metric value, as the metric value is dependent on the route Ingress status.

            Arkadeep Sen added a comment -

            shudili@redhat.com in step 8 the first column gives the name of the metric, the rest of the columns apart from the last give the labels of the metric, and the last column gives the corresponding value for the metric. As the label showing the name of the shard is itself called 'name', 2 columns end up with the same name. The label 'name' can be changed to 'shard' to avoid the confusion.

            Shudi Li added a comment -

            I tested the namespace labelSelector in an ingress-controller: after I add the label to a namespace (test), the route_metrics_controller_routes_per_shard stats for this ingress-controller increase as expected. But after I remove the label from the namespace (test), the stats don't change; they should decrease by the number of routes in the namespace (test).

            1. There are 18 routes in total across all namespaces; namespace test has 6 routes (4 routes have the "shard: alpha" label), and namespace test2 has 4 routes (namespace test2 has the "type: sharded" label)
            2. There are 4 ingress-controllers in total; ic internal2 has a routeSelector ("shard: alpha" label), and ic extertest3 has a namespaceSelector ("type: sharded" label)
            3. check the stats of "route_metrics_controller_routes_per_shard"; as shown below, they are as expected:
              name           value
              default        18
              extertest3     4
              internal1      18
              internal2      4
            4. add the "type: sharded" label to namespace test, which has 6 routes, and check the stats; they are also as expected
              name           value
              default        18
              extertest3     10
              internal1      18
              internal2      4
            5. remove the "type: sharded" label from namespace test; expect the stats for extertest3 to change back to 4, but it is still 10
            6. the "type: sharded" label is removed from ns test

            % oc get namespace test -oyaml | grep labels: -A5
              labels:
                kubernetes.io/metadata.name: test
                pod-security.kubernetes.io/enforce: privileged
                pod-security.kubernetes.io/enforce-version: v1.24
                security.openshift.io/scc.podSecurityLabelSync: "false"
              name: test

            7. check one route's status under ns test
            % oc get route -n test --no-headers=true | wc -l
                   6
            %

            % oc get route -n test edge11 -oyaml
            ...
            status:
              ingress:
              - conditions:
                - lastTransitionTime: "2022-10-26T06:09:46Z"
                  status: "True"
                  type: Admitted
                host: edge11-test1.internal2.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-default.apps.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: default
                wildcardPolicy: None
              - conditions:
                - lastTransitionTime: "2022-10-26T06:09:46Z"
                  status: "True"
                  type: Admitted
                host: edge11-test1.internal2.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-internal1.internal1.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: internal1
                wildcardPolicy: None
              - conditions:
                - lastTransitionTime: "2022-10-26T06:14:15Z"
                  status: "True"
                  type: Admitted
                host: edge11-test1.internal2.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-internal2.internal2.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: internal2
                wildcardPolicy: None
              - conditions:
                - lastTransitionTime: "2022-10-26T07:06:10Z"
                  status: "True"
                  type: Admitted
                host: edge11-test1.internal2.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-extertest3.extertest3.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: extertest3
                wildcardPolicy: None
            %

            8. check the stats again
              name           value
              default        18
              extertest3     10
              internal1      18
              internal2      4
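
            Per the explanation above, the metric value follows the route's Ingress status. A small Python sketch of that relationship (an illustration of the described behavior, not the operator's actual code; routes_per_shard and the edge11 dict are constructed from the status shown above):

```python
from collections import Counter

def routes_per_shard(routes):
    """Count a route toward every shard whose .status.ingress entry is Admitted."""
    counts = Counter()
    for route in routes:
        for ingress in route.get("status", {}).get("ingress", []):
            admitted = any(cond.get("type") == "Admitted" and cond.get("status") == "True"
                           for cond in ingress.get("conditions", []))
            if admitted:
                counts[ingress["routerName"]] += 1
    return counts

# edge11 (above) still lists extertest3 as Admitted after the namespace
# label was removed, so extertest3's count stays inflated until the route
# status itself is updated (tracked in OCPBUGS-1689).
edge11 = {"status": {"ingress": [
    {"routerName": name,
     "conditions": [{"type": "Admitted", "status": "True"}]}
    for name in ("default", "internal1", "internal2", "extertest3")]}}

print(routes_per_shard([edge11]))
# Counter({'default': 1, 'internal1': 1, 'internal2': 1, 'extertest3': 1})
```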


            Shudi Li added a comment -

            rh-ee-arsen From the route's status, the stats of "route_metrics_controller_routes_per_shard" were correct; thanks again for the explanation. Please take a look at step 8 to see whether it would be worth fixing the minor name issue, thanks!
            8. There was also a minor issue: as the attached picture shows, there were two "name" columns in the header row

            Name                                        name        value
            route_metrics_controller_routes_per_shard   default     15
            route_metrics_controller_routes_per_shard   extertest3  15
            route_metrics_controller_routes_per_shard   internal1   15
            route_metrics_controller_routes_per_shard   internal2   1


            Shudi Li added a comment -

            rh-ee-arsen Thanks for the explanation. I had thought the cluster's domain would work for the routes.

            I just created two custom ingress-controllers and two routes; here is one route's status:

            status:
              ingress:
              - conditions:
                - lastTransitionTime: "2022-10-26T02:24:04Z"
                  status: "True"
                  type: Admitted
                host: int1edge1-test.internal1.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-internal2.internal2.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: internal2
                wildcardPolicy: None
              - conditions:
                - lastTransitionTime: "2022-10-26T02:24:04Z"
                  status: "True"
                  type: Admitted
                host: int1edge1-test.internal1.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-default.apps.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: default
                wildcardPolicy: None
              - conditions:
                - lastTransitionTime: "2022-10-26T02:24:04Z"
                  status: "True"
                  type: Admitted
                host: int1edge1-test.internal1.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerCanonicalHostname: router-internal1.internal1.shudi-412gcpkf2.qe.gcp.devcluster.openshift.com
                routerName: internal1
                wildcardPolicy: None


            Shudi Li added a comment -

            mmasters1@redhat.com Yes, I didn't configure any selectors for extertest3 and internal1.


              rh-ee-arsen Arkadeep Sen
              shudili@redhat.com Shudi Li
              Shudi Li Shudi Li
              Votes: 0
              Watchers: 6