OpenShift Logging / LOG-2919

CLO is constantly failing to create already existing logging objects (HTTP 409)

    • Before this update, the Operators' general pattern for reconciling resources was to try to create before attempting to get or update, which would lead to constant HTTP 409 responses after creation. With this update, Operators first attempt to retrieve an object and only create or update it if it is either missing or not as specified. (A minimal sketch of this pattern follows the field list below.)
    • Log Collection - Sprint 226, Log Collection - Sprint 227
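
      A minimal sketch of the reconciliation pattern described in the release note above: get first, then create only if the object is missing and update only if it is not as specified. This is illustrative only; it assumes a controller-runtime style client, and the package and function names are made up, not the actual cluster-logging-operator code.

      package sketch

      import (
          "context"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/api/equality"
          apierrors "k8s.io/apimachinery/pkg/api/errors"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      // reconcileConfigMap fetches the live object first and only writes to the API
      // when the object is missing or differs from the desired state.
      func reconcileConfigMap(ctx context.Context, c client.Client, desired *corev1.ConfigMap) error {
          current := &corev1.ConfigMap{}
          err := c.Get(ctx, client.ObjectKeyFromObject(desired), current)
          if apierrors.IsNotFound(err) {
              // Only a genuinely missing object triggers a CREATE.
              return c.Create(ctx, desired)
          }
          if err != nil {
              return err
          }
          // The object exists: update only if it is not as specified.
          if !equality.Semantic.DeepEqual(current.Data, desired.Data) {
              current.Data = desired.Data
              return c.Update(ctx, current)
          }
          return nil // already as specified, no API write at all
      }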

      While looking into an API issue, we found the CLO constantly trying to create/recreate its objects, causing a large number of HTTP 409 errors against the API.

      From the API logs, we are seeing around 7500 failures per hour in a small lab cluster.

      kubectl-dev_tool audit -f ./kube-apiserver --by resource --user=system:serviceaccount:openshift-logging:cluster-logging-operator --failed-only -otop
      count: 14634, first: 2022-08-09T14:47:06-04:00, last: 2022-08-09T16:42:51-04:00, duration: 1h55m45.181771s
      3191x                v1/configmaps
      1504x                monitoring.coreos.com/prometheusrules
      1504x                v1/services
      1504x                monitoring.coreos.com/servicemonitors
      935x                 v1/serviceaccounts
      752x                 apps/daemonsets
      752x                 rbac.authorization.k8s.io/roles
      752x                 security.openshift.io/v1/securitycontextconstraints
      752x                 rbac.authorization.k8s.io/v1/clusterrolebindings
      752x                 scheduling.k8s.io/v1/priorityclasses 

      Looking at the actual failed requests, it's all HTTP 409s trying to create things that already exist (the create-first pattern behind this is sketched after the output below):

      kubectl-dev_tool audit -f ./kube-apiserver --by resource --user=system:serviceaccount:openshift-logging:cluster-logging-operator --failed-only
      had 1115196 line read failures
      18:47:06 [CREATE][     7.088ms] [409] /apis/scheduling.k8s.io/v1/priorityclasses                                          [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     7.883ms] [409] /api/v1/namespaces/openshift-logging/serviceaccounts                                [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.077ms] [409] /apis/security.openshift.io/v1/securitycontextconstraints                           [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.343ms] [409] /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles               [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    12.699ms] [409] /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/rolebindings        [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.438ms] [409] /apis/rbac.authorization.k8s.io/v1/clusterroles                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.497ms] [409] /apis/rbac.authorization.k8s.io/v1/clusterrolebindings                              [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    16.374ms] [409] /api/v1/namespaces/openshift-logging/services                                       [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     7.865ms] [409] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors         [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     11.21ms] [409] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/prometheusrules         [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    10.356ms] [409] /api/v1/namespaces/openshift-logging/configmaps                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     6.324ms] [409] /api/v1/namespaces/openshift-logging/configmaps                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     1.694ms] [404] /api/v1/namespaces/openshift-logging/services/fluentd                               [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     2.167ms] [404] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/fluentd [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     2.032ms] [404] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/prometheusrules/fluentd [system:serviceaccount:openshift-logging:cluster-logging-operator] 
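
      For contrast, a rough sketch of the create-first flow that produces the audit entries above: every reconcile pass unconditionally issues a CREATE, the API server answers 409 AlreadyExists for each object that is already there, and the operator then falls back to get/update. Reconciliation still succeeds, but each pass adds failed requests to the audit log. Again illustrative only; package and function names are assumptions, not the actual CLO code.

      package sketch

      import (
          "context"

          corev1 "k8s.io/api/core/v1"
          apierrors "k8s.io/apimachinery/pkg/api/errors"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      // createThenUpdate issues a CREATE on every pass; once the object exists the
      // API server returns 409 AlreadyExists and the code falls back to get/update,
      // so the operator keeps working while each pass records a failed request.
      func createThenUpdate(ctx context.Context, c client.Client, desired *corev1.ConfigMap) error {
          err := c.Create(ctx, desired) // after the first pass this is always a 409
          if err == nil {
              return nil
          }
          if !apierrors.IsAlreadyExists(err) {
              return err
          }
          current := &corev1.ConfigMap{}
          if err := c.Get(ctx, client.ObjectKeyFromObject(desired), current); err != nil {
              return err
          }
          current.Data = desired.Data
          return c.Update(ctx, current)
      }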


            Qiaoling Tang added a comment - Verified using cluster-logging.v5.6.0.

            GitLab CEE Bot added a comment - CPaaS Service Account mentioned this issue in merge request !248 of openshift-logging / Log Collection Midstream on branch openshift-logging-5.6-rhel-8_upstream_bd89702a0be8fc97812be8b73431ea96: Updated US source to: a25ac97 Merge pull request #1701 from jcantrill/log2789

            Jeffrey Cantrill added a comment - Some issues you may be seeing are already reported in LOG-3049. rojacob@redhat.com just recently discovered the issue for LOG-3049.

            GitLab CEE Bot added a comment - CPaaS Service Account mentioned this issue in merge request !135 of openshift-logging / Log Collection Midstream on branch openshift-logging-5.6-rhel-8_upstream_d25d5ee8d88291462ccef1912b6d2450: Updated 2 upstream sources

            GitLab CEE Bot added a comment - CPaaS Service Account mentioned this issue in merge request !91 of openshift-logging / Log Collection Midstream on branch openshift-logging-5.6-rhel-8_upstream_bdc48bb458b8eafbbe80de10170aff4d: Updated 6 upstream sources

            Matt Robson added a comment - I will confirm once I get new data and let you know.

            Jeffrey Cantrill added a comment - If you consider it resolved with the upgrade, I would propose we close this as fixed in the next release. We are unlikely to fix this in 5.4.

            Jeffrey Cantrill added a comment - rhn-support-mrobson this is partially because of the "strategy" for object reconciliation. With the release of 5.5 we have moved to a "watch" from a 30s "periodic" poll, which should alleviate part of this issue. Is there any way you might be able to confirm whether there is an improvement in 5.5?
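
            A rough sketch of the difference Jeffrey describes, assuming a controller-runtime based operator. With the pre-5.5 behaviour of requeuing every reconcile on a fixed 30s interval, the full object set was re-reconciled (and, with the old create-first pattern, a burst of 409s logged) twice a minute; a watch-based setup only reconciles when the CR or one of its owned objects actually changes. The loggingv1 import path and the type names here are assumptions, not verified against the actual cluster-logging-operator source.

            package sketch

            import (
                "context"

                appsv1 "k8s.io/api/apps/v1"
                corev1 "k8s.io/api/core/v1"
                ctrl "sigs.k8s.io/controller-runtime"
                "sigs.k8s.io/controller-runtime/pkg/client"

                loggingv1 "github.com/openshift/cluster-logging-operator/apis/logging/v1" // assumed import path
            )

            type Reconciler struct {
                client.Client
            }

            func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
                // Reconcile owned objects here (get first, create/update only if needed).
                // Returning ctrl.Result{RequeueAfter: 30 * time.Second} would force a full
                // pass every 30 seconds; an empty Result relies on the watches registered below.
                return ctrl.Result{}, nil
            }

            // SetupWithManager registers watches on the CR and on the objects it owns,
            // so changes to those objects drive reconciliation instead of a timer.
            func (r *Reconciler) SetupWithManager(mgr ctrl.Manager) error {
                return ctrl.NewControllerManagedBy(mgr).
                    For(&loggingv1.ClusterLogging{}).
                    Owns(&corev1.ConfigMap{}).
                    Owns(&corev1.Service{}).
                    Owns(&appsv1.DaemonSet{}).
                    Complete(r)
            }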
