  1. OpenShift Logging
  2. LOG-2919

CLO is constantly failing to create already existing logging objects (HTTP 409)

Details

    • False
    • None
    • False
    • NEW
    • VERIFIED
    • Before this update, the Operators' general pattern for reconciling resources was to try to create an object before attempting to get or update it, which led to constant HTTP 409 responses once the object existed. With this update, Operators first attempt to retrieve an object and only create or update it if it is either missing or not as specified.
    • Log Collection - Sprint 226, Log Collection - Sprint 227

    Description

      While looking into an API issue, we found CLO constantly trying to create or recreate its objects, causing a large number of HTTP 409 errors against the API.

      From the API logs, we are seeing around 7500 failures per hour in a small lab cluster.

      kubectl-dev_tool audit -f ./kube-apiserver --by resource --user=system:serviceaccount:openshift-logging:cluster-logging-operator --failed-only -otop
      count: 14634, first: 2022-08-09T14:47:06-04:00, last: 2022-08-09T16:42:51-04:00, duration: 1h55m45.181771s
      3191x                v1/configmaps
      1504x                monitoring.coreos.com/prometheusrules
      1504x                v1/services
      1504x                monitoring.coreos.com/servicemonitors
      935x                 v1/serviceaccounts
      752x                 apps/daemonsets
      752x                 rbac.authorization.k8s.io/roles
      752x                 security.openshift.io/v1/securitycontextconstraints
      752x                 rbac.authorization.k8s.io/v1/clusterrolebindings
      752x                 scheduling.k8s.io/v1/priorityclasses 
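      As a sanity check on the "around 7500 failures per hour" figure, the count and duration from the summary line above can be turned into an hourly rate. This is only a small sketch: the helper names are made up for illustration, and the duration parser handles just the h/m/s units that appear in this output.

```python
import re


def parse_go_duration(text: str) -> float:
    """Convert a Go-style duration such as '1h55m45.181771s' to seconds.

    Only the h, m and s units are handled, which is all this report needs.
    """
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    return sum(float(value) * units[unit] for value, unit in parts)


def failures_per_hour(count: int, duration: str) -> float:
    """Scale a failure count observed over `duration` to a per-hour rate."""
    return count / parse_go_duration(duration) * 3600.0


# Values copied from the audit summary line above.
rate = failures_per_hour(14634, "1h55m45.181771s")
print(f"{rate:.0f} failed requests per hour")
```

      The result comes out a little under 7,600 per hour, which is consistent with the estimate quoted above.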

      Looking at the actual failed requests, it's all HTTP 409s from trying to create things that already exist:

      kubectl-dev_tool audit -f ./kube-apiserver --by resource --user=system:serviceaccount:openshift-logging:cluster-logging-operator --failed-only
      had 1115196 line read failures
      18:47:06 [CREATE][     7.088ms] [409] /apis/scheduling.k8s.io/v1/priorityclasses                                          [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     7.883ms] [409] /api/v1/namespaces/openshift-logging/serviceaccounts                                [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.077ms] [409] /apis/security.openshift.io/v1/securitycontextconstraints                           [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.343ms] [409] /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles               [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    12.699ms] [409] /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/rolebindings        [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.438ms] [409] /apis/rbac.authorization.k8s.io/v1/clusterroles                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.497ms] [409] /apis/rbac.authorization.k8s.io/v1/clusterrolebindings                              [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    16.374ms] [409] /api/v1/namespaces/openshift-logging/services                                       [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     7.865ms] [409] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors         [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     11.21ms] [409] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/prometheusrules         [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    10.356ms] [409] /api/v1/namespaces/openshift-logging/configmaps                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     6.324ms] [409] /api/v1/namespaces/openshift-logging/configmaps                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     1.694ms] [404] /api/v1/namespaces/openshift-logging/services/fluentd                               [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     2.167ms] [404] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/fluentd [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     2.032ms] [404] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/prometheusrules/fluentd [system:serviceaccount:openshift-logging:cluster-logging-operator] 
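      To illustrate why the create-first pattern produces a 409 on every reconcile loop, and why the get-first pattern from the release note avoids it, here is a minimal simulation. It is not the operator's code: `FakeAPI`, `Conflict`, `NotFound`, both `reconcile_*` functions and the `collector` object name are hypothetical stand-ins, and the in-memory store only mimics the status codes a real API server would return.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Stands in for an HTTP 409 (already exists) response."""


class NotFound(Exception):
    """Stands in for an HTTP 404 response."""


@dataclass
class FakeAPI:
    """Hypothetical in-memory API server that records response codes."""
    objects: dict = field(default_factory=dict)
    statuses: list = field(default_factory=list)

    def create(self, name, spec):
        if name in self.objects:
            self.statuses.append(409)
            raise Conflict(name)
        self.objects[name] = dict(spec)
        self.statuses.append(201)

    def get(self, name):
        if name not in self.objects:
            self.statuses.append(404)
            raise NotFound(name)
        self.statuses.append(200)
        return dict(self.objects[name])

    def update(self, name, spec):
        self.objects[name] = dict(spec)
        self.statuses.append(200)


def reconcile_create_first(api, name, desired):
    """Old pattern: always try to create, fall back to get/update on conflict."""
    try:
        api.create(name, desired)
    except Conflict:
        if api.get(name) != desired:
            api.update(name, desired)


def reconcile_get_first(api, name, desired):
    """New pattern: get first, create only if missing, update only if different."""
    try:
        current = api.get(name)
    except NotFound:
        api.create(name, desired)
        return
    if current != desired:
        api.update(name, desired)


def count_conflicts(reconcile, loops=5):
    """Run `loops` reconcile passes against a fresh FakeAPI and count 409s."""
    api = FakeAPI()
    for _ in range(loops):
        reconcile(api, "collector", {"replicas": 1})
    return api.statuses.count(409)


print(count_conflicts(reconcile_create_first))  # 4: every pass after the first conflicts
print(count_conflicts(reconcile_get_first))     # 0: the object is only created once
```

      In this sketch the get-first variant still sees a single 404 on the very first pass, when the object does not yet exist, but steady-state reconciles produce no failed requests at all.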

          People

            jcantril@redhat.com Jeffrey Cantrill
            rhn-support-mrobson Matt Robson
            Qiaoling Tang Qiaoling Tang