  1. OpenShift Logging
  2. LOG-2919

CLO is constantly failing to create already existing logging objects (HTTP 409)


    • False
    • None
    • False
    • NEW
    • VERIFIED
    • Before this update, the Operators' general pattern for reconciling resources was to try to create an object before attempting to get or update it, which led to constant HTTP 409 responses once the object existed. With this update, Operators first attempt to retrieve an object and only create or update it if it is missing or does not match the specification.
    • Log Collection - Sprint 226, Log Collection - Sprint 227

      While looking into an API issue, we found CLO constantly trying to create or recreate its objects, causing a large number of HTTP 409 errors against the API.

      From the API logs, we are seeing around 7500 failures per hour in a small lab cluster.

      kubectl-dev_tool audit -f ./kube-apiserver --by resource --user=system:serviceaccount:openshift-logging:cluster-logging-operator --failed-only -otop
      count: 14634, first: 2022-08-09T14:47:06-04:00, last: 2022-08-09T16:42:51-04:00, duration: 1h55m45.181771s
      3191x                v1/configmaps
      1504x                monitoring.coreos.com/prometheusrules
      1504x                v1/services
      1504x                monitoring.coreos.com/servicemonitors
      935x                 v1/serviceaccounts
      752x                 apps/daemonsets
      752x                 rbac.authorization.k8s.io/roles
      752x                 security.openshift.io/v1/securitycontextconstraints
      752x                 rbac.authorization.k8s.io/v1/clusterrolebindings
      752x                 scheduling.k8s.io/v1/priorityclasses 

      Looking at the actual failed requests, they are all HTTP 409s from trying to create things that already exist:

      kubectl-dev_tool audit -f ./kube-apiserver --by resource --user=system:serviceaccount:openshift-logging:cluster-logging-operator --failed-only
      had 1115196 line read failures
      18:47:06 [CREATE][     7.088ms] [409] /apis/scheduling.k8s.io/v1/priorityclasses                                          [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     7.883ms] [409] /api/v1/namespaces/openshift-logging/serviceaccounts                                [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.077ms] [409] /apis/security.openshift.io/v1/securitycontextconstraints                           [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.343ms] [409] /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/roles               [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    12.699ms] [409] /apis/rbac.authorization.k8s.io/v1/namespaces/openshift-logging/rolebindings        [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.438ms] [409] /apis/rbac.authorization.k8s.io/v1/clusterroles                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     8.497ms] [409] /apis/rbac.authorization.k8s.io/v1/clusterrolebindings                              [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    16.374ms] [409] /api/v1/namespaces/openshift-logging/services                                       [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     7.865ms] [409] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors         [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     11.21ms] [409] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/prometheusrules         [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][    10.356ms] [409] /api/v1/namespaces/openshift-logging/configmaps                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [CREATE][     6.324ms] [409] /api/v1/namespaces/openshift-logging/configmaps                                     [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     1.694ms] [404] /api/v1/namespaces/openshift-logging/services/fluentd                               [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     2.167ms] [404] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/fluentd [system:serviceaccount:openshift-logging:cluster-logging-operator]
      18:47:06 [DELETE][     2.032ms] [404] /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/prometheusrules/fluentd [system:serviceaccount:openshift-logging:cluster-logging-operator] 
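
      The difference between the two reconcile patterns described in this issue (create first versus retrieve first) can be sketched as follows. This is a minimal illustration, not the Operator's actual code: fakeAPI is a hypothetical in-memory stand-in for the API server, the sentinel errors stand in for HTTP 404 and 409 responses, and the object name "collector" and the spec strings are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for HTTP 404 and 409 responses.
var (
	errNotFound      = errors.New("404 not found")
	errAlreadyExists = errors.New("409 already exists")
)

// fakeAPI is a hypothetical in-memory stand-in for the API server.
type fakeAPI struct {
	objects   map[string]string // object name -> spec
	conflicts int               // number of 409 responses returned
}

func newFakeAPI() *fakeAPI { return &fakeAPI{objects: map[string]string{}} }

func (a *fakeAPI) Get(name string) (string, error) {
	spec, ok := a.objects[name]
	if !ok {
		return "", errNotFound
	}
	return spec, nil
}

func (a *fakeAPI) Create(name, spec string) error {
	if _, ok := a.objects[name]; ok {
		a.conflicts++ // every redundant create costs one 409
		return errAlreadyExists
	}
	a.objects[name] = spec
	return nil
}

func (a *fakeAPI) Update(name, spec string) error {
	if _, ok := a.objects[name]; !ok {
		return errNotFound
	}
	a.objects[name] = spec
	return nil
}

// reconcileCreateFirst sketches the old pattern: always try to create,
// and only fall back to get/update after a conflict.
func reconcileCreateFirst(a *fakeAPI, name, desired string) error {
	err := a.Create(name, desired)
	if err == nil || !errors.Is(err, errAlreadyExists) {
		return err
	}
	current, err := a.Get(name)
	if err != nil {
		return err
	}
	if current != desired {
		return a.Update(name, desired)
	}
	return nil
}

// reconcileGetFirst sketches the new pattern: retrieve first, then create
// only if missing and update only if the spec differs.
func reconcileGetFirst(a *fakeAPI, name, desired string) error {
	current, err := a.Get(name)
	switch {
	case errors.Is(err, errNotFound):
		return a.Create(name, desired)
	case err != nil:
		return err
	case current != desired:
		return a.Update(name, desired)
	}
	return nil // already as specified: no write request at all
}

func main() {
	oldAPI, newAPI := newFakeAPI(), newFakeAPI()
	for i := 0; i < 5; i++ { // five reconcile loops over the same object
		_ = reconcileCreateFirst(oldAPI, "collector", "spec-v1")
		_ = reconcileGetFirst(newAPI, "collector", "spec-v1")
	}
	fmt.Println("create-first conflicts:", oldAPI.conflicts)
	fmt.Println("get-first conflicts:", newAPI.conflicts)
}
```

      In this sketch, the create-first variant produces one conflict on every reconcile loop after the first, while the get-first variant produces none and still updates the object when the desired spec changes.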

              jcantril@redhat.com Jeffrey Cantrill
              rhn-support-mrobson Matt Robson
              Qiaoling Tang Qiaoling Tang