  1. OpenShift Logging
  2. LOG-3733

Setting "collection: {}" field in ClusterLogging crashes the Operator

    • Before this update, setting the `collection` field to `{}` in the ClusterLogging custom resource could crash the Operator. With this update, the Operator ignores this value and continues running.
    • Log Collection - Sprint 233
    • Moderate

      Description of problem:

      When setting the following for "collection" in "ClusterLogging", the Cluster Logging Operator crashes:
      
      ~~~
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      metadata:
        annotations:
          logging.openshift.io/preview-vector-collector: enabled
        name: instance
        namespace: openshift-logging
      spec:
      [..]
        collection: {}
      [..]
      ~~~
      
      The Operator then crashes with the following stack trace:
      
      ~~~
      {"_ts":"2023-03-03T12:38:30.51672121Z","_level":"0","_component":"cluster-logging-operator","_message":"starting up...","go_arch":"amd64","go_os":"linux","go_version":"go1.18.7","operator_version":"5.6"}
      I0303 12:38:31.568169 1 request.go:665] Waited for 1.038650902s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/monitoring.coreos.com/v1?timeout=32s
      {"_ts":"2023-03-03T12:38:33.476748674Z","_level":"0","_component":"cluster-logging-operator","_message":"migrating resources provided by the manifest"}
      {"_ts":"2023-03-03T12:38:33.48144877Z","_level":"0","_component":"cluster-logging-operator","_message":"Registering Components."}
      {"_ts":"2023-03-03T12:38:33.48156936Z","_level":"0","_component":"cluster-logging-operator","_message":"Starting the Cmd."}
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1426670]
      goroutine 580 [running]:
      github.com/openshift/cluster-logging-operator/internal/migrations.MigrateCollectionSpec({{0xc000592610, 0x7}, 0xc0003a05f0, 0xc00026d7d0, 0xc0003a0640, 0x0, 0x0})
      /remote-source/cluster-logging-operator/app/internal/migrations/migrate_collection.go:28 +0x2d0
      github.com/openshift/cluster-logging-operator/internal/k8shandler.(*ClusterLoggingRequest).getClusterLogging(0xc000952300, 0x0)
      /remote-source/cluster-logging-operator/app/internal/k8shandler/reconciler.go:218 +0x17d
      github.com/openshift/cluster-logging-operator/internal/k8shandler.Reconcile(0xc0004285a0, {0x19fa270, 0xc0000bec30}, {0x7fd9b0687628, 0xc00017a1c0}, {0x19f47e8, 0xc0002e0040}, {0xc000432e99, 0x7}, {0xc000628690, ...})
      /remote-source/cluster-logging-operator/app/internal/k8shandler/reconciler.go:47 +0x20b
      github.com/openshift/cluster-logging-operator/controllers/clusterlogging.(*ReconcileClusterLogging).Reconcile(0xc0002bb200, {0x19f5348, 0xc00026c0f0}, {{{0xc00038dce0?, 0x16b4fe0?}, {0xc0005925d0?, 0x30?}}})
      /remote-source/cluster-logging-operator/app/controllers/clusterlogging/clusterlogging_controller.go:88 +0x3c9
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00077a000, {0x19f5348, 0xc00026c0c0}, {{{0xc00038dce0?, 0x16b4fe0?}, {0xc0005925d0?, 0x409514?}}})
      /remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114 +0x27e
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00077a000, {0x19f52a0, 0xc0000ba280}, {0x15f4900?, 0xc0006bc040?})
      /remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311 +0x349
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00077a000, {0x19f52a0, 0xc0000ba280})
      /remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266 +0x1d9
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      /remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227 +0x85
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
      /remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:223 +0x31c
      ~~~
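
The trace ends in a nil pointer dereference inside `MigrateCollectionSpec`. One plausible way to hit this class of panic is sketched below, under assumptions: the types `ClusterLoggingSpec`, `CollectionSpec` and `LogCollection` are simplified, hypothetical stand-ins (not the operator's real API types), and JSON is used instead of YAML so the example needs only the standard library. It shows that `collection: {}` decodes to a non-nil outer struct whose nested pointer is nil, so code that only checks the outer pointer can still dereference nil.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified stand-ins for the operator's API types.
type LogCollection struct {
	Type string `json:"type"`
}

type CollectionSpec struct {
	Logs *LogCollection `json:"logs,omitempty"`
}

type ClusterLoggingSpec struct {
	Collection *CollectionSpec `json:"collection,omitempty"`
}

// decode parses a JSON document into the hypothetical spec type.
func decode(doc string) (ClusterLoggingSpec, error) {
	var spec ClusterLoggingSpec
	err := json.Unmarshal([]byte(doc), &spec)
	return spec, err
}

// unguardedType checks only the outer pointer, so it panics when
// Collection is set but Collection.Logs is nil.
func unguardedType(spec ClusterLoggingSpec) string {
	if spec.Collection == nil {
		return ""
	}
	return spec.Collection.Logs.Type
}

// tryType calls unguardedType and turns a panic into an error so the
// demonstration can keep running.
func tryType(spec ClusterLoggingSpec) (typ string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return unguardedType(spec), nil
}

func main() {
	docs := []string{
		`{}`,                 // field omitted: Collection is nil
		`{"collection": {}}`, // empty object: Collection non-nil, Logs nil
		`{"collection": {"logs": {"type": "fluentd"}}}`,
	}
	for _, doc := range docs {
		spec, err := decode(doc)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		typ, err := tryType(spec)
		fmt.Printf("%s\n  collection set: %t, type: %q, err: %v\n",
			doc, spec.Collection != nil, typ, err)
	}
}
```

Only the middle document triggers the recovered panic in this sketch, which is consistent with the workaround of removing the field entirely instead of leaving it empty.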

      Version-Release number of selected component (if applicable):

      * OpenShift Container Platform 4.11.21
      * cluster-logging.v5.6.2

      How reproducible:

      Always

      Steps to Reproduce:

      1. Deploy Cluster Logging 5.6.2
      2. Configure the ClusterLogging custom resource with a valid configuration, for example the ClusterLogging definition from the documentation
      3. After the stack has been deployed successfully, change the "collection" field to "{}"
      

      Actual results:

      Operator crashes with "panic: runtime error: invalid memory address or nil pointer dereference"

      Expected results:

      Operator ignores the invalid configuration or warns about it
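
The expected behavior could be approximated by normalizing the spec before any migration logic runs. The following is a minimal sketch only, not the actual fix: `CollectionSpec`, `LogCollection` and `normalizeCollection` are hypothetical names, and the field layout is an assumption made for illustration. An empty collection is treated the same as an omitted one, and a warning message is returned for the caller to log.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the operator's API types.
type LogCollection struct {
	Type string
}

type CollectionSpec struct {
	Type string
	Logs *LogCollection
}

// normalizeCollection returns nil for an empty collection spec, together
// with a warning message, so later code only needs a single nil check.
// A nil input and a populated input are returned unchanged with no warning.
func normalizeCollection(c *CollectionSpec) (*CollectionSpec, string) {
	if c == nil {
		return nil, ""
	}
	if c.Type == "" && c.Logs == nil {
		return nil, `ignoring empty "collection" field`
	}
	return c, ""
}

func main() {
	inputs := []*CollectionSpec{
		nil,                 // field omitted
		{},                  // collection: {}
		{Type: "vector"},    // populated
	}
	for _, in := range inputs {
		out, warning := normalizeCollection(in)
		fmt.Printf("kept: %t, warning: %q\n", out != nil, warning)
	}
}
```

Returning the warning instead of logging it directly keeps the helper free of any logging dependency; the caller decides whether to log it or surface it as a status condition.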

      Additional info:

      * The customer noticed this while trying out Vector and wanting to remove the fluentd Pods first. The workaround is to remove the "collection" field entirely; the Pods are then removed as expected.

              vparfono Vitalii Parfonov
              rhn-support-skrenger Simon Krenger
              Anping Li Anping Li