OpenShift Bugs / OCPBUGS-37300

File Integrity Operator is constantly force-initializing AIDE db


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.13.z
    • Severity: Moderate
      Description of problem:

      On one cluster, the customer is seeing that the "aide-openshift-file-integrity-daemon" Pod is constantly force-initializing the AIDE db:

      2024-07-19T07:11:24Z: force-initializing AIDE db
      2024-07-19T07:11:49Z: initialization finished
      2024-07-19T07:16:32Z: force-initializing AIDE db
      2024-07-19T07:16:57Z: initialization finished
      2024-07-19T07:21:00Z: force-initializing AIDE db
      2024-07-19T07:21:25Z: initialization finished
      2024-07-19T07:21:39Z: force-initializing AIDE db
      2024-07-19T07:22:03Z: initialization finished
      2024-07-19T07:26:45Z: force-initializing AIDE db
      2024-07-19T07:27:09Z: initialization finished
      2024-07-19T07:31:50Z: force-initializing AIDE db
      2024-07-19T07:32:15Z: initialization finished
      2024-07-19T07:36:19Z: running aide check
      2024-07-19T07:36:44Z: aide check returned status 0
      2024-07-19T07:36:56Z: force-initializing AIDE db
      2024-07-19T07:37:21Z: initialization finished
      2024-07-19T07:37:40Z: force-initializing AIDE db
      2024-07-19T07:38:05Z: initialization finished
      2024-07-19T07:42:04Z: force-initializing AIDE db

      In the Operator Pod, log messages similar to the following can be seen:

      {"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_node","msg":"Added Reinit annotation to FileIntegrity for node","Node.Name":"example-master-0","node":"gp-b2c-we-bm7bg-master-0","fi":"openshift-file-integrity"}
      {"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_status","msg":"reconciling FileIntegrityStatus","Request.Namespace":"openshift-file-integrity","Request.Name":"openshift-file-integrity"}
      {"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_fileintegrity","msg":"reconciling FileIntegrity","Request.Namespace":"openshift-file-integrity","Request.Name":"openshift-file-integrity"}
      {"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_fileintegrity","msg":"re-init daemonSet created, triggered by demand or nodes","Request.Namespace":"openshift-file-integrity","Request.Name":"openshift-file-integrity","nodes":"example-master-0"}
      {"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_fileintegrity","msg":"Annotating AIDE config to be updated.","Request.Namespace":"openshift-file-integrity","Request.Name":"openshift-file-integrity"}
      [..]
      {"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_fileintegrity","msg":"Re-init annotation failed to be removed, re-queueing","Request.Namespace":"openshift-file-integrity","Request.Name":"openshift-file-integrity"}
      

      Counting the log messages shows that some occur far more often than others:

      $ cat file-integrity-operator-658c566ff7-49mn4-file-integrity-operator.log | jq '.msg' | sort | uniq -c | sort -n
            2 "FileIntegrity daemon configuration changed - pods restarted."
            2 "FileIntegrity needed DaemonSet command-line arguments update"
            9 "error updating FileIntegrity status"
           12 "Reconciler error"
          401 "Added Reinit annotation to FileIntegrity for node"
          401 "Removed Holdoff annotation from FileIntegrity for node"
          522 "Re-init annotation failed to be removed, re-queueing"
          797 "re-init daemonSet created, triggered by demand or nodes"
          811 "Updating status"
         1325 "Removing re-init annotation."
         1328 "Annotating AIDE config to be updated."
         2402 "Will reconcile FI because its config changed"
         2650 "reconciling FileIntegrity"
         4126 "reconciling re-init"
         7438 "Reconciling ConfigMap"
         7653 "Node is up-to-date. Degraded: false"
         7659 "Reconciling Node"
        16645 "reconciling FileIntegrityStatus"
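      The same jq approach can be extended to see which nodes keep triggering re-inits. A sketch over a two-line sample of the operator log above (the file path is illustrative; jq, sort, and uniq are assumed to be available):

```shell
# Two sample operator log lines, as shown above.
cat > /tmp/fio-operator.log <<'EOF'
{"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_node","msg":"Added Reinit annotation to FileIntegrity for node","Node.Name":"example-master-0","node":"gp-b2c-we-bm7bg-master-0","fi":"openshift-file-integrity"}
{"level":"info","ts":"2024-07-19T07:11:21Z","logger":"controller_status","msg":"reconciling FileIntegrityStatus","Request.Namespace":"openshift-file-integrity","Request.Name":"openshift-file-integrity"}
EOF

# Count "Added Reinit annotation" events per node.
counts=$(jq -r 'select(.msg == "Added Reinit annotation to FileIntegrity for node") | .node' \
  /tmp/fio-operator.log | sort | uniq -c | sort -rn)
echo "$counts"
```

      If a single node dominates the count, the loop is likely specific to that node rather than cluster-wide.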

      The FileIntegrity object is configured like this:

      spec:
        config:
          gracePeriod: 900
          initialDelay: 60
          maxBackups: 5
        tolerations:
        - operator: Exists
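      For context, the spec above lives in a FileIntegrity custom resource; a sketch of the full object (the metadata names follow the openshift-file-integrity namespace seen in the logs, and the comments reflect the documented meaning of each field):

```yaml
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: openshift-file-integrity
  namespace: openshift-file-integrity
spec:
  config:
    gracePeriod: 900   # seconds to wait between AIDE integrity checks
    initialDelay: 60   # seconds to wait before starting the first check
    maxBackups: 5      # number of AIDE database/log backups to keep
  tolerations:
  - operator: Exists   # schedule the AIDE daemon on every node
```

      With gracePeriod set to 900, checks should run 15 minutes apart, which makes the re-initializations every few minutes in the daemon log clearly anomalous.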

      GitOps and RHACM have been disabled for any FIO objects, so they should not be interfering.

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.13
      File Integrity Operator Version 1.3.4

      How reproducible:

      So far, only on one cluster on the customer's side

      Steps to Reproduce:

      1. Set up File Integrity Operator 1.3.4
      2. Configure the File Integrity Operator with the FileIntegrity object above

      Actual results:

      AIDE db is re-initialized every few minutes

      Expected results:

      AIDE db is re-initialized less than once per hour

      Additional info:

              Vincent Shen (wenshen@redhat.com)
              Simon Krenger (rhn-support-skrenger)
              Xiaojie Yuan