OpenShift Bugs / OCPBUGS-8502

File Integrity Operator marks newly added node as failed


Previously, FIO wouldn't clean up node status CRDs when nodes were removed from the cluster. Additionally, it would flag new nodes as failing integrity checks.

      FIO has been updated to gracefully handle scaling down and adding new nodes to the cluster, resulting in more accurate node status notifications.
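The behavior described in the release note can be modeled as a reconciliation step: statuses for nodes that left the cluster are dropped, and a freshly added node gets its own AIDE baseline instead of being scored against a stale database. A minimal, hypothetical sketch in Python (plain dicts stand in for the CRDs; none of these names are the operator's actual API):

```python
# Hypothetical model of the reconciliation behavior from the release note.
# The real operator acts on FileIntegrityNodeStatus CRDs, not dicts.

def reconcile(node_statuses, cluster_nodes, baseline_db):
    """Drop statuses for removed nodes; initialize a baseline for new
    nodes rather than comparing them against an existing database."""
    # 1. Clean up statuses whose node no longer exists in the cluster.
    statuses = {n: s for n, s in node_statuses.items() if n in cluster_nodes}
    # 2. Nodes without a status are new: give them a fresh baseline so
    #    their first check is not scored against another node's state.
    for node in cluster_nodes:
        if node not in statuses:
            baseline_db[node] = "initialized"
            statuses[node] = "Succeeded"
    return statuses, baseline_db
```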
    • Bug Fix

      Description of problem:

When a new node joins the cluster, the File Integrity Operator marks that new node as failed.
      
      In the following example, ip-10-0-215-199.ap-south-1.compute.internal is the new node, which was marked as failed just after it was added to the fileintegritynodestatuses object.
      
      worker-fileintegrity-ip-10-0-192-16.ap-south-1.compute.internal    ip-10-0-192-16.ap-south-1.compute.internal    Succeeded 
      
      worker-fileintegrity-ip-10-0-212-37.ap-south-1.compute.internal    ip-10-0-212-37.ap-south-1.compute.internal    Succeeded 
      
      worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal   ip-10-0-215-199.ap-south-1.compute.internal   Failed 
      
worker-fileintegrity-ip-10-0-219-207.ap-south-1.compute.internal   ip-10-0-219-207.ap-south-1.compute.internal   Succeeded 
      
      
      $ oc describe cm aide-worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal-failed
      Name:         aide-worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal-failed
      Namespace:    openshift-file-integrity
      Labels:       file-integrity.openshift.io/node=ip-10-0-215-199.ap-south-1.compute.internal
                    file-integrity.openshift.io/owner=worker-fileintegrity
                    file-integrity.openshift.io/result-log=
      Annotations:  file-integrity.openshift.io/files-added: 0
                    file-integrity.openshift.io/files-changed: 1
                    file-integrity.openshift.io/files-removed: 0
      
      
      Data
      ====
      integritylog:
      ----
      Start timestamp: 2023-03-07 14:41:04 +0000 (AIDE 0.16)
      AIDE found differences between database and filesystem!!
      
      Summary:
        Total number of entries:  35786
        Added entries:              0
        Removed entries:            0
        Changed entries:            1
      
      ---------------------------------------------------
      Changed entries:
      ---------------------------------------------------
      d   ...    n ... : /hostroot/etc/kubernetes/cni/net.d
      
      ---------------------------------------------------
      Detailed information about changes:
      ---------------------------------------------------
      Directory: /hostroot/etc/kubernetes/cni/net.d
        Linkcount: 3                                | 4
      ---------------------------------------------------
      The attributes of the (uncompressed) database(s):
      ---------------------------------------------------
      /hostroot/etc/kubernetes/aide.db.gz
        MD5      : 0aTQE8sSCOSHo4ddbgVY5g==
        SHA1     : K5sPGNp7Zysk7VWpoQHzxePIou0=
        RMD160   : 2CyRr7Nerz8qDKHzNv47hMSC9uc=
        TIGER    : o7VhOUH2xPXEmKVHEtG6U/blzAe/ezsU
        SHA256   : ree5Z5+mYlJDRSHUxbq4Vefrz1VBxca4
                   F2sCgQyZT28=
        SHA512   : tRSTBNKK+drvLNY5ZamDgLBxdvRJej1R
                   0Kh1NKW3Iemj0Ks+avlyTlKBEQi84tdD
                   FsSvFeURCQdeLDAmkw+mNA=
      
      End timestamp: 2023-03-07 14:41:33 +0000 (run time: 0m 29s)
      
      BinaryData
      ====
Events:  <none>

      Version-Release number of selected component (if applicable):

      $ oc get csv
      NAME                             DISPLAY                   VERSION   REPLACES   PHASE
      file-integrity-operator.v1.0.0   File Integrity Operator   1.0.0                Succeeded
      
      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.12.4    True        False         6h39m   Cluster version is 4.12.4

      How reproducible:

- Install the file-integrity-operator on an OCP cluster.
      
      - Create the FileIntegrity custom resource as described in the documentation [1].
      
      - Wait for the worker nodes to appear in 'fileintegritynodestatuses'; once all worker nodes are listed in that object, add a new worker node through the machineset or manually.
      
      - Observe that after a few minutes the newly joined node is marked as failed:
        $ oc get fileintegritynodestatuses.fileintegrity.openshift.io 
      
      
      [1] https://docs.openshift.com/container-platform/4.12/security/file_integrity_operator/file-integrity-operator-understanding.html
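For reference, a minimal FileIntegrity resource along the lines of the linked documentation looks like the following (field values here are illustrative defaults, not taken from the affected cluster):

```yaml
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  config:
    gracePeriod: 900
```

On affected versions, a false Failed status caused by a benign change can be cleared by re-initializing the AIDE database, e.g. `oc annotate fileintegrities/worker-fileintegrity file-integrity.openshift.io/re-init=`, as described in the operator documentation.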

      Steps to Reproduce:

Actual results:

      The newly added node is marked as Failed in fileintegritynodestatuses.

      Expected results:

      The newly added node should be checked against a freshly initialized AIDE database and report Succeeded.

      Additional info:

       

            wenshen@redhat.com Vincent Shen
            rhn-support-rsahoo Ramesh Sahoo
            Xiaojie Yuan Xiaojie Yuan