[OCPBUGS-8502] File Integrity Operator marks newly added node as failed - Red Hat Issue Tracker

Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: None
Affects Version/s: 4.12, 4.11, 4.10
Component/s: File Integrity Operator
Labels:
- tc-approved

Test Coverage:

?
Severity:
Moderate
Regression:
No
Epic Link:
fio-1.3.0-bugs
Story Points:
2
Sprint:
CMP Sprint 62
sprint_count:
1
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Text:

Previously, FIO did not clean up node status custom resources when nodes were removed from the cluster. Additionally, it flagged new nodes as failing integrity checks.

FIO has been updated to gracefully handle scaling down and adding new nodes to the cluster, resulting in more accurate node status notifications.
Release Note Type:
Bug Fix
Target Version:

4.14.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:

Description of problem:

When a new node joins the cluster, the File Integrity Operator marks that node as failed.

In the following example, ip-10-0-215-199.ap-south-1.compute.internal is the new node, which was marked as failed just after it was added to the fileintegritynodestatuses object.

worker-fileintegrity-ip-10-0-192-16.ap-south-1.compute.internal    ip-10-0-192-16.ap-south-1.compute.internal    Succeeded
worker-fileintegrity-ip-10-0-212-37.ap-south-1.compute.internal    ip-10-0-212-37.ap-south-1.compute.internal    Succeeded
worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal   ip-10-0-215-199.ap-south-1.compute.internal   Failed
worker-fileintegrity-ip-10-0-219-207.ap-south-1.compute.internal   ip-10-0-219-207.ap-south-1.compute.internal   Succeeded


$ oc describe cm aide-worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal-failed
Name:         aide-worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-215-199.ap-south-1.compute.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0


Data
====
integritylog:
----
Start timestamp: 2023-03-07 14:41:04 +0000 (AIDE 0.16)
AIDE found differences between database and filesystem!!

Summary:
  Total number of entries:  35786
  Added entries:              0
  Removed entries:            0
  Changed entries:            1

---------------------------------------------------
Changed entries:
---------------------------------------------------
d   ...    n ... : /hostroot/etc/kubernetes/cni/net.d

---------------------------------------------------
Detailed information about changes:
---------------------------------------------------
Directory: /hostroot/etc/kubernetes/cni/net.d
  Linkcount: 3                                | 4
---------------------------------------------------
The attributes of the (uncompressed) database(s):
---------------------------------------------------
/hostroot/etc/kubernetes/aide.db.gz
  MD5      : 0aTQE8sSCOSHo4ddbgVY5g==
  SHA1     : K5sPGNp7Zysk7VWpoQHzxePIou0=
  RMD160   : 2CyRr7Nerz8qDKHzNv47hMSC9uc=
  TIGER    : o7VhOUH2xPXEmKVHEtG6U/blzAe/ezsU
  SHA256   : ree5Z5+mYlJDRSHUxbq4Vefrz1VBxca4
             F2sCgQyZT28=
  SHA512   : tRSTBNKK+drvLNY5ZamDgLBxdvRJej1R
             0Kh1NKW3Iemj0Ks+avlyTlKBEQi84tdD
             FsSvFeURCQdeLDAmkw+mNA=

End timestamp: 2023-03-07 14:41:33 +0000 (run time: 0m 29s)

BinaryData
====
Events:  <none>
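
The counters in the integrity log above show that the failure comes from a single changed entry (the link count of /hostroot/etc/kubernetes/cni/net.d going from 3 to 4), with nothing added or removed. As a rough aid for triaging such logs, the sketch below pulls the summary counters out of the log text. The function name, the dictionary keys, and the idea of feeding it the `integritylog` text are illustrative assumptions, not part of the operator; the sample is an excerpt of the log shown above.

```python
import re

# Matches the counter lines of the AIDE "Summary:" block, for example
# "  Changed entries:            1". Requiring digits on the same line
# skips the later "Changed entries:" section header, which has no number.
_SUMMARY_LINE = re.compile(
    r"^[ \t]*(Total number of entries|Added entries|Removed entries|Changed entries)"
    r":[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)

# Hypothetical short keys chosen for this sketch.
_KEYS = {
    "Total number of entries": "total",
    "Added entries": "added",
    "Removed entries": "removed",
    "Changed entries": "changed",
}


def parse_aide_summary(log):
    """Return the summary counters found in an AIDE report as a dict of ints."""
    return {_KEYS[label]: int(count) for label, count in _SUMMARY_LINE.findall(log)}


# Excerpt of the integrity log from this report.
SAMPLE_LOG = """\
AIDE found differences between database and filesystem!!

Summary:
  Total number of entries:  35786
  Added entries:              0
  Removed entries:            0
  Changed entries:            1

---------------------------------------------------
Changed entries:
---------------------------------------------------
d   ...    n ... : /hostroot/etc/kubernetes/cni/net.d
"""

if __name__ == "__main__":
    print(parse_aide_summary(SAMPLE_LOG))
```

A log whose only non-zero counter is `changed`, as here, points at a modification of an existing path rather than new or deleted files.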

Version-Release number of selected component (if applicable):

$ oc get csv
NAME                             DISPLAY                   VERSION   REPLACES   PHASE
file-integrity-operator.v1.0.0   File Integrity Operator   1.0.0                Succeeded

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.4    True        False         6h39m   Cluster version is 4.12.4

How reproducible:

- Install the File Integrity Operator on an OCP cluster.

- Create the FileIntegrity custom resource as described in the documentation [1].

- Wait until all worker nodes appear in 'fileintegritynodestatuses', then add a new worker node, either through the machineset or manually.

- Observe that after a few minutes the newly joined node is marked as failed:
  $ oc get fileintegritynodestatuses.fileintegrity.openshift.io
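
To check the last step without reading the table by eye, the sketch below filters the text output of the command above for rows whose status is Failed. It assumes the three-column layout shown in the description (status name, node name, status) and an optional header row starting with NAME; the function name is hypothetical and the sample rows are copied from this report.

```python
def failed_nodes(table):
    """Return the node names whose status column reads 'Failed'.

    Assumes whitespace-separated rows of: status name, node name, status.
    Rows with fewer than three columns and a header row are skipped.
    """
    failed = []
    for line in table.splitlines():
        cols = line.split()
        if len(cols) < 3 or cols[0] == "NAME":
            continue
        _name, node, status = cols[:3]
        if status == "Failed":
            failed.append(node)
    return failed


# Rows taken from the output quoted in the description.
SAMPLE_TABLE = """\
worker-fileintegrity-ip-10-0-192-16.ap-south-1.compute.internal    ip-10-0-192-16.ap-south-1.compute.internal    Succeeded
worker-fileintegrity-ip-10-0-212-37.ap-south-1.compute.internal    ip-10-0-212-37.ap-south-1.compute.internal    Succeeded
worker-fileintegrity-ip-10-0-215-199.ap-south-1.compute.internal   ip-10-0-215-199.ap-south-1.compute.internal   Failed
worker-fileintegrity-ip-10-0-219-207.ap-south-1.compute.internal   ip-10-0-219-207.ap-south-1.compute.internal   Succeeded
"""

if __name__ == "__main__":
    for node in failed_nodes(SAMPLE_TABLE):
        print(node)
```

On an affected cluster, the newly added worker should be the only node this returns shortly after it joins.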


[1] https://docs.openshift.com/container-platform/4.12/security/file_integrity_operator/file-integrity-operator-understanding.html

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

links to

KCS created for: File Integrity Operator marks newly added node as failed

openshift/file-integrity-operator#343: OCPBUGS-8502: Fix node scaling issue

Assignee: Vincent Shen
Reporter: Ramesh Sahoo
QA Contact: Xiaojie Yuan
Votes: 0
Watchers: 8
Created: 2023/03/07 3:27 PM
Updated: 2023/08/21 10:52 AM
Resolved: 2023/08/21 2:16 AM
