- Bug
- Resolution: Done-Errata
- Critical
- 4.13
Description of problem:
The customer used the Agent-based Installer to install 4.13.8 in their CID environment, but during installation the bootstrap machine hit an OOM condition. The sosreport shows that the init container was OOM-killed.
NOTE: Per the customer, the issue is not seen when testing with 4.13.6.
initContainers:
- name: machine-config-controller
  image: .Images.MachineConfigOperator
  command: ["/usr/bin/machine-config-controller"]
  args:
  - "bootstrap"
  - "--manifest-dir=/etc/mcc/bootstrap"
  - "--dest-dir=/etc/mcs/bootstrap"
  - "--pull-secret=/etc/mcc/bootstrap/machineconfigcontroller-pull-secret"
  - "--payload-version=.ReleaseVersion"
  resources:
    limits:
      memory: 50Mi
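For scale, the 50Mi limit above can be converted to bytes and compared with the host's memory. The limit is enforced on the container's cgroup, so it applies regardless of how much memory the host has. The snippet below is only an illustrative sketch: the helper name `quantity_to_bytes` is hypothetical, it handles only the suffixes listed in its table rather than the full Kubernetes quantity grammar, and reading the reported 100GB as decimal gigabytes is an assumption.

```python
# Hypothetical helper (not part of any OpenShift tooling): convert a
# Kubernetes-style memory quantity such as "50Mi" to a number of bytes.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,  # binary
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,      # decimal
}


def quantity_to_bytes(quantity: str) -> int:
    # Check two-letter binary suffixes before one-letter decimal ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    # No recognised suffix: treat the value as plain bytes.
    return int(quantity)


limit_bytes = quantity_to_bytes("50Mi")  # limit from the manifest above
host_bytes = 100 * 1000**3               # assumption: 100GB means decimal GB
print(f"container limit: {limit_bytes} bytes")
print(f"share of host memory: {limit_bytes / host_bytes:.4%}")
```

The container can be killed at its own limit even when the host is almost entirely idle, which matches the cgroup kill described in this report.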
In the sosreport, both dmesg and the CRI-O logs show the machine-config-controller container being OOM-killed. The kill was triggered by the cgroup memory limit, so the 50Mi limit appears to be too small.
The customer used a physical machine with 100GB of memory.
The customer had some network configuration in the Assisted Installer YAML file; the issue may be related to their NIC configuration.
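To confirm this kind of cgroup OOM kill from a dmesg capture, the kernel's kill messages can be filtered out of the log. The sketch below is a minimal example and not the procedure used on this case. It assumes the message wording used by recent kernels ("Memory cgroup out of memory: Killed process ..."), which can differ between kernel versions. The function name `find_cgroup_oom_kills` is hypothetical, and the sample line is made up for illustration and is not taken from the customer's sosreport.

```python
import re

# Pattern for the kernel's cgroup OOM-kill message. The kernel truncates the
# process name (comm) to 15 characters, so "machine-config-controller" would
# appear as "machine-config-".
OOM_RE = re.compile(
    r"Memory cgroup out of memory: Killed process (?P<pid>\d+) "
    r"\((?P<comm>[^)]*)\).*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_cgroup_oom_kills(dmesg_text):
    """Return (pid, comm, anon_rss_kb) for every cgroup OOM kill found."""
    kills = []
    for line in dmesg_text.splitlines():
        match = OOM_RE.search(line)
        if match:
            kills.append(
                (int(match["pid"]), match["comm"], int(match["anon_rss_kb"]))
            )
    return kills


# Made-up sample input, NOT from the attached sosreport.
SAMPLE = """\
[  321.000000] Memory cgroup out of memory: Killed process 4242 (machine-config-) total-vm:1300000kB, anon-rss:50000kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:300kB oom_score_adj:-997
[  322.000000] some unrelated kernel message
"""

kills = find_cgroup_oom_kills(SAMPLE)
for pid, comm, anon_rss_kb in kills:
    print(f"pid={pid} comm={comm} anon-rss={anon_rss_kb}kB")
```

A "Memory cgroup out of memory" message, as opposed to a plain "Out of memory" message, indicates that a cgroup limit was hit rather than the host running out of memory.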
log files:
1. sosreport
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/b5501734-60be-4de4-adcf-da57e22cbb8e?usePresignedUrl=true
2. Assisted Installer YAML file
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/a32635cf-112d-49ed-828c-4501e95a0e7a?usePresignedUrl=true
3. Bootstrap machine OOM screenshot
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/eefe2e57-cd23-4abd-9e0b-dd45f20a34d2?usePresignedUrl=true
- blocks
  - OCPBUGS-17769 Agent-based install process the container machine-config-controller will be oom (Closed)
- is cloned by
  - OCPBUGS-17769 Agent-based install process the container machine-config-controller will be oom (Closed)
- links to
  - RHSA-2023:5006 OpenShift Container Platform 4.14.z security update