-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
4.12, 4.11
-
Critical
-
None
-
Rejected
-
False
-
[adistefa: Updated description with the ongoing investigation outcomes. The old description is being kept below]
Version-Release number of selected component (if applicable):
4.11.z-aarch64, for any z
Scenario/How reproducible
The issue is always reproducible in the following scenario:
- 3 masters m6g.xlarge
- 2 workers
- 1 tainted worker with either m6g.xlarge, m6g.2xlarge, or m6g.4xlarge as instanceType.
- Using the payload that I'm attaching herewith, consisting of a namespace, an ImageStream, and a deployment with a pod made of 10 containers that sleep.
Steps to reproduce
1. Set the nodeSelector for the requiredAffinity (and tolerations, if taints are used) to make the pods land in a single worker.
2. oc apply -f deployment.yaml
3. oc project my-project
4. oc scale deployment/my-deployment --replicas=45 # or more
Change the replicas parameter so that the tainted worker gets up to 472 containers regardless of the chosen instance type (sometimes I got more containers, but still around that number, +- 10).
You can look at the total number of containers with:
oc debug node/my-worker
chroot /host
watch 'echo $(( $(crictl ps | wc -l) - 1 )) - $(find /var/run/crio -type l ! -readable | wc -l)'
You will see the number of containers (left) and the number of broken links (right). The number of broken links will start to increase linearly when we reach a number of total containers in a node that is greater than 472 (+- 10 in my tests). This is considered more a symptom of the issue.
oc debug node/my-worker
chroot /host
watch 'echo $(( $(crictl ps | wc -l) - 1 ))'
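The reproduction steps above can be wrapped in a few small helpers. This is only a sketch: the function names are hypothetical, the number of containers already running on the worker is something you have to measure yourself, and the sampling loop assumes it runs on the node after `chroot /host`, where `crictl` is available.

```shell
# Hypothetical helpers for the reproduction steps (not part of the attached payload).

# Replicas needed to reach a target container count on the worker.
# Args: target containers, containers per pod, containers already on the node.
replicas_for_target() {
  local target=$1 per_pod=$2 existing=${3:-0}
  local need=$(( target - existing ))
  if [ "$need" -le 0 ]; then
    echo 0
  else
    echo $(( (need + per_pod - 1) / per_pod ))   # ceiling division
  fi
}

# Count containers from `crictl ps` output on stdin (skips the header line).
count_containers() {
  local lines
  lines=$(wc -l)
  echo $(( lines > 0 ? lines - 1 : 0 ))
}

# Count symlinks under a directory whose target is not readable,
# using the same find expression as the watch command above.
count_broken_links() {
  find "$1" -type l ! -readable | wc -l
}

# Print one "timestamp containers broken_links" sample per interval (seconds).
sample_node() {
  local dir=${1:-/var/run/crio} interval=${2:-5}
  while true; do
    printf '%s %s %s\n' "$(date -u +%FT%TZ)" \
      "$(crictl ps | count_containers)" "$(count_broken_links "$dir")"
    sleep "$interval"
  done
}
```

For example, `sample_node /var/run/crio 5 | tee /tmp/samples.log` keeps a record that can be compared against the point where container creation starts to fail, and `replicas_for_target 472 10 "$(crictl ps | count_containers)"` gives a starting value for `--replicas`.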
The node's journal and the events for the failed-to-create containers' pods report:
Error: container create failed:
time="2022-11-11T20:30:20Z" level=error msg="runc create failed: unable
to start container process: unable to init seccomp: error loading
seccomp filter into kernel: error loading seccomp filter: errno 524"
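For context, errno 524 is outside the userspace errno range; in the Linux kernel sources it is defined as ENOTSUPP ("Operation is not supported") in include/linux/errno.h, which is likely why runc prints a bare number here. A minimal sketch for counting these failures in journal output, assuming each failure appears on a single journal line containing the message above (the function name is hypothetical):

```shell
# Count journal lines on stdin that report the seccomp errno 524 failure.
# grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
count_seccomp_524() {
  grep -c 'error loading seccomp filter: errno 524' || true
}
```

On the node this could be used as `journalctl --no-pager -b | count_seccomp_524`.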
[[ OLD DESCRIPTION ]]
Description of problem:
When updating a 4.11 arm64 (05_aarch64_IPI on AWS & Private cluster & FIPS on & OVN & Etcd Encryption) cluster to 4.12, the image-registry pods failed to start with the error "runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524", which blocked the upgrade process.
10-19 22:16:00.802 Message: Available: The registry has minimum availability
10-19 22:16:00.802 NodeCADaemonAvailable: The daemon set node-ca has available replicas
10-19 22:16:00.802 ImagePrunerAvailable: Pruner CronJob has been created
10-19 22:16:00.802 Reason: MinimumAvailability
10-19 22:16:00.802 Status: True
10-19 22:16:00.802 Type: Available
10-19 22:16:00.802 Last Transition Time: 2022-10-19T13:35:48Z
10-19 22:16:00.802 Message: Progressing: The deployment has not completed
10-19 22:16:00.802 NodeCADaemonProgressing: The daemon set node-ca is deployed
10-19 22:16:00.802 Reason: DeploymentNotCompleted
10-19 22:16:00.802 Status: True
10-19 22:16:00.802 Type: Progressing
10-19 22:16:00.802 Last Transition Time: 2022-10-19T13:37:48Z
10-19 22:16:00.802 Message: Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-658cd9b654" has timed out progressing.
10-19 22:16:00.802 Reason: ProgressDeadlineExceeded
10-19 22:16:00.802 Status: True
10-19 22:16:00.802 Type: Degraded
10-19 22:16:00.802 Extension: <nil>

10-19 22:16:03.025 38m Warning Failed pod/image-registry-658cd9b654-fcnrh Error: container create failed: time="2022-10-19T13:37:41Z" level=error msg="runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524"
10-19 22:16:03.025 15m Warning Failed pod/image-registry-658cd9b654-fcnrh (combined from similar events): Error: container create failed: time="2022-10-19T14:00:50Z" level=error msg="runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524"
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-arm64-2022-10-19-063757 to 4.12.0-0.nightly-arm64-2022-10-18-153953
How reproducible:
Not always
Steps to Reproduce:
1. Upgrade a 4.11.0-0.nightly-arm64-2022-10-19-063757 cluster to 4.12.0-0.nightly-arm64-2022-10-18-153953.
Actual results:
Image registry pods failed to start on 4.12
Expected results:
Image registry should upgrade successfully
Additional info:
must-gather log https://drive.google.com/file/d/1SAC82YC-g7s8OiqnBMptf4DVyp6YsEKw/view?usp=sharing
-
- duplicates
-
OCPBUGS-6981 error 524 from seccomp(2) when trying to load filter [rhel-8.6.0.z]
- Closed
- is duplicated by
-
OCPBUGS-708 UpdatingKubeStateMetricsFailed before Upgrade
- Closed
-
OCPBUGS-2302 4.11 upgrade to 4.12, prometheus-operator-admission-webhook pod is failed to start up due to "error loading seccomp filter into kernel: error loading seccomp filter: errno 524"
- Closed
-
OCPBUGS-1882 runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524
- Closed
- relates to
-
RUN-1668 Impact: 4.11 upgrade to 4.12, prometheus-operator-admission-webhook pod is failed to start up due to "error loading seccomp filter into kernel: error loading seccomp filter: errno 524"
- Closed
- links to