Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.12.0
Component/s: Monitoring, Multi-Arch / ARM
Labels:
- arm-ocp-qe

Severity:
Moderate
Regression:
None
Release Blocker:
Approved
Blocked:
False
Blocked Reason:

None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Test profile: 07_aarch64_UPI on Baremetal-packet with OVN. After upgrading from 4.11.8-aarch64 to 4.12.0-0.nightly-arm64-2022-10-10-023446, monitoring is degraded. The cluster has 3 masters and 2 workers.

10-12 08:02:06.793  oc get nodes:
10-12 08:02:06.793   NAME                                                 STATUS   ROLES    AGE   VERSION
10-12 08:02:06.793  master-00.newugd-24256.qe.devcluster.openshift.com   Ready    master   45m   v1.24.0+dc5a2fd
10-12 08:02:06.793  master-01.newugd-24256.qe.devcluster.openshift.com   Ready    master   47m   v1.24.0+dc5a2fd
10-12 08:02:06.793  master-02.newugd-24256.qe.devcluster.openshift.com   Ready    master   47m   v1.24.0+dc5a2fd
10-12 08:02:06.793  worker-00.newugd-24256.qe.devcluster.openshift.com   Ready    worker   30m   v1.24.0+dc5a2fd
10-12 08:02:06.793  worker-01.newugd-24256.qe.devcluster.openshift.com   Ready    worker   30m   v1.24.0+dc5a2fd

  - lastTransitionTime: "2022-10-12T02:41:52Z"
    message: 'reconciling Prometheus Operator Admission Webhook Deployment failed:
      updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook:
      the number of pods targeted by the deployment (3 pods) is different from the
      number of pods targeted by the deployment that have the desired template spec
      (2 pods)'
    reason: UpdatingPrometheusOperatorFailed
    status: "True"
    type: Degraded
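
The Degraded message above carries two counts: the pods selected by the Deployment (3) and the pods that already have the desired template spec (2). A minimal Python sketch for extracting those numbers from the condition message, for example when triaging several must-gather archives. The function name `parse_rollout_message` and the returned keys are illustrative assumptions, not part of any OpenShift tooling.

```python
import re
from typing import Optional

# Matches the rollout wording seen in the Degraded condition of this report.
ROLLOUT_RE = re.compile(
    r"waiting for DeploymentRollout of (?P<namespace>[^/\s]+)/(?P<name>[^:\s]+):.*?"
    r"\((?P<targeted>\d+) pods\).*?\((?P<updated>\d+) pods\)"
)


def parse_rollout_message(message: str) -> Optional[dict]:
    """Return the deployment name and pod counts, or None if the text does not match."""
    # YAML folds the message over several indented lines; collapse the whitespace first.
    normalized = " ".join(message.split())
    match = ROLLOUT_RE.search(normalized)
    if match is None:
        return None
    targeted = int(match.group("targeted"))
    updated = int(match.group("updated"))
    return {
        "namespace": match.group("namespace"),
        "name": match.group("name"),
        "targeted": targeted,
        "updated": updated,
        # Pods still selected by the Deployment but not on the desired template.
        "stale": targeted - updated,
    }
```

For the message in this report the sketch yields 3 targeted pods, 2 updated pods, and 1 stale pod, which matches the pod list taken from the must-gather.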

According to the must-gather file, there are 3 prometheus-operator-admission-webhook pods:

prometheus-operator-admission-webhook-7df64f454f-bfmtl
prometheus-operator-admission-webhook-64cb6b847-4fg6m
prometheus-operator-admission-webhook-64cb6b847-45kz4

prometheus-operator-admission-webhook-7df64f454f-bfmtl is running on node worker-01.newugd-24256.qe.devcluster.openshift.com; this pod would be replaced later in the rollout.

prometheus-operator-admission-webhook-64cb6b847-4fg6m is on node worker-00.newugd-24256.qe.devcluster.openshift.com and is in CreateContainerError. The error is rarely seen and is caused by runc:

  - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d37d4ba4ef834ddfc639301253c2ce593d5fe2806adb1abe52f822f3601fb31c
    imageID: ""
    lastState: {}
    name: prometheus-operator-admission-webhook
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: |
          container create failed: time="2022-10-12T03:20:47Z" level=error msg="runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524"
        reason: CreateContainerError
  hostIP: 147.28.151.214
  phase: Pending
  podIP: 10.131.1.43
  podIPs:
  - ip: 10.131.1.43
  qosClass: Burstable
  startTime: "2022-10-12T02:31:57Z"
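
Note that errno 524 is not an ordinary userspace errno: in the Linux kernel headers (include/linux/errno.h) it is the kernel-internal code ENOTSUPP, "Operation is not supported". A small Python sketch that pulls the errno out of a container status message and names it; the helper `classify_create_error` and the lookup table are hypothetical and cover only the code seen in this report.

```python
import errno
import re
from typing import Optional

# Kernel-internal errno values from include/linux/errno.h.
# Assumption: only the code observed in this report is listed here.
KERNEL_INTERNAL_ERRNOS = {524: "ENOTSUPP"}

ERRNO_RE = re.compile(r"errno (\d+)")


def classify_create_error(message: str) -> Optional[dict]:
    """Extract the errno from a CreateContainerError message, or return None."""
    match = ERRNO_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    # Prefer the kernel-internal table, then fall back to Python's errno names.
    name = KERNEL_INTERNAL_ERRNOS.get(code) or errno.errorcode.get(code, "UNKNOWN")
    return {"errno": code, "name": name, "seccomp": "seccomp" in message}
```

Applied to the waiting message above, the sketch reports errno 524 (ENOTSUPP) and flags the failure as seccomp-related.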

prometheus-operator-admission-webhook-64cb6b847-45kz4 is Pending due to the podAntiAffinity rule, which is expected:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-10-12T02:31:52Z"
    message: '0/5 nodes are available: 2 node(s) didn''t match pod anti-affinity rules,
      3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption:
      0/5 nodes are available: 2 node(s) didn''t match pod anti-affinity rules, 3
      Preemption is not helpful for scheduling.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable
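
The scheduler message accounts for all 5 nodes: the 2 workers each already host a webhook pod and are excluded by anti-affinity, and the 3 masters are excluded by their taint. A minimal Python sketch that tallies those per-reason counts from an Unschedulable message; `parse_unschedulable` and its output keys are illustrative assumptions, and the parsing only targets the wording shown in this report.

```python
import re
from typing import Optional

HEADER_RE = re.compile(r"(\d+)/(\d+) nodes are available: (.*)")
# Each filter reason looks like "<count> node(s) <explanation>", separated by commas.
REASON_RE = re.compile(r"(\d+) (node\(s\) [^,]+)")


def parse_unschedulable(message: str) -> Optional[dict]:
    """Split an Unschedulable message into per-reason node counts."""
    # Collapse YAML line folding, then drop the trailing preemption summary.
    normalized = " ".join(message.split())
    filter_part, _, _ = normalized.partition(" preemption:")
    header = HEADER_RE.match(filter_part.rstrip(". "))
    if header is None:
        return None
    available, total = int(header.group(1)), int(header.group(2))
    reasons = {
        text.strip(): int(count)
        for count, text in REASON_RE.findall(header.group(3))
    }
    return {
        "available": available,
        "total": total,
        "reasons": reasons,
        # True when the listed reasons cover every unavailable node.
        "all_nodes_explained": sum(reasons.values()) == total - available,
    }
```

In the YAML above the apostrophes are doubled because of single-quote escaping; pass the unescaped message text to the function.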

Version-Release number of selected component (if applicable):

4.11.8-aarch64 upgrade to 4.12.0-0.nightly-arm64-2022-10-10-023446

How reproducible:

not always

Steps to Reproduce:

1. Upgrade from 4.11.8-aarch64 to 4.12.0-0.nightly-arm64-2022-10-10-023446

Actual results:

After upgrading from 4.11.8-aarch64 to 4.12.0-0.nightly-arm64-2022-10-10-023446, monitoring is degraded.

Expected results:

The upgrade completes without errors.

Additional info:

must-gather file:
https://drive.google.com/file/d/1yS6s74M3t2zOKpssiBTAJ7yFzV4cSic6/view?usp=sharing

duplicates

OCPBUGS-2637 [ARM64][4.11.0+] Containers are stuck in CreateError with 'error loading seccomp filter: errno 524'

Closed

is blocked by

RUN-1668 Impact: 4.11 upgrade to 4.12, prometheus-operator-admission-webhook pod is failed to start up due to "error loading seccomp filter into kernel: error loading seccomp filter: errno 524"

Closed

relates to

OCPBUGS-708 UpdatingKubeStateMetricsFailed before Upgrade

Closed

OCPBUGS-1882 runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524

Closed

Assignee: Simon Pasquier

Reporter: Junqi Zhao

QA Contact: Junqi Zhao

Votes: 0

Watchers: 7

Created: 2022/10/13 7:16 AM

Updated: 2022/11/16 12:43 PM

Resolved: 2022/11/16 11:36 AM
