OpenShift Bugs / OCPBUGS-2302

After upgrading from 4.11 to 4.12, the prometheus-operator-admission-webhook pod fails to start due to "error loading seccomp filter into kernel: error loading seccomp filter: errno 524"


Details

    • Moderate
    • Approved
    • False

    Description

      Description of problem:

      Test profile: 07_aarch64_UPI on Baremetal-packet & OVN. After upgrading from 4.11.8-aarch64 to 4.12.0-0.nightly-arm64-2022-10-10-023446, monitoring is degraded. The cluster has 3 masters and 2 workers.

      10-12 08:02:06.793  oc get nodes:
      10-12 08:02:06.793  NAME                                                 STATUS   ROLES    AGE   VERSION
      10-12 08:02:06.793  master-00.newugd-24256.qe.devcluster.openshift.com   Ready    master   45m   v1.24.0+dc5a2fd
      10-12 08:02:06.793  master-01.newugd-24256.qe.devcluster.openshift.com   Ready    master   47m   v1.24.0+dc5a2fd
      10-12 08:02:06.793  master-02.newugd-24256.qe.devcluster.openshift.com   Ready    master   47m   v1.24.0+dc5a2fd
      10-12 08:02:06.793  worker-00.newugd-24256.qe.devcluster.openshift.com   Ready    worker   30m   v1.24.0+dc5a2fd
      10-12 08:02:06.793  worker-01.newugd-24256.qe.devcluster.openshift.com   Ready    worker   30m   v1.24.0+dc5a2fd

      The monitoring ClusterOperator reports the following Degraded condition:

        - lastTransitionTime: "2022-10-12T02:41:52Z"
          message: 'reconciling Prometheus Operator Admission Webhook Deployment failed:
            updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook:
            the number of pods targeted by the deployment (3 pods) is different from the
            number of pods targeted by the deployment that have the desired template spec
            (2 pods)'
          reason: UpdatingPrometheusOperatorFailed
          status: "True"
          type: Degraded

      According to the must-gather file, there are 3 prometheus-operator-admission-webhook pods:

      prometheus-operator-admission-webhook-7df64f454f-bfmtl
      prometheus-operator-admission-webhook-64cb6b847-4fg6m
      prometheus-operator-admission-webhook-64cb6b847-45kz4 
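
      The "3 pods" versus "2 pods" counts in the Degraded message can be cross-checked against the pod names. A minimal sketch, assuming the usual Deployment naming scheme <deployment>-<pod-template-hash>-<suffix>; the helper name group_by_template_hash is hypothetical and not part of any OpenShift tooling:

```python
from collections import defaultdict

# Pod names copied from the must-gather listing in this report.
POD_NAMES = [
    "prometheus-operator-admission-webhook-7df64f454f-bfmtl",
    "prometheus-operator-admission-webhook-64cb6b847-4fg6m",
    "prometheus-operator-admission-webhook-64cb6b847-45kz4",
]


def group_by_template_hash(pod_names):
    """Group pod names by their pod-template-hash field.

    Assumption: names follow <deployment>-<pod-template-hash>-<suffix>,
    so the hash is the second-to-last dash-separated field.
    """
    groups = defaultdict(list)
    for name in pod_names:
        _deployment, template_hash, _suffix = name.rsplit("-", 2)
        groups[template_hash].append(name)
    return dict(groups)


groups = group_by_template_hash(POD_NAMES)
for template_hash, pods in groups.items():
    print(f"{template_hash}: {len(pods)} pod(s)")
```

      Two pods share the hash 64cb6b847 and one pod has 7df64f454f, which is consistent with 3 pods in total and 2 pods carrying the desired template spec.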

      prometheus-operator-admission-webhook-7df64f454f-bfmtl is running on node worker-01.newugd-24256.qe.devcluster.openshift.com; this pod is expected to be replaced later in the rollout.

      prometheus-operator-admission-webhook-64cb6b847-4fg6m is on node worker-00.newugd-24256.qe.devcluster.openshift.com and is in CreateContainerError. The error is rarely seen and comes from runc:

        - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d37d4ba4ef834ddfc639301253c2ce593d5fe2806adb1abe52f822f3601fb31c
          imageID: ""
          lastState: {}
          name: prometheus-operator-admission-webhook
          ready: false
          restartCount: 0
          started: false
          state:
            waiting:
              message: |
                container create failed: time="2022-10-12T03:20:47Z" level=error msg="runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524"
              reason: CreateContainerError
        hostIP: 147.28.151.214
        phase: Pending
        podIP: 10.131.1.43
        podIPs:
        - ip: 10.131.1.43
        qosClass: Burstable
        startTime: "2022-10-12T02:31:57Z" 
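
      For reference, errno 524 is not a userspace error code. In the Linux source tree (include/linux/errno.h) it is the kernel-internal ENOTSUPP, "Operation is not supported". A minimal sketch for decoding such a number from a container status message follows; the helper name errno_name and the partial lookup table are illustrative assumptions, and the sketch does not explain why the kernel returned this code:

```python
import errno
import re

# Kernel-internal error numbers from include/linux/errno.h.
# They are not exported to userspace, so Python's errno module does not
# list them. Only a few entries are included here for illustration.
KERNEL_INTERNAL_ERRNOS = {
    512: "ERESTARTSYS",
    517: "EPROBE_DEFER",
    524: "ENOTSUPP",
}

# Message copied from the container status in this report.
MESSAGE = (
    'runc create failed: unable to start container process: '
    'unable to init seccomp: error loading seccomp filter into kernel: '
    'error loading seccomp filter: errno 524'
)


def errno_name(message):
    """Return the symbolic name of the first 'errno N' found in message."""
    match = re.search(r"errno (\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    # Try the platform's userspace table first, then the kernel-internal one.
    name = errno.errorcode.get(code)
    if name is None:
        name = KERNEL_INTERNAL_ERRNOS.get(code, f"unknown ({code})")
    return name


print(errno_name(MESSAGE))
```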

      prometheus-operator-admission-webhook-64cb6b847-45kz4 is Pending because of the podAntiAffinity rule, which is expected:

      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2022-10-12T02:31:52Z"
          message: '0/5 nodes are available: 2 node(s) didn''t match pod anti-affinity rules,
            3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption:
            0/5 nodes are available: 2 node(s) didn''t match pod anti-affinity rules, 3
            Preemption is not helpful for scheduling.'
          reason: Unschedulable
          status: "False"
          type: PodScheduled
        phase: Pending
        qosClass: Burstable 
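
      The "0/5 nodes are available" result can be reproduced with a small model of the cluster. This is only a sketch under stated assumptions: the anti-affinity rule is treated as a hard per-node constraint, the pod is assumed not to tolerate the master taint (as the scheduler message states), and the node dictionary and the function unschedulable_reasons are hypothetical, not scheduler code:

```python
from collections import Counter

# Simplified cluster state taken from this report: three tainted masters
# and two workers, each already holding one webhook pod
# (bfmtl on worker-01, 4fg6m on worker-00).
NODES = {
    "master-00": {"master_taint": True, "has_webhook_pod": False},
    "master-01": {"master_taint": True, "has_webhook_pod": False},
    "master-02": {"master_taint": True, "has_webhook_pod": False},
    "worker-00": {"master_taint": False, "has_webhook_pod": True},
    "worker-01": {"master_taint": False, "has_webhook_pod": True},
}


def unschedulable_reasons(nodes):
    """Count, per reason, the nodes that reject one more webhook pod."""
    reasons = Counter()
    for node in nodes.values():
        if node["master_taint"]:
            # Assumption: the pod has no toleration for the master taint.
            reasons["untolerated taint"] += 1
        elif node["has_webhook_pod"]:
            # Assumption: anti-affinity forbids two webhook pods per node.
            reasons["anti-affinity"] += 1
    return reasons


reasons = unschedulable_reasons(NODES)
feasible = len(NODES) - sum(reasons.values())
print(f"{feasible}/{len(NODES)} nodes are available: {dict(reasons)}")
```

      With both workers occupied, the third pod has no feasible node until one of the existing pods is removed, which matches the Unschedulable condition above.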

      Version-Release number of selected component (if applicable):

      4.11.8-aarch64 upgrade to 4.12.0-0.nightly-arm64-2022-10-10-023446

      How reproducible:

      not always

      Steps to Reproduce:

      1. Upgrade from 4.11.8-aarch64 to 4.12.0-0.nightly-arm64-2022-10-10-023446

      Actual results:

      After upgrading from 4.11.8-aarch64 to 4.12.0-0.nightly-arm64-2022-10-10-023446, monitoring is degraded.

      Expected results:

      The upgrade completes without errors.

      Additional info:

      must-gather file:
      https://drive.google.com/file/d/1yS6s74M3t2zOKpssiBTAJ7yFzV4cSic6/view?usp=sharing

            People

              spasquie@redhat.com Simon Pasquier
              juzhao@redhat.com Junqi Zhao
              Junqi Zhao Junqi Zhao