Multiple Architecture Enablement / MULTIARCH-5655

enoexec-event-daemon fails to update ENOExecEvent status, and the rollback fails due to a missing RBAC permission


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Quality / Stability / Reliability

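      Sometimes the enoexec-event-daemon pod fails to update the ENOExecEvent status. In the log below, the update is rejected because the reported containerID (cri-o://conmon-<id>) does not match the CRD pattern ^.+://[a-f0-9]{64}$. The daemon then tries to roll back by deleting the ENOExecEvent it just created, but the rollback also fails with a forbidden error because its service account (system:serviceaccount:openshift-multiarch-tuning-operator:enoexec-event-daemon) lacks the RBAC permission to delete enoexecevents. This leaves ENOExecEvent CRs with a null spec in the operator namespace, which can block deletion of the ClusterPodPlacementConfig.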

      {"level":"info","ts":1758094208.0363424,"caller":"storage/k8s_enoexecevent.go:158","msg":"Successfully created ENOExecEvent in Kubernetes","event":"eacc530e-3584-4c9a-808e-33100ba746a6","pod_name":"test-deployment-5446cb78-2d6kl","pod_namespace":"lwan-test","container_id":"cri-o://conmon-c4d53534e0475b4067f750194be59fee725c3fc7af1beb02a92fbb69beaa6cad"}
      {"level":"error","ts":1758094208.0483973,"caller":"storage/k8s_enoexecevent.go:152","msg":"Failed to rollback ENOExecEvent creation","event":"eacc530e-3584-4c9a-808e-33100ba746a6","error":"enoexecevents.multiarch.openshift.io \"eacc530e-3584-4c9a-808e-33100ba746a6\" is forbidden: User \"system:serviceaccount:openshift-multiarch-tuning-operator:enoexec-event-daemon\" cannot delete resource \"enoexecevents\" in API group \"multiarch.openshift.io\" in the namespace \"openshift-multiarch-tuning-operator\"","stacktrace":"github.com/openshift/multiarch-tuning-operator/controllers/enoexecevent/daemon/internal/storage.(*K8sENOExecEventStorage).processEvent.func1\n\t/workspace/controllers/enoexecevent/daemon/internal/storage/k8s_enoexecevent.go:152\ngithub.com/openshift/multiarch-tuning-operator/controllers/enoexecevent/daemon/internal/storage.(*K8sENOExecEventStorage).processEvent\n\t/workspace/controllers/enoexecevent/daemon/internal/storage/k8s_enoexecevent.go:168\ngithub.com/openshift/multiarch-tuning-operator/controllers/enoexecevent/daemon/internal/storage.(*K8sENOExecEventStorage).Run\n\t/workspace/controllers/enoexecevent/daemon/internal/storage/k8s_enoexecevent.go:95\ngithub.com/openshift/multiarch-tuning-operator/controllers/enoexecevent/daemon.runWorker\n\t/workspace/controllers/enoexecevent/daemon/daemon.go:77"}
      {"level":"error","ts":1758094208.0484605,"caller":"storage/k8s_enoexecevent.go:96","msg":"Failed to process ENOExec event","event":{"PodName":"test-deployment-5446cb78-2d6kl","PodNamespace":"lwan-test","ContainerID":"cri-o://conmon-c4d53534e0475b4067f750194be59fee725c3fc7af1beb02a92fbb69beaa6cad"},"error":"failed to update ENOExecEvent status in Kubernetes: ENoExecEvent.multiarch.openshift.io \"eacc530e-3584-4c9a-808e-33100ba746a6\" is invalid: status.containerID: Invalid value: \"cri-o://conmon-c4d53534e0475b4067f750194be59fee725c3fc7af1beb02a92fbb69beaa6cad\": containerID in body should match '^.+://[a-f0-9]{64}$'","stacktrace":"github.com/openshift/multiarch-tuning-operator/controllers/enoexecevent/daemon/internal/storage.(*K8sENOExecEventStorage).Run\n\t/workspace/controllers/enoexecevent/daemon/internal/storage/k8s_enoexecevent.go:96\ngithub.com/openshift/multiarch-tuning-operator/controllers/enoexecevent/daemon.runWorker\n\t/workspace/controllers/enoexecevent/daemon/daemon.go:77"} 
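
      The validation failure in the last log line can be reproduced locally. The following is a minimal bash sketch, assuming the CRD pattern is exactly the one quoted in the error message. The container ID as reported (with the conmon- prefix) fails the pattern, while the same ID with the prefix removed passes. Removing conmon- is only a hypothetical normalisation used for illustration; it is not necessarily the fix that was applied.

```shell
#!/usr/bin/env bash
# Pattern quoted in the "status.containerID: Invalid value" error above.
pattern='^.+://[a-f0-9]{64}$'

# Container ID exactly as the daemon reported it (note the "conmon-" prefix).
reported='cri-o://conmon-c4d53534e0475b4067f750194be59fee725c3fc7af1beb02a92fbb69beaa6cad'

# Print whether a container ID satisfies the CRD pattern.
check() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo "valid:   $1"
  else
    echo "invalid: $1"
  fi
}

check "$reported"              # rejected: "conmon-" is not part of [a-f0-9]{64}
check "${reported/conmon-/}"   # hypothetical normalisation: drop the first "conmon-"
```

      This matches the log sequence: the CR is created, the status update is rejected by the schema, and the forbidden delete then leaves the empty CR behind.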


      Steps to Reproduce:

      1. Install the operator and enable the execFormatErrorMonitor plugin in the ClusterPodPlacementConfig (CPPC).
      2. Create a Deployment that schedules pods onto amd64 nodes (nodeSelector kubernetes.io/arch: amd64) but uses an ARM-only image, so the containers fail with an exec format error. Setting replicas to 10 makes the issue easier to reproduce.

      oc create -f - <<EOF
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: test-deployment
      spec:
        selector:
          matchLabels:
            app: hello-openshift
        replicas: 10
        template:
          metadata:
            labels:
              app: hello-openshift
          spec:
            nodeSelector:
              kubernetes.io/arch: amd64
            containers:
            - name: hello-openshift
              image: quay.io/openshifttest/hello-openshift:arm-1.2.0
      EOF

      3. Check the ENOExecEvent CRs in the operator namespace:

      $ oc get enoexecevent -n openshift-multiarch-tuning-operator

       

      Actual result:
      Several ENOExecEvent CRs are left over with a null spec (the NODENAME, PODNAME, PODNAMESPACE and CONTAINERID columns are empty):

      NAME                                   NODENAME   PODNAME   PODNAMESPACE   CONTAINERID
      0f16e049-8815-4734-a07d-29eeccdef887
      21c2d783-f17a-499b-8647-838c2dc63028
      a3bf1531-96fb-4857-ac38-fb6fc1ee6b6f
      b706042b-06f8-4012-ad63-f6fd59e97563
      bec56c28-c46d-47f7-baed-095274e92664
      d53cc2e4-ac20-48b4-9835-915aab423aec
      db81759b-e344-41a2-80a2-e50177fa5e81
      eacc530e-3584-4c9a-808e-33100ba746a6
      ff1f7319-d64b-495f-9e20-57d6f4c650bf

      Expected result:

      No ENOExecEvent CRs should be left over.

              lwan-wanglin Lin Wang