OpenShift Bugs / OCPBUGS-17417

Scheduling privileged pods on Power is failing in 4.13+ CI jobs


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • None
    • 4.13, 4.14
    • None
    • No
    • Sprint: Multi-Arch Sprint 240, Multi-Arch Sprint 241
    • 2
    • Approved
    • ppc64le
    • False

      Description of problem:

      Pod failures are observed in 4.14 CI runs on both libvirt and PowerVS clusters:
      
      Aug  2 09:23:21.958: INFO: Error evaluating pod condition running: final error: pod failed permanently
      Aug  2 09:23:21.958: INFO: Unexpected error: while waiting for pod to be running: 
          <*fmt.wrapError | 0xc007006b40>: {
              msg: "error while waiting for pod e2e-provisioning-5252/pod-subpath-test-preprovisionedpv-s22h to be running: final error: pod failed permanently",
              err: <*pod.FinalErr | 0xc0004123e0>{
                  Err: <*errors.errorString | 0xc0004123c0>{
                      s: "pod failed permanently",
                  },
              },
          }
      Aug  2 09:23:21.958: FAIL: while waiting for pod to be running: error while waiting for pod e2e-provisioning-5252/pod-subpath-test-preprovisionedpv-s22h to be running: final error: pod failed permanently
      
      
      PowerVS busybox errors:
      
      {  event [ns/e2e-privileged-pod-4717 pod/privileged-pod node/rdr-multiarch-XXX01-qt7hv-worker-69jnd hmsg/1ec3373237 - pathological/true interesting/true reason/BackOff Back-off restarting failed container privileged-container in pod privileged-pod_e2e-privileged-pod-4717(b6da79bd-ed72-450b-84a4-5fda1bba8ba7)] happened 24 times
      event [ns/e2e-mount-propagation-9836 pod/master node/rdr-multiarch-XXX01-qt7hv-worker-69jnd hmsg/5f0d6137b7 - pathological/true interesting/true reason/BackOff Back-off restarting failed container cntr in pod master_e2e-mount-propagation-9836(523b8239-4e22-47a5-ab19-3163f8eebf91)] happened 24 times
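When several runs are compared, the pathological-event lines above can be summarized mechanically. The following is a minimal sketch that assumes only the line shape quoted here ("event [ns/... pod/... ... reason/...] happened N times"). The eventSummary type, the parseEvent helper and the regular expression are hypothetical and are not part of openshift-tests.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// eventSummary holds the fields extracted from one event line
// (hypothetical type, for illustration only).
type eventSummary struct {
	Namespace string
	Pod       string
	Reason    string
	Count     int
}

// eventLine matches: event [ns/<ns> pod/<pod> ... reason/<reason> ...] happened <n> times
var eventLine = regexp.MustCompile(`event \[ns/(\S+) pod/(\S+) .*?reason/(\S+) .*\] happened (\d+) times`)

// parseEvent returns the extracted fields and true if line matches eventLine.
func parseEvent(line string) (eventSummary, bool) {
	m := eventLine.FindStringSubmatch(line)
	if m == nil {
		return eventSummary{}, false
	}
	n, err := strconv.Atoi(m[4])
	if err != nil {
		return eventSummary{}, false
	}
	return eventSummary{Namespace: m[1], Pod: m[2], Reason: m[3], Count: n}, true
}

func main() {
	line := `event [ns/e2e-privileged-pod-4717 pod/privileged-pod node/rdr-multiarch-XXX01-qt7hv-worker-69jnd hmsg/1ec3373237 - pathological/true interesting/true reason/BackOff Back-off restarting failed container privileged-container in pod privileged-pod_e2e-privileged-pod-4717(b6da79bd-ed72-450b-84a4-5fda1bba8ba7)] happened 24 times`
	if ev, ok := parseEvent(line); ok {
		// Prints: e2e-privileged-pod-4717/privileged-pod reason=BackOff count=24
		fmt.Printf("%s/%s reason=%s count=%d\n", ev.Namespace, ev.Pod, ev.Reason, ev.Count)
	}
}
```

Because the pattern is unanchored, it also matches the first line, which begins with a stray "{".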
      
      
      Busybox error log:
      
      fail [k8s.io/kubernetes@v1.27.1/test/e2e/framework/pod/pod_client.go:106]: Timed out after 300.000s.
      expected pod to be running and ready, got instead:
          <*v1.Pod | 0xc000570000>: 
              metadata:
                annotations:
                  k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.26/23"],"mac_address":"0a:58:0a:80:02:1a","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.2.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.2.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.2.1"}],"ip_address":"10.128.2.26/23","gateway_ip":"10.128.2.1"}}'
                  k8s.v1.cni.cncf.io/network-status: |-
                    [{
                        "name": "ovn-kubernetes",
                        "interface": "eth0",
                        "ips": [
                            "10.128.2.26"
                        ],
                        "mac": "0a:58:0a:80:02:1a",
                        "default": true,
                        "dns": {}
                    }]
                creationTimestamp: "2023-08-02T13:36:47Z"
                managedFields:
                - apiVersion: v1
                  fieldsType: FieldsV1
                  fieldsV1:
                    f:spec:
                      f:containers:
                        k:{"name":"not-privileged-container"}:
                          .: {}
                          f:command: {}
                          f:image: {}
                          f:imagePullPolicy: {}
                          f:name: {}
                          f:resources: {}
                          f:securityContext:
                            .: {}
                            f:privileged: {}
                          f:terminationMessagePath: {}
                          f:terminationMessagePolicy: {}
                        k:{"name":"privileged-container"}:
                          .: {}
                          f:command: {}
                          f:image: {}
                          f:imagePullPolicy: {}
                          f:name: {}
                          f:resources: {}
                          f:securityContext:
                            .: {}
                            f:privileged: {}
                          f:terminationMessagePath: {}
                          f:terminationMessagePolicy: {}
                      f:dnsPolicy: {}
                      f:enableServiceLinks: {}
                      f:restartPolicy: {}
                      f:schedulerName: {}
                      f:securityContext: {}
                      f:terminationGracePeriodSeconds: {}
                  manager: openshift-tests
                  operation: Update
                  time: "2023-08-02T13:36:47Z"
                - apiVersion: v1
                  fieldsType: FieldsV1
                  fieldsV1:
                    f:metadata:
                      f:annotations:
                        .: {}
                        f:k8s.ovn.org/pod-networks: {}
                  manager: rdr-multiarch-XXX01-qt7hv-master-0
                  operation: Update
                  time: "2023-08-02T13:36:47Z"
                - apiVersion: v1
                  fieldsType: FieldsV1
                  fieldsV1:
                    f:metadata:
                      f:annotations:
                        f:k8s.v1.cni.cncf.io/network-status: {}
                  manager: multus
                  operation: Update
                  subresource: status
                  time: "2023-08-02T13:36:48Z"
                - apiVersion: v1
                  fieldsType: FieldsV1
                  fieldsV1:
                    f:status:
                      f:conditions:
                        k:{"type":"ContainersReady"}:
                          .: {}
                          f:lastProbeTime: {}
                          f:lastTransitionTime: {}
                          f:message: {}
                          f:reason: {}
                          f:status: {}
                          f:type: {}
                        k:{"type":"Initialized"}:
                          .: {}
                          f:lastProbeTime: {}
                          f:lastTransitionTime: {}
                          f:status: {}
                          f:type: {}
                        k:{"type":"Ready"}:
                          .: {}
                          f:lastProbeTime: {}
                          f:lastTransitionTime: {}
                          f:message: {}
                          f:reason: {}
                          f:status: {}
                          f:type: {}
                      f:containerStatuses: {}
                      f:hostIP: {}
                      f:phase: {}
                      f:podIP: {}
                      f:podIPs:
                        .: {}
                        k:{"ip":"10.128.2.26"}:
                          .: {}
                          f:ip: {}
                      f:startTime: {}
                  manager: kubelet
                  operation: Update
                  subresource: status
                  time: "2023-08-02T13:40:10Z"
                name: privileged-pod
                namespace: e2e-privileged-pod-4717
                resourceVersion: "58750"
                uid: b6da79bd-ed72-450b-84a4-5fda1bba8ba7
              spec:
                containers:
                - command:
                  - /bin/sleep
                  - "10000"
                  image: quay.io/openshift/community-e2e-images:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
                  imagePullPolicy: IfNotPresent
                  name: privileged-container
                  resources: {}
                  securityContext:
                    privileged: true
                  terminationMessagePath: /dev/termination-log
                  terminationMessagePolicy: File
                  volumeMounts:
                  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                    name: kube-api-access-z6xzq
                    readOnly: true
                - command:
                  - /bin/sleep
                  - "10000"
                  image: quay.io/openshift/community-e2e-images:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
                  imagePullPolicy: IfNotPresent
                  name: not-privileged-container
                  resources: {}
                  securityContext:
                    privileged: false
                  terminationMessagePath: /dev/termination-log
                  terminationMessagePolicy: File
                  volumeMounts:
                  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
                    name: kube-api-access-z6xzq
                    readOnly: true
                dnsPolicy: ClusterFirst
                enableServiceLinks: true
                imagePullSecrets:
                - name: default-dockercfg-pwqm2
                nodeName: rdr-multiarch-XXX01-qt7hv-worker-69jnd
                preemptionPolicy: PreemptLowerPriority
                priority: 0
                restartPolicy: Always
                schedulerName: default-scheduler
                securityContext: {}
                serviceAccount: default
                serviceAccountName: default
                terminationGracePeriodSeconds: 30
                tolerations:
                - effect: NoExecute
                  key: node.kubernetes.io/not-ready
                  operator: Exists
                  tolerationSeconds: 300
                - effect: NoExecute
                  key: node.kubernetes.io/unreachable
                  operator: Exists
                  tolerationSeconds: 300
                volumes:
                - name: kube-api-access-z6xzq
                  projected:
                    defaultMode: 420
                    sources:
                    - serviceAccountToken:
                        expirationSeconds: 3607
                        path: token
                    - configMap:
                        items:
                        - key: ca.crt
                          path: ca.crt
                        name: kube-root-ca.crt
                    - downwardAPI:
                        items:
                        - fieldRef:
                            apiVersion: v1
                            fieldPath: metadata.namespace
                          path: namespace
                    - configMap:
                        items:
                        - key: service-ca.crt
                          path: service-ca.crt
                        name: openshift-service-ca.crt
              status:
                conditions:
                - lastProbeTime: null
                  lastTransitionTime: "2023-08-02T13:36:47Z"
                  status: "True"
                  type: Initialized
                - lastProbeTime: null
                  lastTransitionTime: "2023-08-02T13:36:47Z"
                  message: 'containers with unready status: [privileged-container]'
                  reason: ContainersNotReady
                  status: "False"
                  type: Ready
                - lastProbeTime: null
                  lastTransitionTime: "2023-08-02T13:36:47Z"
                  message: 'containers with unready status: [privileged-container]'
                  reason: ContainersNotReady
                  status: "False"
                  type: ContainersReady
                - lastProbeTime: null
                  lastTransitionTime: "2023-08-02T13:36:47Z"
                  status: "True"
                  type: PodScheduled
                containerStatuses:
                - containerID: cri-o://f92b802c4f58b20d0308a61cd068a3489403a7e83f89f896fc90f8f2a031b1f6
                  image: quay.io/openshift/community-e2e-images:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
                  imageID: quay.io/openshift/community-e2e-images@sha256:2e0f836850e09b8b7cc937681d6194537a09fbd5f6b9e08f4d646a85128e8937
                  lastState: {}
                  name: not-privileged-container
                  ready: true
                  restartCount: 0
                  started: true
                  state:
                    running:
                      startedAt: "2023-08-02T13:36:50Z"
                - containerID: cri-o://bc984256d1ddb4a8e5fca7a07d0b6d7aa81561cc6fe1b3558c253378140d4d98
                  image: quay.io/openshift/community-e2e-images:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
                  imageID: quay.io/openshift/community-e2e-images@sha256:2e0f836850e09b8b7cc937681d6194537a09fbd5f6b9e08f4d646a85128e8937
                  lastState:
                    terminated:
                      containerID: cri-o://bc984256d1ddb4a8e5fca7a07d0b6d7aa81561cc6fe1b3558c253378140d4d98
                      exitCode: 127
                      finishedAt: "2023-08-02T13:39:56Z"
                      reason: Error
                      startedAt: "2023-08-02T13:39:55Z"
                  name: privileged-container
                  ready: false
                  restartCount: 5
                  started: false
                  state:
                    waiting:
                      message: back-off 2m40s restarting failed container=privileged-container pod=privileged-pod_e2e-privileged-pod-4717(b6da79bd-ed72-450b-84a4-5fda1bba8ba7)
                      reason: CrashLoopBackOff
                hostIP: 192.168.84.15
                phase: Running
                podIP: 10.128.2.26
                podIPs:
                - ip: 10.128.2.26
                qosClass: BestEffort
                startTime: "2023-08-02T13:36:47Z"

      Version-Release number of selected component (if applicable):

      4.13, 4.14

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

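      Actual results:

      In privileged-pod, privileged-container exits with code 127 about one second after each start, has restarted 5 times, and is waiting in CrashLoopBackOff. not-privileged-container in the same pod runs normally. The e2e test times out after 300 s waiting for the pod to be running and ready. The cntr container of the mount-propagation test pod (pod/master) is also repeatedly backed off.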

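      Expected results:

      Both containers in privileged-pod start and become ready, and the privileged-pod and mount-propagation e2e tests pass on Power (ppc64le).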

      Additional info:

       

            Shilpa Gokul (shgokul)
            Shilpa Gokul (shgokul)
            Doug Slavens
            Votes: 0
            Watchers: 7
