  1. OpenShift Virtualization
  2. CNV-35787

[4.15] In some flows, the cronjob and its job are not cleaned up


    • Bug
    • Resolution: Done-Errata
    • Normal
    • CNV v4.15.1
    • CNV v4.15.0
    • Storage Platform
    • None
    • 0.42
    • False
    • False
    • ---
    • ---
    • Storage Core Sprint 249, Storage Core Sprint 250, Storage Core Sprint 251
    • No

      Description of problem:

      After running the tier1 CDI OCS test suite, we see that the cronjob and its job are not cleaned up.
      The pods keep being recreated every few seconds:
      
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      cron-test-a6f5c71b-28351619-2prll                      0/1     Error     0              5s
      cron-test-a6f5c71b-28351619-8wkgx                      0/1     Error     0              37s
      cron-test-a6f5c71b-28351619-bkld5                      0/1     Error     0              26s
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      cron-test-a6f5c71b-28351620-wzdgk                      0/1     Error     0              3s
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      cron-test-a6f5c71b-28351620-j42l2                      0/1     Error     0              19s
      cron-test-a6f5c71b-28351620-wzdgk                      0/1     Error     0              30s
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      cron-test-a6f5c71b-28351623-h86lq                      0/1     ContainerCreating   0              0s
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      cron-test-a6f5c71b-28351623-h86lq                      0/1     Error     0              2s
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get pods -n openshift-cnv | grep cron
      cron-test-a6f5c71b-28351623-5gtlp                      0/1     Error               0              21s
      cron-test-a6f5c71b-28351623-h86lq                      0/1     Error               0              32s
      cron-test-a6f5c71b-28351623-svbfj                      0/1     ContainerCreating   0              0s
      
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get dataimportcron --all-namespaces
      NAMESPACE                            NAME                        FORMAT
      openshift-virtualization-os-images   centos-7-image-cron         pvc
      openshift-virtualization-os-images   centos-stream8-image-cron   pvc
      openshift-virtualization-os-images   centos-stream9-image-cron   pvc
      openshift-virtualization-os-images   fedora-image-cron           pvc
      openshift-virtualization-os-images   rhel8-image-cron            pvc
      openshift-virtualization-os-images   rhel9-image-cron            pvc
      
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get cronjobs -n openshift-cnv
      NAME                                 SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
      centos-7-image-cron-1e248498         7 4/12 * * *   False     0        11h             3d
      centos-stream8-image-cron-9290d33a   7 4/12 * * *   False     0        11h             3d
      centos-stream9-image-cron-188404e1   7 4/12 * * *   False     0        11h             3d
      cron-test-a6f5c71b                   * * * * *      False     0        36s             47h
      fedora-image-cron-9a1f2246           7 4/12 * * *   False     0        11h             3d
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get jobs -n openshift-cnv
      NAME                          COMPLETIONS   DURATION   AGE
      cron-test-a6f5c71b-28351630   0/1           43s        43s
      
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get cronjob -n openshift-cnv cron-test-a6f5c71b -oyaml
      apiVersion: batch/v1
      kind: CronJob
      metadata:
        creationTimestamp: "2023-11-25T15:43:51Z"
        generation: 3
        labels:
          app.kubernetes.io/component: storage
          app.kubernetes.io/managed-by: cdi-controller
          app.kubernetes.io/part-of: hyperconverged-cluster
          app.kubernetes.io/version: 4.14.1
          cdi.kubevirt.io/dataImportCron: cdi-e2e-tests-dataimportcron-func-test-ch5s8.cron-test
        name: cron-test-a6f5c71b
        namespace: openshift-cnv
        ownerReferences:
        - apiVersion: cdi.kubevirt.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: CDI
          name: cdi-kubevirt-hyperconverged
          uid: 974943ef-6c2d-498b-96cc-a3dc9a7fdd74
        resourceVersion: "8733559"
        uid: db47a2f5-843d-445d-b374-592bdc589813
      spec:
        concurrencyPolicy: Forbid
        failedJobsHistoryLimit: 1
        jobTemplate:
          metadata:
            creationTimestamp: null
          spec:
            backoffLimit: 2
            template:
              metadata:
                creationTimestamp: null
              spec:
                containers:
                - command:
                  - /usr/bin/cdi-source-update-poller
                  - -ns
                  - cdi-e2e-tests-dataimportcron-func-test-ch5s8
                  - -cron
                  - cron-test
                  - -url
                  - docker://cdi-docker-registry-host.openshift-cnv/tinycoreqcow2
                  env:
                  - name: INSECURE_TLS
                    value: "true"
                  - name: http_proxy
                  - name: https_proxy
                  - name: no_proxy
                  image: registry.redhat.io/container-native-virtualization/virt-cdi-importer-rhel9@sha256:75a9f754acba4cc158ebac58b161b70f964802a4ce9915cb20db413854af2830
                  imagePullPolicy: IfNotPresent
                  name: cdi-source-update-poller
                  resources: {}
                  securityContext:
                    allowPrivilegeEscalation: false
                    capabilities:
                      drop:
                      - ALL
                    runAsNonRoot: true
                    runAsUser: 107
                    seccompProfile:
                      type: RuntimeDefault
                  terminationMessagePath: /dev/termination-log
                  terminationMessagePolicy: File
                dnsPolicy: ClusterFirst
                restartPolicy: Never
                schedulerName: default-scheduler
                securityContext: {}
                serviceAccount: cdi-cronjob
                serviceAccountName: cdi-cronjob
                terminationGracePeriodSeconds: 0
            ttlSecondsAfterFinished: 10
        schedule: '* * * * *'
        successfulJobsHistoryLimit: 1
        suspend: false
      status:
        active:
        - apiVersion: batch/v1
          kind: Job
          name: cron-test-a6f5c71b-28351632
          namespace: openshift-cnv
          resourceVersion: "8733558"
          uid: 961f3160-d661-4e53-b3d6-1d5059b66d63
        lastScheduleTime: "2023-11-27T15:12:00Z"
        lastSuccessfulTime: "2023-11-25T15:45:04Z"
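
The job and pod names in the output above encode the scheduled time. The upstream Kubernetes CronJob controller names each job `<cronjob name>-<scheduled time in minutes since the Unix epoch>`, so the suffix of the active job can be checked against `lastScheduleTime`. A minimal Python sketch of that check follows; the helper name `scheduled_time` is made up for illustration and is not part of CDI or Kubernetes.

```python
from datetime import datetime, timezone


def scheduled_time(job_name: str) -> datetime:
    """Decode the scheduled time from a CronJob-created Job name.

    Assumes the name ends in "-<minutes since the Unix epoch>", which is how
    the upstream CronJob controller builds Job names.
    """
    suffix = job_name.rsplit("-", 1)[-1]
    if not suffix.isdigit():
        raise ValueError(f"no numeric schedule suffix in {job_name!r}")
    return datetime.fromtimestamp(int(suffix) * 60, tz=timezone.utc)


# Active job from the CronJob status above; matches lastScheduleTime.
print(scheduled_time("cron-test-a6f5c71b-28351632").isoformat())
# -> 2023-11-27T15:12:00+00:00

# Lowest and highest job IDs that appear in the pod listings above.
first = scheduled_time("cron-test-a6f5c71b-28351619")
last = scheduled_time("cron-test-a6f5c71b-28351623")
print((last - first).total_seconds() / 60)  # minutes between the two runs
# -> 4.0
```

With `schedule: '* * * * *'`, `backoffLimit: 2`, and `ttlSecondsAfterFinished: 10`, each one-minute run can create up to three failing pods before the job is marked failed and removed roughly 10 seconds later. That appears consistent with the pod listings above, including the one that came back empty.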
      
      

      Version-Release number of selected component (if applicable):

      Seen on 4.14, 4.15
      

      How reproducible:

      Seen twice. We don't usually reuse clusters from tier1 runs, but when we do, we hit this issue and it blocks tier2/3 runs.
      

      Steps to Reproduce:

      1. Run tier1 CDI OCS test suite (we don't know which test is causing it)
      

      Actual results:

      cronjob and job are not cleaned up
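
One way to confirm that a cronjob is a leftover is to compare its `cdi.kubevirt.io/dataImportCron` label with the DataImportCrons that still exist. In the output above, the label on `cron-test-a6f5c71b` points at `cron-test` in namespace `cdi-e2e-tests-dataimportcron-func-test-ch5s8`, which is not in the `oc get dataimportcron --all-namespaces` listing. The Python sketch below illustrates the comparison. It assumes the label value has the form `<namespace>.<name>`, as the cronjob's `-ns` and `-cron` arguments suggest. The function names are hypothetical, and the label value for the fedora cronjob is an assumption because the report does not show it.

```python
from typing import Dict, List, Set, Tuple


def split_owner_label(value: str) -> Tuple[str, str]:
    """Split a "<namespace>.<name>" label value at the first dot.

    Assumption: namespaces cannot contain dots, so the first dot is the
    separator even if the DataImportCron name itself contains dots.
    """
    namespace, sep, name = value.partition(".")
    if not (sep and namespace and name):
        raise ValueError(f"unexpected label value: {value!r}")
    return namespace, name


def orphaned_cronjobs(cronjob_labels: Dict[str, str],
                      existing: Set[Tuple[str, str]]) -> List[str]:
    """Return CronJob names whose owning DataImportCron no longer exists."""
    return sorted(
        cronjob
        for cronjob, label in cronjob_labels.items()
        if split_owner_label(label) not in existing
    )


# DataImportCrons from the `oc get dataimportcron --all-namespaces` output.
ns = "openshift-virtualization-os-images"
existing = {(ns, name) for name in (
    "centos-7-image-cron", "centos-stream8-image-cron",
    "centos-stream9-image-cron", "fedora-image-cron",
    "rhel8-image-cron", "rhel9-image-cron",
)}

cronjob_labels = {
    # Label value taken from the CronJob YAML in this report.
    "cron-test-a6f5c71b": "cdi-e2e-tests-dataimportcron-func-test-ch5s8.cron-test",
    # Assumed label value; the report does not show this CronJob's labels.
    "fedora-image-cron-9a1f2246": f"{ns}.fedora-image-cron",
}

print(orphaned_cronjobs(cronjob_labels, existing))
# -> ['cron-test-a6f5c71b']
```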
      

      Expected results:

      cronjob and job are cleaned up
      

      Additional info:

      It's a potential test blocker, because all remaining test jobs are aborted when we see broken pods in the openshift-cnv namespace.
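
The gating check used by the test pipeline is not shown in this report. As a rough illustration only, a check of that kind could scan `oc get pods` output for pods in a failed state, as in the Python sketch below. The function name `broken_pods` and the set of statuses treated as broken are assumptions, not the pipeline's actual logic.

```python
from typing import List

# Statuses treated as "broken" in this sketch (an assumption).
BROKEN_STATUSES = {"Error", "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def broken_pods(listing: str) -> List[str]:
    """Return names of pods whose STATUS column is in BROKEN_STATUSES.

    Expects the default `oc get pods` column layout:
    NAME  READY  STATUS  RESTARTS  AGE
    """
    broken = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, malformed lines, and the header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        if fields[2] in BROKEN_STATUSES:
            broken.append(fields[0])
    return broken


# Sample taken from one of the pod listings in this report.
sample = """\
cron-test-a6f5c71b-28351623-5gtlp   0/1   Error               0   21s
cron-test-a6f5c71b-28351623-h86lq   0/1   Error               0   32s
cron-test-a6f5c71b-28351623-svbfj   0/1   ContainerCreating   0   0s
"""
print(broken_pods(sample))
# -> ['cron-test-a6f5c71b-28351623-5gtlp', 'cron-test-a6f5c71b-28351623-h86lq']
```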
      

      Git Pull Request: https://github.com/kubevirt/containerized-data-importer/pull/3106 closed

      Git Pull Request: https://github.com/kubevirt/containerized-data-importer/pull/3120 closed

              agilboa@redhat.com Arnon Gilboa
              jpeimer@redhat.com Jenia Peimer
              Jenia Peimer Jenia Peimer