OpenShift Virtualization / CNV-32696

[2237705] Running VM shuts down with SIGTERM error

    • High

      Description of problem:
      A running VM shuts down after several minutes of uptime. The guest console reports SIGTERM being sent to all processes.

      Version-Release number of selected component (if applicable):
      oc get csv -n openshift-cnv
      NAME                                           DISPLAY                         VERSION             REPLACES                                       PHASE
      kubevirt-hyperconverged-operator.4.14.0-1876   OpenShift Virtualization        4.14.0-1876         kubevirt-hyperconverged-operator.4.14.0-1867   Succeeded
      odr-cluster-operator.v4.14.0-123.stable        Openshift DR Cluster Operator   4.14.0-123.stable   odr-cluster-operator.v4.14.0-117.stable        Succeeded
      openshift-pipelines-operator-rh.v1.11.1        Red Hat OpenShift Pipelines     1.11.1                                                             Succeeded
      volsync-product.v0.7.4                         VolSync                         0.7.4               volsync-product.v0.7.3                         Succeeded

      Client Version: 4.14.0-ec.3
      Kustomize Version: v5.0.1
      Server Version: 4.14.0-0.nightly-2023-08-11-055332
      Kubernetes Version: v1.27.4+deb2c60

      How reproducible:
      100%

      Steps to Reproduce:
      1. Deploy a VM to the OpenShift Virtualization cluster from the RHACM hub. The VM is deployed successfully.
      2. Start the VM with 'virtctl start vm'. The VM is running.
      3. Access the VM console with 'virtctl console vm', log in, and write data files.
      4. After about 10 minutes, the VM shuts down with the message shown below.
      5. Restart and access the VM; the same shutdown occurs. Reproduced multiple times.

      The system is going down NOW!
      Sent SIGTERM to all processes
      Sent SIGKILL to all processes
      Requesting system poweroff
      [ 687.879014] sd 1:0:0:0: [sda] Synchronizing SCSI cache
      [ 687.880156] sd 1:0:0:0: [sda] Stopping disk
      [ 687.973945] reboot: Power down

      You were disconnected from the console. This has one of the following reasons:

      • another user connected to the console of the target vm
      • network issues
        websocket: close 1006 (abnormal closure): unexpected EOF
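
      To put a number on the "about 10 minutes" in step 4, the stop can also be timed from the client side. The following is a minimal sketch and not part of the original report: the function names vm_status and wait_for_stop and the INTERVAL variable are hypothetical, and it assumes oc is already logged in to the cluster that runs the VM. It polls the VM's status.printableStatus field and prints the elapsed seconds once that field reads Stopped.

```shell
#!/bin/sh
# Sketch only: time how long a VM stays up before it reports "Stopped".
# VM and NS default to the names used in this report.
VM="${VM:-sample-vm}"
NS="${NS:-kevin-dr}"

# Print the VM's current status.printableStatus (e.g. Running, Stopped).
vm_status() {
  oc get vm "$VM" -n "$NS" -o jsonpath='{.status.printableStatus}'
}

# Poll vm_status every INTERVAL seconds (default 10) until it reads
# "Stopped", then print the number of seconds spent waiting.
wait_for_stop() {
  interval="${INTERVAL:-10}"
  start=$(date +%s)
  while [ "$(vm_status)" != "Stopped" ]; do
    sleep "$interval"
  done
  echo "$(( $(date +%s) - start ))"
}
```

      Call wait_for_stop only after the VM already reports Running (for example, after logging in at the console). If it is called while the VM is still Stopped, it returns immediately.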

      Actual results:
      VM shuts down unexpectedly with SIGTERM sent to all processes

      Expected results:
      VM should remain up and running

      Additional info:

      oc get pvc -n kevin-dr
      NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
      sample-vm-pvc   Bound    pvc-c8112912-8ac9-4537-adaf-c9fd6089dee7   2Gi        RWX            ocs-external-storagecluster-ceph-rbd   42h
      tmp-pvc         Bound    pvc-b08f240f-e828-49bb-9cf4-44ed8e8d9174   954Mi      RWO            ocs-external-storagecluster-ceph-rbd   7d22h

      oc get vm -n kevin-dr
      NAME        AGE   STATUS    READY
      sample-vm   42h   Stopped   False
      [kgoldbla@localhost Metro_DR]$ virtctl start sample-vm -n kevin-dr
      VM sample-vm was scheduled to start
      [kgoldbla@localhost Metro_DR]$ virtctl console sample-vm -n kevin-dr
      Successfully connected to sample-vm console. The escape sequence is ^]

      login as 'cirros' user. default password: 'gocubsgo'. use 'sudo' for root.
      sample-vm login: cirros
      Password:
      $ lsblk
      NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
      sda       8:0    0   1M  0 disk
      vda     252:0    0   2G  0 disk
      |-vda1  252:1    0   2G  0 part /
      `-vda15 252:15   0   8M  0 part

      The system is going down NOW!
      Sent SIGTERM to all processes
      Sent SIGKILL to all processes
      Requesting system poweroff
      [ 687.879014] sd 1:0:0:0: [sda] Synchronizing SCSI cache
      [ 687.880156] sd 1:0:0:0: [sda] Stopping disk
      [ 687.973945] reboot: Power down

      You were disconnected from the console. This has one of the following reasons:

      • another user connected to the console of the target vm
      • network issues
        websocket: close 1006 (abnormal closure): unexpected EOF
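
      The bracketed number on the kernel lines above is the time in seconds since the guest kernel booted, so the "reboot: Power down" line also records how long the guest was up. As a small worked example (the helper name uptime_from_kmsg is hypothetical and not part of any tool), the stamp can be converted to minutes and seconds:

```shell
#!/bin/sh
# Sketch: turn a kernel log line such as
#   [ 687.973945] reboot: Power down
# into whole minutes and seconds of guest uptime.
uptime_from_kmsg() {
  # Keep only the integer seconds inside the leading brackets.
  secs=$(printf '%s\n' "$1" | sed -n 's/^\[ *\([0-9][0-9]*\)\.[0-9]*].*/\1/p')
  # Fail if the line does not start with a kernel timestamp.
  [ -n "$secs" ] || return 1
  echo "$(( secs / 60 ))m $(( secs % 60 ))s"
}

uptime_from_kmsg '[ 687.973945] reboot: Power down'
```

      For the line from this report it prints 11m 27s, which is consistent with the "about 10 minutes" given in the steps to reproduce.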

      oc get vm sample-vm -n kevin-dr -oyaml
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        annotations:
          apps.open-cluster-management.io/hosting-subscription: kevin-dr/kev-vm-dvtemplate-odr-metro-2-subscription-1
          apps.open-cluster-management.io/reconcile-option: merge
          kubevirt.io/latest-observed-api-version: v1
          kubevirt.io/storage-observed-api-version: v1
        creationTimestamp: "2023-09-04T16:08:30Z"
        finalizers:
        - kubevirt.io/virtualMachineControllerFinalize
        generation: 13
        labels:
          app: kev-vm-dvtemplate-odr-metro-2
          app.kubernetes.io/part-of: kev-vm-dvtemplate-odr-metro-2
          appname: vm-dvtemplate-odr-metro
          apps.open-cluster-management.io/reconcile-rate: medium
        name: sample-vm
        namespace: kevin-dr
        resourceVersion: "26056866"
        uid: cdbe619e-31f7-4778-a354-a6a2e11cacfd
      spec:
        dataVolumeTemplates:
        - metadata:
            creationTimestamp: null
            labels:
              appname: vm-dvtemplate-odr-metro
            name: sample-vm-pvc
          spec:
            source:
              registry:
                url: docker://quay.io/alitke/cirros:latest
            storage:
              resources:
                requests:
                  storage: 2Gi
              storageClassName: ocs-external-storagecluster-ceph-rbd
        running: false
        template:
          metadata:
            annotations:
              vm.kubevirt.io/flavor: small
              vm.kubevirt.io/os: fedora
              vm.kubevirt.io/workload: server
            creationTimestamp: null
            labels:
              kubevirt.io/size: small
          spec:
            architecture: amd64
            domain:
              cpu:
                cores: 1
                sockets: 1
                threads: 1
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: rootdisk
                - disk: {}
                  name: cloudinit
                interfaces:
                - macAddress: 02:69:36:00:00:00
                  masquerade: {}
                  model: virtio
                  name: default
                networkInterfaceMultiqueue: true
                rng: {}
              features:
                acpi: {}
              machine:
                type: pc-q35-rhel8.6.0
              resources:
                requests:
                  memory: 2Gi
            evictionStrategy: LiveMigrate
            networks:
            - name: default
              pod: {}
            terminationGracePeriodSeconds: 180
            volumes:
            - name: rootdisk
              persistentVolumeClaim:
                claimName: sample-vm-pvc
            - cloudInitNoCloud:
                userData: |
                  #cloud-config
                  user: cirros
                  password: drftw!
                  chpasswd:
                    expire: false
              name: cloudinit
      status:
        conditions:
        - lastProbeTime: "2023-09-06T11:06:40Z"
          lastTransitionTime: "2023-09-06T11:06:40Z"
          message: VMI does not exist
          reason: VMINotExists
          status: "False"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: null
          status: "True"
          type: LiveMigratable
        desiredGeneration: 13
        observedGeneration: 13
        printableStatus: Stopped
        volumeSnapshotStatuses:
        - enabled: true
          name: rootdisk
        - enabled: false
          name: cloudinit
          reason: Snapshot is not supported for this volumeSource type [cloudinit]

            sgott@redhat.com Stuart Gott
            kgoldbla Kevin Alon Goldblatt
            Kedar Bidarkar Kedar Bidarkar