OCPBUGS-16797

[ZTP- SNO: Telco profile] KDUMP files not generated after crash


    • Bug
    • Resolution: Duplicate
    • Major
    • 4.12.z
    • GitOps ZTP
    • Quality / Stability / Reliability
    • False
    • Moderate
    • No

      Description of problem:

      Triggering a kernel crash does not generate the expected kdump files.
      
      The cluster is a single-node OpenShift (SNO) deployment installed through ZTP with a Telco profile applied.
      
      

      Version-Release number of selected component (if applicable):

      Issue seen in 4.12.25 and 4.12.26.
      
      ztp-site-generator image:
      http://registry.redhat.io/openshift4/ztp-site-generate-rhel8:4.12.3
      and also tested with 
      http://registry.redhat.io/openshift4/ztp-site-generate-rhel8:4.12.1
      
      

      How reproducible:

      100%
      
      

      Steps to Reproduce:

      1. Install SNO cluster version 4.12.25
      2. Ensure sysrq is configured with value=1: echo 1 > /proc/sys/kernel/sysrq
      3. echo c > /proc/sysrq-trigger
      4. Wait for node to recover
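
Steps 2 and 3 could be wrapped in two small shell helpers. This is only a sketch: the function names `enable_sysrq` and `trigger_crash` are hypothetical, and the target path is a parameter (defaulting to the procfs files named in the steps) so the helpers can be exercised against a scratch file without crashing a machine.

```shell
#!/bin/sh
# Hypothetical helpers mirroring steps 2 and 3; the names are not part of any tool.

# Step 2: enable all SysRq functions (value 1).
enable_sysrq() {
    printf '1\n' > "${1:-/proc/sys/kernel/sysrq}"
}

# Step 3: request an immediate kernel crash. Against the real procfs file
# this panics the node at once, so run it only on a disposable test system.
trigger_crash() {
    printf 'c\n' > "${1:-/proc/sysrq-trigger}"
}
```

On the node, running `enable_sysrq && trigger_crash` as root with no arguments would perform the same writes as the manual steps.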
      

      Actual results:

      The /var/crash directory is empty.
      
      

      Expected results:

      The /var/crash directory contains crash dump files such as vmcore-dmesg.txt, vmcore, and kexec-dmesg.log.
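
A quick way to compare the actual and expected results after the node recovers is to search the crash directory for the file names listed above. The sketch below assumes the dumps land either directly in the directory or in per-crash subdirectories; `list_crash_dumps` and `has_crash_dumps` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical check: list kdump artifacts under a crash directory.
# Assumes dumps sit in the directory itself or in per-crash subdirectories.
list_crash_dumps() {
    find "${1:-/var/crash}" -type f \
        \( -name 'vmcore' -o -name 'vmcore-dmesg.txt' -o -name 'kexec-dmesg.log' \)
}

# Succeeds only if at least one of the expected files exists.
has_crash_dumps() {
    [ -n "$(list_crash_dumps "$1")" ]
}
```

With the behavior reported here, `has_crash_dumps` run without arguments on the node would be expected to fail, because /var/crash is empty.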
      
      

      Additional info:

      System impact: if the platform fails, not all of the important diagnostic data can be recovered. Apart from that, the node continues to work.
      
      The node has the expected MachineConfigs after the policies are applied:
      
      ```
      oc get -o yaml machineconfig/06-kdump-enable-master
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        annotations:
          ran.openshift.io/ztp-gitops-generated: '{}'
        creationTimestamp: "2023-07-26T07:51:30Z"
        generation: 1
        labels:
          machineconfiguration.openshift.io/role: master
        name: 06-kdump-enable-master
        resourceVersion: "1595"
        uid: 1eeaf075-4d1b-4540-92f3-d1db167fe1d5
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - enabled: true
              name: kdump.service
        kernelArguments:
        - crashkernel=512M
      
      
      oc get -o yaml machineconfig/06-kdump-enable-worker
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        annotations:
          ran.openshift.io/ztp-gitops-generated: '{}'
        creationTimestamp: "2023-07-26T07:51:30Z"
        generation: 1
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 06-kdump-enable-worker
        resourceVersion: "1596"
        uid: ddad84a5-3ad3-48dc-9096-9346795f228d
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - enabled: true
              name: kdump.service
        kernelArguments:
        - crashkernel=512M
      
      ```
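
Since both MachineConfigs set `crashkernel=512M`, it may also be worth confirming on the node that the argument reached the booted kernel. A minimal sketch, with `crashkernel_arg` as a hypothetical helper name and the command-line file passed as a parameter so the function can be tried on a sample file:

```shell
#!/bin/sh
# Hypothetical helper: print the crashkernel= argument found in a
# kernel command line file (defaults to the running kernel's).
crashkernel_arg() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^crashkernel='
}
```

If the MachineConfig has been applied, `crashkernel_arg` on the node should print `crashkernel=512M`. Reading `/sys/kernel/kexec_crash_size` additionally shows how many bytes are reserved for the crash kernel; a value of 0 would mean no memory was reserved.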
      
      The kdump service is active and reports the kdump kernel as loaded:
      
      ```
      [core@cloudransno-site3 ~]$ sudo systemctl status kdump.service 
      ● kdump.service - Crash recovery kernel arming
         Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: disabled)
         Active: active (exited) since Wed 2023-07-26 09:19:03 UTC; 23min ago
       Main PID: 5425 (code=exited, status=0/SUCCESS)
          Tasks: 0 (limit: 818202)
         Memory: 0B
            CPU: 0
         CGroup: /system.slice/kdump.service
      
      Jul 26 09:19:03 cloudransno-site3 kdumpctl[5428]: kdump: kexec: loaded kdump kernel
      Jul 26 09:19:03 cloudransno-site3 kdumpctl[5428]: kdump: Starting kdump: [OK]
      Jul 26 09:19:02 cloudransno-site3 systemd[1]: Starting Crash recovery kernel arming...
      Jul 26 09:19:03 cloudransno-site3 systemd[1]: Started Crash recovery kernel arming.
      ```
      
      

              Unassigned
              rlopezma@redhat.com Rodrigo Lopez Manrique (Inactive)
              Yang Liu