  1. Multiple Architecture Enablement
  2. MULTIARCH-4077

Crash kernel not arming successfully

    • Bug
    • Resolution: Done
    • Normal
    • 4.14.z
    • Multi-Arch CI
    • Quality / Stability / Reliability
    • x86_64, ppc64le, s390x, aarch64

      While testing my restoration of the PR for the RT team's jobs, I discovered that my s390x node didn't recover after I crashed it. I ssh'd to the other node and found that the kdump service, while enabled, had not started successfully because no memory was allocated for the crash kernel.
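
The symptom described above (kdump enabled but no crash kernel memory reserved) can be confirmed on a node by comparing the kernel command line with the reserved size the kernel reports. The following is a minimal sketch, assuming a Linux node where `/proc/cmdline` and `/sys/kernel/kexec_crash_size` are readable; the helper names are hypothetical and not part of the CI steps.

```python
from pathlib import Path


def crashkernel_args(cmdline: str) -> list[str]:
    """Return the value of every crashkernel= argument on a kernel command line."""
    prefix = "crashkernel="
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]


def reserved_bytes(sysfs_text: str) -> int:
    """Parse the contents of /sys/kernel/kexec_crash_size (a decimal byte count)."""
    return int(sysfs_text.strip())


def check_node() -> None:
    """Print what the node booted with and how much memory is actually reserved."""
    cmdline = Path("/proc/cmdline").read_text()
    size = reserved_bytes(Path("/sys/kernel/kexec_crash_size").read_text())
    print("crashkernel args:", crashkernel_args(cmdline) or "none")
    print("reserved bytes:", size)
    if size == 0:
        print("no crash kernel memory reserved; kdump has nothing to load into")
```

Calling `check_node()` on the affected node shows whether the argument reached the command line at all, and whether the kernel honoured it.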

      I suspect this is a configuration problem. Here is the MCO resource definition:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 99-worker-kdump
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
              - contents:
                  source: data:text/plain;charset=utf-8;base64,cGF0aCAvdmFyL2NyYXNoCmNvcmVfY29sbGVjdG9yIG1ha2VkdW1wZmlsZSAtbCAtLW1lc3NhZ2UtbGV2ZWwgNyAtZCAzMQo=
                mode: 420
                overwrite: true
                path: /etc/kdump.conf
              - contents:
                  source: data:text/plain;charset=utf-8;base64,S0RVTVBfQ09NTUFORExJTkVfUkVNT1ZFPSJodWdlcGFnZXMgaHVnZXBhZ2VzeiBzbHViX2RlYnVnIHF1aWV0IGxvZ19idWZfbGVuIHN3aW90bGIgaHVnZXRsYl9jbWEgaWduaXRpb24uZmlyc3Rib290IHJkLm11bHRpcGF0aD1kZWZhdWx0IgpLRFVNUF9DT01NQU5ETElORV9BUFBFTkQ9ImlycXBvbGwgbWF4Y3B1cz0xIG5vaXJxZGlzdHJpYiByZXNldF9kZXZpY2VzIGNncm91cF9kaXNhYmxlPW1lbW9yeSBudW1hPW9mZiB1ZGV2LmNoaWxkcmVuLW1heD0yIGVoZWEudXNlX21jcz0wIHBhbmljPTEwIGt2bV9jbWFfcmVzdl9yYXRpbz0wIHRyYW5zcGFyZW50X2h1Z2VwYWdlPW5ldmVyIG5vdm1jb3JlZGQgaHVnZXRsYl9jbWE9MCBzcmN1dHJlZS5iaWdfY3B1X2xpbT0wIgpLRVhFQ19BUkdTPSItLWR0LW5vLW9sZC1yb290IC1zIgpLRFVNUF9JTUc9InZtbGludXoiCg==
                mode: 420
                overwrite: true
                path: /etc/sysconfig/kdump
          systemd:
            units:
              - enabled: true
                name: kdump.service
        kernelArguments:
          - crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G"
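
The two `source:` entries in the MachineConfig above are base64 `data:` URLs, so the files they write are not readable at a glance. A small sketch for decoding them follows; the strings are copied from the resource definition, and `decode_data_url` is a hypothetical helper name.

```python
import base64

# Copied verbatim from the MachineConfig above.
KDUMP_CONF = (
    "data:text/plain;charset=utf-8;base64,"
    "cGF0aCAvdmFyL2NyYXNoCmNvcmVfY29sbGVjdG9yIG1ha2VkdW1wZmlsZSAtbCAtLW1lc3NhZ2UtbGV2ZWwgNyAtZCAzMQo="
)
SYSCONFIG_KDUMP = (
    "data:text/plain;charset=utf-8;base64,"
    "S0RVTVBfQ09NTUFORExJTkVfUkVNT1ZFPSJodWdlcGFnZXMgaHVnZXBhZ2VzeiBzbHViX2RlYnVnIHF1aWV0IGxvZ19idWZfbGVuIHN3aW90bGIgaHVnZXRsYl9jbWEgaWduaXRpb24uZmlyc3Rib290IHJkLm11bHRpcGF0aD1kZWZhdWx0IgpLRFVNUF9DT01NQU5ETElORV9BUFBFTkQ9ImlycXBvbGwgbWF4Y3B1cz0xIG5vaXJxZGlzdHJpYiByZXNldF9kZXZpY2VzIGNncm91cF9kaXNhYmxlPW1lbW9yeSBudW1hPW9mZiB1ZGV2LmNoaWxkcmVuLW1heD0yIGVoZWEudXNlX21jcz0wIHBhbmljPTEwIGt2bV9jbWFfcmVzdl9yYXRpbz0wIHRyYW5zcGFyZW50X2h1Z2VwYWdlPW5ldmVyIG5vdm1jb3JlZGQgaHVnZXRsYl9jbWE9MCBzcmN1dHJlZS5iaWdfY3B1X2xpbT0wIgpLRVhFQ19BUkdTPSItLWR0LW5vLW9sZC1yb290IC1zIgpLRFVNUF9JTUc9InZtbGludXoiCg=="
)


def decode_data_url(url: str) -> str:
    """Decode a base64 data: URL into the text it carries."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data: URL")
    return base64.b64decode(payload).decode("utf-8")


print("# /etc/kdump.conf")
print(decode_data_url(KDUMP_CONF), end="")
print("# /etc/sysconfig/kdump")
print(decode_data_url(SYSCONFIG_KDUMP), end="")
```

The first file decodes to a `path /var/crash` line followed by a `core_collector makedumpfile -l --message-level 7 -d 31` line. Neither file reserves memory; that is left entirely to the `crashkernel` kernel argument.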
      

      I discovered this on s390x, but I suspect it affects all jobs using the current kdump steps.
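
Because the same `crashkernel` argument is applied on every architecture, the reservation a node should receive depends only on its RAM size. The sketch below evaluates the range syntax from the MachineConfig for a given amount of RAM. It assumes the usual kernel rule that a range matches when `start <= RAM < end`, ignores any rounding the kernel applies to the RAM figure, and uses hypothetical helper names. The value is passed here without the double quotes that appear in the MachineConfig; YAML keeps those quotes as literal characters in an unquoted scalar, and whether the kernel tolerates them is not established in this report.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

# Value from the MachineConfig, with the surrounding double quotes removed.
SPEC = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G"


def parse_size(text: str) -> int:
    """Convert a size such as '384M' or '4G' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def crashkernel_for(ram_bytes: int, spec: str = SPEC) -> int:
    """Return the reservation in bytes for ram_bytes, or 0 if no range matches."""
    spec = spec.split("@", 1)[0]  # drop an optional @offset suffix
    for entry in spec.split(","):
        rng, _, size = entry.partition(":")
        start, _, end = rng.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else None  # open-ended range such as '128G-'
        if ram_bytes >= low and (high is None or ram_bytes < high):
            return parse_size(size)
    return 0


for gib in (1, 8, 16, 256):
    print(f"{gib} GiB RAM -> {crashkernel_for(gib << 30) >> 20} MiB reserved")
```

Under these assumptions a node with less than 2 GiB of RAM matches no range and gets no reservation, which is one benign way to end up with the symptom in this report. A node inside a range that still reports zero reserved bytes points at the argument itself.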

              Unassigned Unassigned
              jpoulin Jeremy Poulin