  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-1967

Frequent kernel panic on edpm virtual baremetal nodes (possibly after enabling FIPS)


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • 2023Q4
    • edpm-image-builder
    • 2023Q4
    • Moderate

      A number of Zuul baremetal jobs are failing during dataplane deployment while running EDPM Ansible jobs. They fail at different steps of the deployment, as seen in 1 and 2.

      The console logs show a kernel panic that appears to be FIPS-related. FIPS was enabled recently.

      [  541.408389] Kernel panic - not syncing: Jitter RNG permanent health test failure
      [  541.408389] CPU: 0 PID: 30478 Comm: systemctl Not tainted 5.14.0-383.el9.x86_64 #1
      [  541.408389] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20230524-3.el9 05/24/2023
      [  541.408389] Call Trace:
      [  541.408389]  <TASK>
      [  541.408389]  dump_stack_lvl+0x34/0x48
      [  541.408826]  panic+0xfd/0x2f7
      [  541.408829]  jent_kcapi_random.cold+0x15/0x43
      [  541.408831]  drbg_seed+0x10a/0x440
      [  541.408835]  drbg_generate+0xb9/0x320
      [  541.408837]  ? drbg_hmac_generate+0x284/0x2f0
      [  541.408839]  drbg_kcapi_random+0xd3/0x120
      [  541.408841]  ? release_pages+0x16a/0x4c0
      [  541.408844]  crypto_devrandom_read_iter+0xb6/0x220
      [  541.408859]  ? _copy_to_iter+0x1d3/0x620
      [  541.408859]  ? crypto_devrandom_read_iter+0x149/0x220
      [  541.408859]  __do_sys_getrandom+0x9d/0x140
      [  541.408859]  do_syscall_64+0x5c/0x90
      [  541.408859]  ? __do_sys_getrandom+0xa9/0x140
      [  541.408859]  ? switch_fpu_return+0x4c/0xd0
      [  541.408859]  ? exit_to_user_mode_prepare+0xec/0x100
      [  541.408859]  ? syscall_exit_to_user_mode+0x12/0x30
      [  541.408859]  ? do_syscall_64+0x69/0x90
      [  541.408859]  ? exc_page_fault+0x62/0x150
      [  541.408859]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [  541.408859] RIP: 0033:0x7f3af8a5ac87
      [  541.408859] Code: be 0b 00 00 00 eb b2 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 3e 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  541.408859] RSP: 002b:00007ffff38fb758 EFLAGS: 00000246 ORIG_RAX: 000000000000013e
      [  541.408859] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f3af8a5ac87
      [  541.408859] RDX: 0000000000000002 RSI: 0000000000000010 RDI: 00007ffff38fb770
      [  541.408859] RBP: 00007f3af8dfb180 R08: 00005646c9deee60 R09: 0000000000000000
      [  541.408859] R10: 0000000000020000 R11: 0000000000000246 R12: 0000000000000010
      [  541.408859] R13: 0000000000000010 R14: 00007ffff38fb770 R15: 0000000000000020
      [  541.408859]  </TASK>
      [  601.477891] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      [  601.477897] rcu:     0-...0: (20 ticks this GP) idle=d12c/1/0x4000000000000000 softirq=56206/56211 fqs=11863
      [  601.477904] rcu:     (detected by 2, t=60091 jiffies, g=163213, q=3619 ncpus=6)
      [  601.477908] Sending NMI from CPU 2 to CPUs 0:
      [  541.408859] NMI backtrace for cpu 0
      [  541.408859] CPU: 0 PID: 30478 Comm: systemctl Not tainted 5.14.0-383.el9.x86_64 #1
      [  541.408859] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20230524-3.el9 05/24/2023
      [  541.408859] RIP: 0010:io_serial_in+0x15/0x20
      [  541.408859] Code: 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 0f b6 8f b9 00 00 00 0f b7 57 08 d3 e6 01 f2 ec <0f> b6 c0 e9 f3 2c 4d 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90
      [  541.408859] RSP: 0018:ffffade688bcb870 EFLAGS: 00000006
      [  541.408859] RAX: ffffffff8c5a5f05 RBX: 0000000000000046 RCX: 0000000000000000
      [  541.408859] RDX: 00000000000003f9 RSI: 0000000000000001 RDI: ffffffff8eb72a20
      [  541.408859] RBP: 0000000000000100 R08: 0000000000000001 R09: 6372203a4f464e49
      [  541.408859] R10: 705f756372203a4f R11: 464e49203a756372 R12: 0000000000000045
      [  541.408859] R13: ffffffff8e88e5c0 R14: ffffffff8e88e5c0 R15: ffffffff8eb72a20
      [  541.408859] FS:  00007f3af883db40(0000) GS:ffff9d37b7c00000(0000) knlGS:0000000000000000
      [  541.408859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  541.408859] CR2: 00007f3af8dff04c CR3: 0000000174778000 CR4: 0000000000350ef0
      [  541.408859] Call Trace:
      [  541.408859]  <NMI>
      [  541.408859]  ? show_trace_log_lvl+0x1c4/0x2df
      [  541.408859]  ? show_trace_log_lvl+0x1c4/0x2df
      [  541.408859]  ? serial8250_console_write+0x389/0x4e0
      [  541.408859]  ? nmi_cpu_backtrace.cold+0x1b/0x70
      [  541.408859]  ? nmi_cpu_backtrace_handler+0xd/0x20
      [  541.408859]  ? nmi_handle+0x5e/0x120
      [  541.408859]  ? default_do_nmi+0x40/0x130
      [  541.408859]  ? exc_nmi+0x111/0x140
      [  541.408859]  ? end_repeat_nmi+0x16/0x67
      [  541.408859]  ? mem32_serial_in+0x5/0x20
      [  541.408859]  ? io_serial_in+0x15/0x20
      [  541.408859]  ? io_serial_in+0x15/0x20
      [  541.408859]  ? io_serial_in+0x15/0x20
      [  541.408859]  </NMI>
      [  541.408859]  <TASK>
      [  541.408859]  serial8250_console_write+0x389/0x4e0
      [  541.408859]  __console_emit_next_record+0x215/0x3e0
      [  541.408859]  console_unlock+0x213/0x320
      [  541.408859]  panic+0x12f/0x2f7
      [  541.408859]  jent_kcapi_random.cold+0x15/0x43
      [  541.408859]  drbg_seed+0x10a/0x440
      [  541.408859]  drbg_generate+0xb9/0x320
      [  541.408859]  ? drbg_hmac_generate+0x284/0x2f0
      [  541.408859]  drbg_kcapi_random+0xd3/0x120
      [  541.408859]  ? release_pages+0x16a/0x4c0
      [  541.408859]  crypto_devrandom_read_iter+0xb6/0x220
      [  541.408859]  ? _copy_to_iter+0x1d3/0x620
      [  541.408859]  ? crypto_devrandom_read_iter+0x149/0x220
      [  541.408859]  __do_sys_getrandom+0x9d/0x140
      [  541.408859]  do_syscall_64+0x5c/0x90
      [  541.408859]  ? __do_sys_getrandom+0xa9/0x140
      [  541.408859]  ? switch_fpu_return+0x4c/0xd0
      [  541.408859]  ? exit_to_user_mode_prepare+0xec/0x100
      [  541.408859]  ? syscall_exit_to_user_mode+0x12/0x30
      [  541.408859]  ? do_syscall_64+0x69/0x90
      [  541.408859]  ? exc_page_fault+0x62/0x150
      [  541.408859]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [  541.408859] RIP: 0033:0x7f3af8a5ac87
      [  541.408859] Code: be 0b 00 00 00 eb b2 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 3e 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  541.408859] RSP: 002b:00007ffff38fb758 EFLAGS: 00000246 ORIG_RAX: 000000000000013e
      [  541.408859] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f3af8a5ac87
      [  541.408859] RDX: 0000000000000002 RSI: 0000000000000010 RDI: 00007ffff38fb770
      [  541.408859] RBP: 00007f3af8dfb180 R08: 00005646c9deee60 R09: 0000000000000000
      [  541.408859] R10: 0000000000020000 R11: 0000000000000246 R12: 0000000000000010
      [  541.408859] R13: 0000000000000010 R14: 00007ffff38fb770 R15: 0000000000000020
      [  541.408859]  </TASK>
      [  541.408859] Kernel Offset: 0xae00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [  541.408859] ---[ end Kernel panic - not syncing: Jitter RNG permanent health test failure ]---
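
When triaging several failed jobs, it may help to confirm that each console log contains the same panic and that the node was actually running in FIPS mode. The following is a minimal sketch, not part of the original report: the helper names `find_kernel_panics` and `fips_enabled` are hypothetical, and it assumes the console log is available as plain text and that the kernel exposes the FIPS flag at `/proc/sys/crypto/fips_enabled`.

```python
import re
from pathlib import Path

# Matches console lines such as:
#   [  541.408389] Kernel panic - not syncing: <reason>
PANIC_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Kernel panic - not syncing: (?P<reason>.+?)\s*$"
)


def find_kernel_panics(console_text):
    """Return a list of (timestamp_seconds, reason) for each panic line."""
    hits = []
    for line in console_text.splitlines():
        # Skip the closing "end Kernel panic" marker so a panic is counted once.
        if "end Kernel panic" in line:
            continue
        match = PANIC_RE.search(line)
        if match:
            hits.append((float(match.group("ts")), match.group("reason")))
    return hits


def fips_enabled(flag_path="/proc/sys/crypto/fips_enabled"):
    """Return True if the kernel reports FIPS mode; False if off or unreadable."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        return False


if __name__ == "__main__":
    sample = (
        "[  541.408389] Kernel panic - not syncing: "
        "Jitter RNG permanent health test failure\n"
        "[  541.408389] CPU: 0 PID: 30478 Comm: systemctl Not tainted "
        "5.14.0-383.el9.x86_64 #1\n"
    )
    for ts, reason in find_kernel_panics(sample):
        print(f"{ts:.6f}s: {reason}")
    print("FIPS enabled on this host:", fips_enabled())
```

Running `find_kernel_panics` over the console logs of the failed jobs would show whether they all hit the Jitter RNG health test failure or whether some fail for unrelated reasons.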

            rhn-engineering-sbaker Steve Baker
            rhn-support-ramishra Rabi Mishra
            rhos-dfg-hardprov