  1. RHEL
  2. RHEL-2466

[RHEL9] Corruption of slab_caches on cache shutdown with non-freed objects

    • kernel-5.14.0-376.el9
    • None
    • Medium
    • ZStream
    • 1
    • sst_kernel_ft
    • 10
    • 12
    • 5
    • QE ack, Dev ack
    • False
    • No
    • Red Hat Enterprise Linux
    • CK-9.4.0
    • Approved Blocker
    • All
    • Linux
    • None

      (Based on support case 03604432 from Veritas.)

      Hello.

      Consider this code:

      #include <linux/module.h>
      #include <linux/kernel.h>
      #include <linux/slab.h>
      
      static struct kmem_cache *hello_cache;
      static int *data;
      
      static int __init hello_module_init(void)
      {
      	pr_info("hello: loading module\n");
      
      	hello_cache = kmem_cache_create("hello_cache", sizeof(*data), 0, SLAB_HWCACHE_ALIGN, NULL);
      
      	if (!hello_cache)
      	{
      		pr_err("hello: kmem_cache_create failed\n");
      		return -ENOMEM;
      	}
      
      	pr_info("hello: kmem_cache_create succeeded\n");
      
      	data = kmem_cache_alloc(hello_cache, GFP_KERNEL);
      	if (!data)
      	{
      		pr_err("hello: kmem_cache_alloc failed\n");
      		kmem_cache_destroy(hello_cache);
      		return -ENOMEM;
      	}
      
      	pr_info("hello: kmem_cache_alloc succeeded\n");
      
      	pr_info("hello: module loaded\n");
      
      	return 0;
      }
      
      static void __exit hello_module_exit(void)
      {
      	pr_info("hello: unloading module\n");
      
      	// INTENTIONAL BUG: while the next two lines stay commented out,
      	// the object allocated in hello_module_init() is never freed.
      	// kmem_cache_free(hello_cache, data);
      	// pr_info("hello: kmem_cache_free succeeded\n");
      
      	if (hello_cache)
      	{
      		kmem_cache_destroy(hello_cache);
      		pr_info("hello: kmem_cache_destroy succeeded\n");
      	}
      
      	pr_info("hello: module unloaded\n");
      }
      
      module_init(hello_module_init);
      module_exit(hello_module_exit);
      MODULE_LICENSE("GPL");
      
      # Makefile for the hello module above
      obj-m := hello.o
      
      all:
      	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
      clean:
      	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
      

      Here, a simple kernel module does the following. On inserting the module:

      1. creates hello_cache SLAB cache
      2. allocates one object in it

      On removing the module:

      1. "forgets" (this bug is intentional to demonstrate the issue!) to free up the previously allocated object
      2. destroys the cache

      After the module is inserted and then removed for the first time, the kernel logs the following:

      [ 2696.143142] hello: loading out-of-tree module taints kernel.
      [ 2696.143174] hello: module verification failed: signature and/or required key missing - tainting kernel
      [ 2696.143635] hello: loading module
      [ 2696.143656] hello: kmem_cache_create succeeded
      [ 2696.143658] hello: kmem_cache_alloc succeeded
      [ 2696.143658] hello: module loaded
      [ 2701.191747] hello: unloading module
      [ 2701.191764] =============================================================================
      [ 2701.191765] BUG hello_cache (Tainted: G           OE     -------  --- ): Objects remaining in hello_cache on __kmem_cache_shutdown()
      [ 2701.191766] -----------------------------------------------------------------------------
      [ 2701.191766] 
      [ 2701.191766] Slab 0x00000000d5cdebc3 objects=512 used=1 fp=0x000000003bce40eb flags=0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
      …
      [ 2701.191775] Call Trace:
      [ 2701.191781]  <TASK>
      [ 2701.191784]  dump_stack_lvl+0x34/0x48
      [ 2701.191797]  slab_err.cold+0x53/0x67
      [ 2701.191802]  __kmem_cache_shutdown+0x16a/0x310
      [ 2701.191806]  kmem_cache_destroy+0x51/0x160
      [ 2701.191812]  hello_module_exit+0x1d/0xff0 [hello]
      [ 2701.191814]  __do_sys_delete_module.constprop.0+0x175/0x280
      [ 2701.191817]  do_syscall_64+0x59/0x90
      …
      [ 2701.191877] Object 0x000000005f5f26b3 @offset=3216
      …
      [ 2701.191879] kmem_cache_destroy hello_cache: Slab cache still has objects when called from hello_module_exit+0x1d/0xff0 [hello]
      [ 2701.191913] WARNING: CPU: 0 PID: 1909 at mm/slab_common.c:492 kmem_cache_destroy+0x14d/0x160
      …
      [ 2701.227141] RIP: 0010:kmem_cache_destroy+0x14d/0x160
      …
      [ 2701.238597] Call Trace:
      [ 2701.239217]  <TASK>
      [ 2701.248572]  hello_module_exit+0x1d/0xff0 [hello]
      [ 2701.249341]  __do_sys_delete_module.constprop.0+0x175/0x280
      [ 2701.250264]  do_syscall_64+0x59/0x90
      …
      [ 2701.271529] hello: kmem_cache_destroy succeeded
      [ 2701.271530] hello: module unloaded
      

      This is correct and expected: the kernel reports the object that was left in the cache and warns about it.

      What's not expected is what happens once the module is inserted for the second time:

      [ 2841.315510] hello: loading module
      [ 2841.315527] list_add corruption. next->prev should be prev (ffffffff9a865ba0), but was 0000000000000000. (next=ffff8eef036eca68).
      [ 2841.315546] ------------[ cut here ]------------
      [ 2841.315546] kernel BUG at lib/list_debug.c:23!
      …
      [ 2841.321557] RIP: 0010:__list_add_valid.cold+0xf/0x3f
      …
      [ 2841.332838] Call Trace:
      [ 2841.333380]  <TASK>
      [ 2841.343728]  kmem_cache_create_usercopy+0x1a5/0x2c0
      [ 2841.345126]  kmem_cache_create+0x12/0x20
      [ 2841.345753]  hello_module_init+0x2c/0xff0 [hello]
      [ 2841.346407]  do_one_initcall+0x41/0x210
      [ 2841.347660]  do_init_module+0x5c/0x270
      [ 2841.348262]  __do_sys_finit_module+0xae/0x110
      [ 2841.348894]  do_syscall_64+0x59/0x90
      …
      

      Here, 0xffffffff9a865ba0 is the address of the slab_caches list head. This means that destroying a slab cache while it still contains objects corrupts the slab_caches list.
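
      The check that fires in the trace above can be illustrated with a small userspace sketch. It is not kernel code: fake_cache, fake_slab_caches and buggy_destroy() are hypothetical names, and the sketch assumes that the failure mode is a cache structure being released while it is still linked into the list (memset() stands in for the memory being freed and reused). Under that assumption, the next insertion finds next->prev == NULL instead of the list head, which matches the "but was 0000000000000000" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical stand-in for struct kmem_cache: a name plus its list linkage. */
struct fake_cache {
	char name[16];
	struct list_node list;
};

/* Hypothetical stand-in for the global slab_caches list head. */
static struct list_node fake_slab_caches = { &fake_slab_caches, &fake_slab_caches };

/* The invariant reported in the log: next->prev must point back at prev. */
static bool list_add_is_valid(const struct list_node *prev,
			      const struct list_node *next)
{
	return next->prev == prev && prev->next == next;
}

/* Insert right after head; report an inconsistent list instead of BUG()ing. */
static bool checked_list_add(struct list_node *new_node, struct list_node *head)
{
	struct list_node *next = head->next;

	if (!list_add_is_valid(head, next))
		return false;
	new_node->next = next;
	new_node->prev = head;
	next->prev = new_node;
	head->next = new_node;
	return true;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/*
 * ASSUMED failure mode: when objects remain, the cache is not unlinked,
 * but its memory is released anyway. memset() simulates the release.
 */
static void buggy_destroy(struct fake_cache *cache, bool objects_remaining)
{
	if (!objects_remaining)
		list_del_node(&cache->list);
	memset(cache, 0, sizeof(*cache));
}

/* Returns true if a second cache can be registered after the destroy. */
static bool second_create_succeeds(bool leak_object)
{
	static struct fake_cache first, second;
	bool ok;

	/* Start every run from an empty list. */
	fake_slab_caches.next = fake_slab_caches.prev = &fake_slab_caches;

	strcpy(first.name, "hello_cache");
	ok = checked_list_add(&first.list, &fake_slab_caches);
	assert(ok);
	(void)ok;

	buggy_destroy(&first, leak_object);

	strcpy(second.name, "hello_cache");
	return checked_list_add(&second.list, &fake_slab_caches);
}
```

      Calling second_create_succeeds(true) models the reproducer (object leaked, second insertion rejected by the list check), while second_create_succeeds(false) models the module with kmem_cache_free() restored.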

      The corruption is also visible in /proc/slabinfo after the first module removal (and before the second insertion), where the cache name is printed as garbage:

      # head /proc/slabinfo
      slabinfo - version: 2.1
      # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
      hello_ca�|U��V41       0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0
      nf_conntrack_expect      0      0    232   17    1 : tunables    0    0    0 : slabdata      0      0      0
      …
      

      I've also checked the upstream v6.5.1 kernel on Fedora, and it behaves the same way.

      This behaviour seems to have been introduced by upstream commit 0495e337b7039191dfce6e03f5f830454b1fae6b, which was backported into RHEL 9.2 as 872b93e25e8ab566014eee3f07215ef2ec88151a.

      The question is: is this expected, or should the kernel be more resilient to bugs like this?

      Thanks.

        1. Makefile
          0.1 kB
        2. hello.c
          1 kB

            raquini@redhat.com Rafael Aquini
            rhn-support-onatalen Oleksandr Natalenko
            Memory Management Maintainers
            Li Wang