RHEL-68997

kernel: Corruption of AArch64 SVE state

    • Yes
    • Moderate
    • rhel-sst-arch-hw
    • ssg_platform_enablement
    • 2
    • False
    • No
    • Red Hat Enterprise Linux
    • None
    • None
    • None
    • Unspecified Release Note Type - Unknown
    • aarch64
    • None

      When running a guest on A64FX, we hit a corruption after several guest reboots. We have at least two different reproducers: with one (RHEL-22598) the corruption appears after more than a day, and with the other (RHEL-67106) we generally hit it within tens of minutes. After further debugging at the QEMU level, we identified a code section in flatview_insert() that may be the cause of the corruption. If we comment out the memmove call and replace it with individual cell copies, we no longer hit the issue.

      static void flatview_insert(FlatView *view, unsigned pos, FlatRange *range)
      {
          int i = view->nr;

          if (view->nr == view->nr_allocated) {
              view->nr_allocated = MAX(2 * view->nr, 10);
              view->ranges = g_realloc(view->ranges,
                                       view->nr_allocated * sizeof(*view->ranges));
          }
      #if 0
          memmove(view->ranges + pos + 1, view->ranges + pos,
                  (view->nr - pos) * sizeof(FlatRange));
      #else
          while (i > pos) {
              view->ranges[i] = view->ranges[i - 1];
              i--;
          }
      #endif
          view->ranges[pos] = *range;
          memory_region_ref(range->mr);
          ++view->nr;
      }
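
      To help separate a deterministic bug in the string routine from corruption that only shows up under guest load, the two insertion strategies above can be compared outside QEMU. The following is a minimal sketch, not QEMU code: `Cell` is a hypothetical stand-in for FlatRange (four 64-bit fields, so there is no padding), and `compare_inserts` is an illustrative helper name. A clean result does not rule out the problem, since it would not catch corruption that depends on timing or context switches.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for FlatRange: fixed-width fields, no padding. */
typedef struct {
    uint64_t a, b, c, d;
} Cell;

/* Shift the tail with memmove, as in the original flatview_insert(). */
static void insert_memmove(Cell *r, size_t nr, size_t pos, Cell v)
{
    memmove(r + pos + 1, r + pos, (nr - pos) * sizeof(*r));
    r[pos] = v;
}

/* Shift the tail one cell at a time, as in the workaround. */
static void insert_loop(Cell *r, size_t nr, size_t pos, Cell v)
{
    for (size_t i = nr; i > pos; i--) {
        r[i] = r[i - 1];
    }
    r[pos] = v;
}

/*
 * For every array length 1..max_nr and every insert position, run both
 * strategies on identical input and count differing results.
 * Returns the number of mismatches, or -1 on allocation failure.
 */
static int compare_inserts(size_t max_nr)
{
    Cell *x = calloc(max_nr + 1, sizeof(*x));
    Cell *y = calloc(max_nr + 1, sizeof(*y));
    Cell v = { 0xdead, 0xbeef, 0xcafe, 0xf00d };
    int mismatches = 0;

    if (!x || !y) {
        free(x);
        free(y);
        return -1;
    }
    for (size_t nr = 1; nr <= max_nr; nr++) {
        for (size_t pos = 0; pos <= nr; pos++) {
            for (size_t i = 0; i < nr; i++) {
                Cell c = { i, i * 3 + 1, ~(uint64_t)i, nr };
                x[i] = c;
                y[i] = c;
            }
            insert_memmove(x, nr, pos, v);
            insert_loop(y, nr, pos, v);
            if (memcmp(x, y, (nr + 1) * sizeof(*x)) != 0) {
                mismatches++;
            }
        }
    }
    free(x);
    free(y);
    return mismatches;
}
```

      Calling `compare_inserts(64)` from a small `main` on the A64FX host and checking for a non-zero return would be one way to use it; larger sizes exercise longer copies.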
      

      So we wonder whether there could be something wrong with the memmove implementation on this A64FX hardware. After a discussion with fweimer@redhat.com, it looks like the memset/memcpy/memmove selectors in the RHEL 9 glibc check the MIDR for A64FX.

      So this Jira ticket is a request to produce a test build with the A64FX string routines ripped out so that glibc would use the generic implementation, just to see if it removes the issue.
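
      For reference, the kind of MIDR check mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the glibc source: the function names are hypothetical, the field layout follows the architectural MIDR_EL1 definition (implementer in bits 31:24, part number in bits 15:4), and the A64FX identification (implementer 0x46 'F', part number 0x001) and the sample register values are assumptions to verify against the actual hardware and the glibc tree.

```c
#include <stdint.h>

/* Implementer code: MIDR_EL1 bits [31:24]. */
static unsigned midr_implementer(uint64_t midr)
{
    return (unsigned)((midr >> 24) & 0xff);
}

/* Primary part number: MIDR_EL1 bits [15:4]. */
static unsigned midr_partnum(uint64_t midr)
{
    return (unsigned)((midr >> 4) & 0xfff);
}

/*
 * Assumed A64FX identification: implementer 0x46 ('F', Fujitsu)
 * and part number 0x001. Hypothetical helper, not a glibc symbol.
 */
static int midr_is_a64fx(uint64_t midr)
{
    return midr_implementer(midr) == 0x46 && midr_partnum(midr) == 0x001;
}
```

      With these helpers, a value such as 0x461f0010 would be classified as A64FX, while 0x410fd0c0 (an Arm-implemented part) would not. A test build without the A64FX routines would effectively behave as if this check always returned false.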

              arch-hw-aarch64-triage Arch HW AArch64 Triage
              eauger Eric Auger
              Arch HW AArch64 Triage Arch HW AArch64 Triage
              Eddie Kovsky Eddie Kovsky