Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: rhel-10.0.beta
Affects Version/s: rhel-10.0.beta
Component/s: glibc
Labels:
- pt-libraries-evaluated

Fixed in Build:
glibc-2.39-7.el10
Regression:
None
Severity:
None
Epic Link:
RHEL-36007
Keywords:

Patch
sprint_count:
1

Pool Team:

rhel-sst-pt-libraries
Sub-System Group:

ssg_platform_tools

Internal Target Milestone:
13
Story Points:
3
Product Documentation Required:
Yes
Products:

Red Hat Enterprise Linux
Sprint:
SST PT Libraries Sprint 4

Preliminary Testing:
Pass
Testable Builds:
glibc-2.39-7.el10
Errata Link:
https://errata.engineering.redhat.com/advisory/132197
Test Coverage:
None

Release Note Type:
Enhancement
Release Note Text:

.Optimization of AMD Zen 3 and Zen 4 performance in `glibc`

Previously, on AMD Zen 3 and Zen 4 processors, `glibc` sometimes selected the Enhanced Repeat Move String (ERMS) version of the `memcpy` and `memmove` library routines even when it was not the fastest option. With this update, `glibc` selects the best-performing versions of `memcpy` and `memmove` on these processors.

Release Note Status:
Done

Experience:
Architecture:

x86_64

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

Upstream fixed a string function performance issue on certain AMD CPUs.

commit 491e55beab7457ed310a4a47496f4a333c5d1032
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:40 2024 -0300

    x86: Expand the comment on when REP STOSB is used on memset
    
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

commit 272708884cb750f12f5c74a00e6620c19dc6d567
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:39 2024 -0300

    x86: Do not prefer ERMS for memset on Zen3+
    
    For AMD Zen3+ architecture, the performance of the vectorized loop is
    slightly better than ERMS.
    
    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

commit 0c0d39fe4aeb0f69b26e76337c5dfd5530d5d44e
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:38 2024 -0300

    x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
    
    The REP MOVSB usage on memcpy/memmove does not show much performance
    improvement on Zen3/Zen4 cores compared to the vectorized loops.  Also,
    as from BZ 30994, if the source is aligned and the destination is not
    the performance can be 20x slower.
    
    The performance difference is noticeable with small buffer sizes, closer
    to the lower bounds limits when memcpy/memmove starts to use ERMS.  The
    performance of REP MOVSB is similar to vectorized instruction on the
    size limit (the L2 cache).  Also, there is no drawback to multiple cores
    sharing the cache.
    
    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
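
The alignment case described in the last commit message (source aligned, destination not) can be probed with a rough timing sketch. The following is only an illustration, not the upstream benchmark: the helper names `aligned_address` and `time_copy`, the buffer sizes, and the destination offsets are all hypothetical choices, and it assumes that `ctypes.memmove` ends up in the C library's `memmove`. Python call overhead is large compared with a small copy, so the numbers are only suggestive; a C microbenchmark would be needed for reliable measurements.

```python
import ctypes
import time

PAGE = 4096  # assumed page size, used only to pick aligned addresses


def aligned_address(buf, alignment=PAGE):
    """Return the first address inside buf that is a multiple of alignment."""
    base = ctypes.addressof(buf)
    return (base + alignment - 1) // alignment * alignment


def time_copy(size, dst_offset, repeats=2000):
    """Time `repeats` memmove calls of `size` bytes.

    The source starts on a page boundary; the destination starts
    `dst_offset` bytes (0 <= dst_offset < PAGE) past a page boundary.
    Returns (seconds_per_copy, copy_was_correct).
    """
    # Over-allocate so that the aligned start plus offset plus size fits.
    src_buf = ctypes.create_string_buffer(size + 2 * PAGE)
    dst_buf = ctypes.create_string_buffer(size + 2 * PAGE)
    src = aligned_address(src_buf)
    dst = aligned_address(dst_buf) + dst_offset

    # Fill the source with a recognizable pattern.
    pattern = bytes(i % 251 for i in range(size))
    ctypes.memmove(src, pattern, size)

    start = time.perf_counter()
    for _ in range(repeats):
        ctypes.memmove(dst, src, size)
    elapsed = time.perf_counter() - start

    copied = ctypes.string_at(dst, size)
    return elapsed / repeats, copied == pattern


if __name__ == "__main__":
    # Sizes and offsets are arbitrary illustrative values.
    for size in (2048, 4096, 16384, 65536):
        for offset in (0, 1, 63):
            per_copy, ok = time_copy(size, offset)
            print(f"size={size:6d} dst_offset={offset:2d} "
                  f"{per_copy * 1e9:9.1f} ns/copy ok={ok}")
```

Running the script on the same machine before and after the glibc update, and comparing the `dst_offset=0` rows against the misaligned rows, gives a first impression of whether the slowdown is present.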

is blocked by

RHEL-25850 glibc: Import bug fixes from glibc-2.39 upstream (snapshot 1)

Closed

links to

Bug 30994 - REP MOVSB performance suffers from page aliasing on Zen 4

RHBA-2024:132197 glibc update

Assignee: Martin Coufal
Reporter: Florian Weimer
Developer: Platform Tools - Libraries Bot
QA Contact: Martin Coufal
Doc Contact: Tomas Capek
Votes: 0
Watchers: 13
Created: 2024/02/14 3:41 PM
Updated: 2024/12/19 3:34 PM
Resolved: 2024/12/19 3:34 PM
Target end: 2024/05/27
