-
Story
-
Resolution: Won't Do
-
ssg_core_services
The CentOS zlib package (https://gitlab.com/redhat/centos-stream/rpms/zlib) appears to carry IBM Z compression optimization patches on top of upstream zlib.
Adding SIMD optimizations to the current package could yield major performance gains on both x86-64 and Arm.
As an example, these are the numbers reported for the zlib package (1.2.11) shipped
on CentOS Stream 8 when running zlib_bench (https://source.chromium.org/chromium/chromium/src/+/main:third_party/zlib/contrib/bench/zlib_bench.cc):
[acavalca@spr3 ~]$ hostnamectl
Static hostname: spr3.ra.intel.com
Icon name: computer-server
Chassis: server
Machine ID: 84c65e94b8a44c3abcb440894280dbd1
Boot ID: 3b0cf365c9444f238ed07c95dad9a97d
Operating System: CentOS Stream 8
CPE OS Name: cpe:/o:centos:centos:8
Kernel: Linux 4.18.0-497.el8.x86_64
Architecture: x86-64
[acavalca@spr3 ~]$ ldd ./zlib_bench_system
linux-vdso.so.1 (0x00007ffd2a7a5000)
libz.so.1 => /lib64/libz.so.1 (0x00007f316156e000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f31611d9000)
libm.so.6 => /lib64/libm.so.6 (0x00007f3160e57000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f3160c3f000)
libc.so.6 => /lib64/libc.so.6 (0x00007f316087a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3161786000)
[acavalca@spr3 ~]$ rpm -qa | grep zlib
zlib-1.2.11-25.el8.x86_64
zlib-devel-1.2.11-25.el8.x86_64
[acavalca@spr3 ~]$ ./zlib_bench_system gzip corpus/flex/*
corpus/flex/baddata1.snappy :
GZIP: [b 1M] bytes 27512 -> 22920 83.31% comp 45.1 ( 45.3) MB/s uncomp 165.9 (166.2) MB/s
corpus/flex/geo.protodata :
GZIP: [b 1M] bytes 118588 -> 15143 12.77% comp 100.4 (100.6) MB/s uncomp 568.5 (570.3) MB/s
corpus/flex/html_x_4 :
GZIP: [b 1M] bytes 409600 -> 53299 13.01% comp 67.3 ( 67.4) MB/s uncomp 466.1 (466.5) MB/s
The three files above come from the snappy data corpus (https://github.com/google/snappy/tree/main/testdata)
and differ in entropy, which makes them a rough overview of expected performance.
The benchmark ran on a 4th Gen Xeon processor (Platinum 8480).
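To compare runs without copying numbers by hand, a result line can be parsed into fields. This is a minimal sketch, assuming the line layout shown in the output above; the names `LINE_RE` and `parse_result` are hypothetical, and the meaning of the parenthesised throughput value is not stated here, so it is kept under a neutral `_paren` name.

```python
import re

# Matches a zlib_bench result line as it appears in the transcripts above.
# Assumption: the layout is stable across the builds being compared.
LINE_RE = re.compile(
    r"bytes\s+(?P<raw>\d+)\s+->\s+(?P<packed>\d+)\s+(?P<ratio>[\d.]+)%\s+"
    r"comp\s+(?P<comp>[\d.]+)\s+\(\s*(?P<comp_paren>[\d.]+)\)\s+MB/s\s+"
    r"uncomp\s+(?P<uncomp>[\d.]+)\s+\(\s*(?P<uncomp_paren>[\d.]+)\)\s+MB/s"
)


def parse_result(line):
    """Return the numeric fields of one result line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "raw": int(fields["raw"]),
        "packed": int(fields["packed"]),
        "ratio": float(fields["ratio"]),
        "comp": float(fields["comp"]),
        "comp_paren": float(fields["comp_paren"]),
        "uncomp": float(fields["uncomp"]),
        "uncomp_paren": float(fields["uncomp_paren"]),
    }


sample = ("GZIP: [b 1M] bytes 27512 -> 22920 83.31% "
          "comp 45.1 ( 45.3) MB/s uncomp 165.9 (166.2) MB/s")
result = parse_result(sample)
print(result["comp"], result["uncomp"])
```

Lines that carry only a file name (such as `corpus/flex/baddata1.snappy :`) do not match and return `None`, so they can be skipped or used as the key for the result line that follows.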
Now the reported numbers for Chromium zlib:
[acavalca@spr3 ~]$ ./chromium-zlib/tot/zlib_bench gzip corpus/flex/*
/home/acavalca/corpus/flex/baddata1.snappy :
GZIP: [b 1M] bytes 27512 -> 23255 84.53% comp 75.3 ( 75.8) MB/s uncomp 381.7 (383.0) MB/s
/home/acavalca/corpus/flex/geo.protodata :
GZIP: [b 1M] bytes 118588 -> 15178 12.80% comp 171.1 (171.6) MB/s uncomp 2339.5 (2401.7) MB/s
/home/acavalca/corpus/flex/html_x_4 :
GZIP: [b 1M] bytes 409600 -> 53243 13.00% comp 117.6 (117.8) MB/s uncomp 1705.3 (1708.0) MB/s
And for Cloudflare zlib:
[acavalca@spr3 ~]$ ./cloudflare-zlib/zlib_bench gzip corpus/flex/*
corpus/flex/baddata1.snappy :
GZIP: [b 1M] bytes 27512 -> 23255 84.5% comp 84.7 ( 84.8) MB/s uncomp 300.7 (301.0) MB/s
corpus/flex/geo.protodata :
GZIP: [b 1M] bytes 118588 -> 15178 12.8% comp 200.8 (201.6) MB/s uncomp 1934.7 (1939.3) MB/s
corpus/flex/html_x_4 :
GZIP: [b 1M] bytes 409600 -> 53246 13.0% comp 139.0 (139.1) MB/s uncomp 1449.1 (1481.8) MB/s
The potential decompression gain is over 3x, i.e. (381.5 +
2325.4 + 1675.4) / (166.6 + 570.7 + 466.4) = 3.64, and the potential
compression gain is about 2x, i.e. (85 + 202.6 + 139.3) / (44.9 + 99.9 + 67.1) = 2.01,
for this small corpus sample.
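The arithmetic behind the two ratios can be reproduced with a short script. This is only a sketch: the throughput values are the ones quoted in the calculation above, and the function name `aggregate_speedup` is hypothetical.

```python
def aggregate_speedup(optimized, baseline):
    """Ratio of summed throughputs (MB/s) across the same set of files."""
    return sum(optimized) / sum(baseline)


# Decompression throughput per file, as quoted in the calculation above.
decomp = aggregate_speedup([381.5, 2325.4, 1675.4], [166.6, 570.7, 466.4])

# Compression throughput per file, as quoted in the calculation above.
comp = aggregate_speedup([85.0, 202.6, 139.3], [44.9, 99.9, 67.1])

print(f"decompression: {decomp:.2f}x, compression: {comp:.2f}x")
```

A ratio of sums gives more weight to the files with the highest throughput; averaging the per-file ratios instead would treat each file equally and could give a somewhat different figure.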
I don't have numbers for Arm server-grade processors, but I would
expect similar gains.
This small experiment should make the potential for considerable performance gains clear. Given that IBM Z specific patches are already maintained in CentOS zlib, it seems reasonable to take a similar approach for other CPU architectures.