Description of problem:
The `search-postgres` pod stays in CrashLoopBackOff when it is included in the ACM CR and the hub cluster has hugepages enabled.
Version-Release number of selected component (if applicable):
$ omc get csv -A
NAMESPACE                              NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
multicluster-engine                    multicluster-engine.v2.4.2           multicluster engine for Kubernetes           2.4.2     multicluster-engine.v2.4.1           Succeeded
open-cluster-management                advanced-cluster-management.v2.9.1   Advanced Cluster Management for Kubernetes   2.9.1     advanced-cluster-management.v2.9.0   Succeeded
openshift-file-integrity               file-integrity-operator.v1.3.3       File Integrity Operator                      1.3.3                                          Succeeded
openshift-logging                      cluster-logging.v5.8.1               Red Hat OpenShift Logging                    5.8.1     cluster-logging.v5.8.0               Succeeded
openshift-operator-lifecycle-manager   packageserver                        Package Server                               0.19.0                                         Succeeded
openshift-storage                      lvms-operator.v4.12.2                LVM Storage                                  4.12.2    lvms-operator.v4.12.1                Succeeded
How reproducible:
100%
Steps to Reproduce:
- Deploy OCP v4.12.31:
$ omc get nodes
NAME                                               STATUS   ROLES                         AGE   VERSION
master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net   Ready    control-plane,master,worker   1h    v1.25.12+26bab08

$ omc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.31   True        False         42m     Cluster version is 4.12.31

$ omc describe nodes master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
Name:               master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
Roles:              control-plane,master,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
                    topology.topolvm.io/node=master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
Annotations:        capacity.topolvm.io/00default: 0
                    capacity.topolvm.io/vg1: 17137590599680
                    csi.volume.kubernetes.io/nodeid: {"topolvm.io":"master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net"}
                    k8s.ovn.org/host-addresses: ["10.40.22.126"]
                    k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net","mac-address":"40:a6:b7:78:43:d0","ip-...
                    k8s.ovn.org/node-chassis-id: 4e0ab6a1-862d-47dc-a700-ac5489d75540
                    k8s.ovn.org/node-gateway-router-lrp-ifaddr: {"ipv4":"100.64.0.2/16"}
                    k8s.ovn.org/node-mgmt-port-mac-address: 4e:ea:8b:15:15:7b
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.40.22.126/30"}
                    k8s.ovn.org/node-subnets: {"default":"172.21.0.0/23"}
                    machineconfiguration.openshift.io/controlPlaneTopology: SingleReplica
                    machineconfiguration.openshift.io/currentConfig: rendered-master-b55e7bf2ecb759242db0e91692a57364
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-b55e7bf2ecb759242db0e91692a57364
                    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-master-b55e7bf2ecb759242db0e91692a57364
                    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-master-b55e7bf2ecb759242db0e91692a57364
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 02 Jan 2024 08:45:38 +0100
Taints:             <none>
Unschedulable:      false
Lease:              Failed to get lease: leases.coordination.k8s.io "master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net" not found
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:45:38 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:45:38 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:45:38 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:54:29 +0100   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.40.22.126
  Hostname:    master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
Capacity:
  cpu:                64
  ephemeral-storage:  468315972Ki
  hugepages-1Gi:      21Gi
  hugepages-2Mi:      0
  memory:             263485180Ki
  pods:               250
Allocatable:
  cpu:                60
  ephemeral-storage:  454504035381
  hugepages-1Gi:      21Gi
  hugepages-2Mi:      0
  memory:             240236284Ki
  pods:               250
- Install the operators:
$ omc get csv -A
NAMESPACE                              NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
multicluster-engine                    multicluster-engine.v2.4.2           multicluster engine for Kubernetes           2.4.2     multicluster-engine.v2.4.1           Succeeded
open-cluster-management                advanced-cluster-management.v2.9.1   Advanced Cluster Management for Kubernetes   2.9.1     advanced-cluster-management.v2.9.0   Succeeded
openshift-file-integrity               file-integrity-operator.v1.3.3       File Integrity Operator                      1.3.3                                          Succeeded
openshift-logging                      cluster-logging.v5.8.1               Red Hat OpenShift Logging                    5.8.1     cluster-logging.v5.8.0               Succeeded
openshift-operator-lifecycle-manager   packageserver                        Package Server                               0.19.0                                         Succeeded
openshift-storage                      lvms-operator.v4.12.2                LVM Storage                                  4.12.2    lvms-operator.v4.12.1                Succeeded

$ omc get pods -n open-cluster-management
NAME                                                              READY   STATUS             RESTARTS   AGE
cluster-permission-9d767cf4d-lsd26                                1/1     Running            0          29m
console-chart-console-v2-5f7b6bb59b-2g472                         1/1     Running            0          29m
console-chart-console-v2-5f7b6bb59b-d5rn2                         1/1     Running            0          29m
grc-policy-addon-controller-99765dcfc-5gpld                       1/1     Running            0          29m
grc-policy-addon-controller-99765dcfc-wbsvg                       1/1     Running            0          29m
grc-policy-propagator-8488d9d58c-lsn2s                            2/2     Running            0          29m
grc-policy-propagator-8488d9d58c-nqdsm                            2/2     Running            0          29m
insights-client-69c669d995-jqhfx                                  1/1     Running            0          29m
insights-metrics-684855d446-mkmnp                                 2/2     Running            0          29m
klusterlet-addon-controller-v2-76965b8d84-5rwl2                   1/1     Running            0          29m
klusterlet-addon-controller-v2-76965b8d84-96rpg                   1/1     Running            0          29m
multicluster-integrations-75cfdc4c69-bdt8j                        3/3     Running            1          32m
multicluster-observability-operator-79f5566cc9-b5c9p              1/1     Running            0          29m
multicluster-operators-application-85cfdfb4f5-qzn7b               3/3     Running            2          32m
multicluster-operators-channel-5496c6548b-nwxbr                   1/1     Running            1          32m
multicluster-operators-hub-subscription-68944cc796-5gszd          1/1     Running            1          32m
multicluster-operators-standalone-subscription-6d85f8f9ff-45trl   1/1     Running            0          32m
multicluster-operators-subscription-report-84f45d9d4c-2hbxq       1/1     Running            0          32m
multiclusterhub-operator-7c68c6b9c5-qwgjj                         1/1     Running            0          33m
search-api-5855458b5d-tt2kx                                       1/1     Running            0          29m
search-collector-5464f5fb9d-nnckl                                 1/1     Running            0          29m
search-indexer-57698955d9-cvfbr                                   0/1     CrashLoopBackOff   11         29m
search-postgres-c4846dc79-rd6bm                                   0/1     CrashLoopBackOff   9          29m
search-v2-operator-controller-manager-c8dc4dfdf-r6t7f             2/2     Running            0          29m
submariner-addon-db5c4cc9-2v57k                                   1/1     Running            0          29m
volsync-addon-controller-5986676985-8vnhz                         1/1     Running            0          29m
Actual results:
Cluster must-gather: https://drive.google.com/drive/folders/1VeTa4kW7ItqiXQRPLKAfuwBhMCATSWLt?usp=sharing
Logs of the `search-postgres-c4846dc79-rd6bm` pod:
$ omc logs -n open-cluster-management search-postgres-c4846dc79-rd6bm -c search-postgres
2024-01-02T09:07:46.872623275Z The files belonging to this database system will be owned by user "postgres".
2024-01-02T09:07:46.872623275Z This user must also own the server process.
2024-01-02T09:07:46.872623275Z
2024-01-02T09:07:46.872719820Z The database cluster will be initialized with locale "en_US.utf8".
2024-01-02T09:07:46.872719820Z The default database encoding has accordingly been set to "UTF8".
2024-01-02T09:07:46.872719820Z The default text search configuration will be set to "english".
2024-01-02T09:07:46.872719820Z
2024-01-02T09:07:46.872719820Z Data page checksums are disabled.
2024-01-02T09:07:46.872719820Z
2024-01-02T09:07:46.872719820Z fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
2024-01-02T09:07:46.872732106Z creating subdirectories ... ok
2024-01-02T09:07:46.873182515Z selecting dynamic shared memory implementation ... posix
2024-01-02T09:07:46.873218345Z selecting default max_connections ... sh: line 1: 22 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:48.833791125Z sh: line 1: 24 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:49.871619862Z sh: line 1: 26 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:50.887143490Z sh: line 1: 28 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:51.837179794Z sh: line 1: 30 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:51.837382918Z 20
2024-01-02T09:07:51.837390275Z selecting default shared_buffers ... sh: line 1: 32 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=16384 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:53.873834686Z sh: line 1: 34 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=8192 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:54.830798901Z sh: line 1: 36 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=4096 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:55.848921479Z sh: line 1: 45 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3584 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:56.793309195Z sh: line 1: 47 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3072 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:57.829888466Z sh: line 1: 49 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2560 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:58.784799176Z sh: line 1: 51 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2048 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:07:59.786337672Z sh: line 1: 53 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1536 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:00.966317691Z sh: line 1: 55 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:01.910592562Z sh: line 1: 57 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=900 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:02.875530817Z sh: line 1: 59 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=800 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:03.838026157Z sh: line 1: 61 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=700 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:04.892152297Z sh: line 1: 63 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=600 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:05.945332685Z sh: line 1: 72 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:06.937940743Z sh: line 1: 74 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:08.028957223Z sh: line 1: 76 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:08.971817245Z sh: line 1: 78 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:10.075753837Z sh: line 1: 80 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=100 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:11.120469585Z sh: line 1: 82 Bus error (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=50 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
2024-01-02T09:08:11.120662785Z 400kB
2024-01-02T09:08:11.120675663Z selecting default time zone ... Etc/UTC
2024-01-02T09:08:11.143223795Z creating configuration files ... ok
2024-01-02T09:08:11.144227310Z running bootstrap script ... child process was terminated by signal 7: Bus error
2024-01-02T09:08:12.187073431Z initdb: removing contents of data directory "/var/lib/pgsql/data/userdata"
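The probe failures in the log above can be tallied with a small script. The following is a minimal Python sketch, assuming the log text has been saved locally. `failed_probes` and `sample` are hypothetical names introduced here for illustration, and the sample lines are copied from the log above (with the shell redirections trimmed).

```python
import re

# Matches an initdb probe that died with a bus error and captures the
# max_connections / shared_buffers values it was started with.
PROBE = re.compile(
    r"Bus error \(core dumped\).*?-c max_connections=(\d+) -c shared_buffers=(\d+)"
)


def failed_probes(log_text):
    """Return (max_connections, shared_buffers) for every crashed probe."""
    return [(int(m.group(1)), int(m.group(2))) for m in PROBE.finditer(log_text)]


# Three lines taken from the log above; the middle one is not a probe failure.
sample = "\n".join([
    '2024-01-02T09:07:48.833791125Z sh: line 1: 24 Bus error (core dumped) '
    '"/usr/bin/postgres" --boot -x0 -F -c max_connections=50 '
    '-c shared_buffers=500 -c dynamic_shared_memory_type=posix',
    '2024-01-02T09:07:51.837382918Z 20',
    '2024-01-02T09:08:11.120469585Z sh: line 1: 82 Bus error (core dumped) '
    '"/usr/bin/postgres" --boot -x0 -F -c max_connections=20 '
    '-c shared_buffers=50 -c dynamic_shared_memory_type=posix',
])

print(failed_probes(sample))  # prints [(50, 500), (20, 50)]
```

Run against the full log, this lists every probe down to `shared_buffers=50`, which suggests the crash does not depend on the requested buffer size.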
Expected results:
The `search-indexer-` and `search-postgres-` pods should be in Running state, and `multiclusterhub` should progress from its permanent `Installing` state to `Running`.
Additional info:
With hugepages disabled on the SNO hub:
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB
the `multiclusterhub` progressed to the `Running` state:
NAMESPACE                 NAME              STATUS       AGE
open-cluster-management   multiclusterhub   Installing   5m54s
NAMESPACE                 NAME              STATUS       AGE
open-cluster-management   multiclusterhub   Running      6m4s
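To check the hugepages state of a node before retesting, the `/proc/meminfo` fields shown above can be parsed. The following is a minimal Python sketch. `parse_hugepages`, `hugepages_enabled` and `sample` are hypothetical names introduced here for illustration, and the sample values are the ones reported above for the hub with hugepages disabled.

```python
def parse_hugepages(meminfo_text):
    """Extract the hugepage-related fields from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith(("HugePages_", "Hugepagesize", "Hugetlb")):
            # The first token is the number; a trailing "kB" unit is ignored.
            fields[key] = int(value.split()[0])
    return fields


def hugepages_enabled(meminfo_text):
    """True when at least one huge page is configured."""
    return parse_hugepages(meminfo_text).get("HugePages_Total", 0) > 0


# Values copied from the report (hub with hugepages disabled).
sample = """\
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB
"""

print(hugepages_enabled(sample))  # prints False
```

On a live node the same functions could be fed the contents of `/proc/meminfo`, for example `hugepages_enabled(open("/proc/meminfo").read())`.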
Links to:
- RHSA-2024:126795 Red Hat Advanced Cluster Management 2.9.3 security and bug fix container updates