ACM-9186 (Red Hat Advanced Cluster Management)

search-postgres persists in CrashLoopBackOff while being included in the ACM CR with hugepages enabled


    • Bug
    • Resolution: Done
    • Major
    • ACM 2.9.3
    • ACM 2.9.1
    • Search
    • None
    • 1
    • False
    • None
    • False
    • No
    • Search Sprint 2024-4
    • Moderate

      Description of problem:

      `search-postgres` stays in CrashLoopBackOff when it is included in the ACM CR and the hub cluster has hugepages enabled.

      Version-Release number of selected component (if applicable):

       

      $ omc get csv -A
      NAMESPACE                              NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
      multicluster-engine                    multicluster-engine.v2.4.2           multicluster engine for Kubernetes           2.4.2     multicluster-engine.v2.4.1           Succeeded
      open-cluster-management                advanced-cluster-management.v2.9.1   Advanced Cluster Management for Kubernetes   2.9.1     advanced-cluster-management.v2.9.0   Succeeded
      openshift-file-integrity               file-integrity-operator.v1.3.3       File Integrity Operator                      1.3.3                                          Succeeded
      openshift-logging                      cluster-logging.v5.8.1               Red Hat OpenShift Logging                    5.8.1     cluster-logging.v5.8.0               Succeeded
      openshift-operator-lifecycle-manager   packageserver                        Package Server                               0.19.0                                         Succeeded
      openshift-storage                      lvms-operator.v4.12.2                LVM Storage                                  4.12.2    lvms-operator.v4.12.1                Succeeded
       

       

       

      How reproducible:

      100%

      Steps to Reproduce:

      1. Deploy OCP v4.12.31:

        $ omc get nodes
        NAME                                               STATUS   ROLES                         AGE   VERSION
        master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net   Ready    control-plane,master,worker   1h    v1.25.12+26bab08
        
        $ omc get clusterversion
        NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
        version   4.12.31   True        False         42m     Cluster version is 4.12.31
        
        $ omc describe nodes master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
        Name:               master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
        Roles:              control-plane,master,worker
        Labels:             beta.kubernetes.io/arch=amd64
                            beta.kubernetes.io/os=linux
                            kubernetes.io/arch=amd64
                            kubernetes.io/hostname=master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
                            kubernetes.io/os=linux
                            node-role.kubernetes.io/control-plane=
                            node-role.kubernetes.io/master=
                            node-role.kubernetes.io/worker=
                            node.openshift.io/os_id=rhcos
                            topology.topolvm.io/node=master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
        Annotations:        capacity.topolvm.io/00default: 0
                            capacity.topolvm.io/vg1: 17137590599680
                            csi.volume.kubernetes.io/nodeid: {"topolvm.io":"master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net"}
                            k8s.ovn.org/host-addresses: ["10.40.22.126"]
                            k8s.ovn.org/l3-gateway-config:
                              {"default":{"mode":"shared","interface-id":"br-ex_master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net","mac-address":"40:a6:b7:78:43:d0","ip-...
                            k8s.ovn.org/node-chassis-id: 4e0ab6a1-862d-47dc-a700-ac5489d75540
                            k8s.ovn.org/node-gateway-router-lrp-ifaddr: {"ipv4":"100.64.0.2/16"}
                            k8s.ovn.org/node-mgmt-port-mac-address: 4e:ea:8b:15:15:7b
                            k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.40.22.126/30"}
                            k8s.ovn.org/node-subnets: {"default":"172.21.0.0/23"}
                            machineconfiguration.openshift.io/controlPlaneTopology: SingleReplica
                            machineconfiguration.openshift.io/currentConfig: rendered-master-b55e7bf2ecb759242db0e91692a57364
                            machineconfiguration.openshift.io/desiredConfig: rendered-master-b55e7bf2ecb759242db0e91692a57364
                            machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-master-b55e7bf2ecb759242db0e91692a57364
                            machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-master-b55e7bf2ecb759242db0e91692a57364
                            machineconfiguration.openshift.io/reason: 
                            machineconfiguration.openshift.io/ssh: accessed
                            machineconfiguration.openshift.io/state: Done
                            volumes.kubernetes.io/controller-managed-attach-detach: true
        CreationTimestamp:  Tue, 02 Jan 2024 08:45:38 +0100
        Taints:             <none>
        Unschedulable:      false
        Lease:              Failed to get lease: leases.coordination.k8s.io "master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net" not found
        Conditions:
          Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
          ----             ------  -----------------                 ------------------                ------                       -------
          MemoryPressure   False   Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:45:38 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
          DiskPressure     False   Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:45:38 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
          PIDPressure      False   Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:45:38 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
          Ready            True    Tue, 02 Jan 2024 10:07:35 +0100   Tue, 02 Jan 2024 08:54:29 +0100   KubeletReady                 kubelet is posting ready status
        Addresses:
          InternalIP:  10.40.22.126
          Hostname:    master0.vertigo-sno-1-acm.lab.neat.nsn-rdnet.net
        Capacity:
          cpu:                64
          ephemeral-storage:  468315972Ki
          hugepages-1Gi:      21Gi
          hugepages-2Mi:      0
          memory:             263485180Ki
          pods:               250
        Allocatable:
          cpu:                60
          ephemeral-storage:  454504035381
          hugepages-1Gi:      21Gi
          hugepages-2Mi:      0
          memory:             240236284Ki
          pods:               250
         
      2. Install the operators:
      $ omc get csv -A
      NAMESPACE                              NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
      multicluster-engine                    multicluster-engine.v2.4.2           multicluster engine for Kubernetes           2.4.2     multicluster-engine.v2.4.1           Succeeded
      open-cluster-management                advanced-cluster-management.v2.9.1   Advanced Cluster Management for Kubernetes   2.9.1     advanced-cluster-management.v2.9.0   Succeeded
      openshift-file-integrity               file-integrity-operator.v1.3.3       File Integrity Operator                      1.3.3                                          Succeeded
      openshift-logging                      cluster-logging.v5.8.1               Red Hat OpenShift Logging                    5.8.1     cluster-logging.v5.8.0               Succeeded
      openshift-operator-lifecycle-manager   packageserver                        Package Server                               0.19.0                                         Succeeded
      openshift-storage                      lvms-operator.v4.12.2                LVM Storage                                  4.12.2    lvms-operator.v4.12.1                Succeeded  
      
      $ omc get pods -n open-cluster-management
      NAME                                                              READY   STATUS             RESTARTS   AGE
      cluster-permission-9d767cf4d-lsd26                                1/1     Running            0          29m
      console-chart-console-v2-5f7b6bb59b-2g472                         1/1     Running            0          29m
      console-chart-console-v2-5f7b6bb59b-d5rn2                         1/1     Running            0          29m
      grc-policy-addon-controller-99765dcfc-5gpld                       1/1     Running            0          29m
      grc-policy-addon-controller-99765dcfc-wbsvg                       1/1     Running            0          29m
      grc-policy-propagator-8488d9d58c-lsn2s                            2/2     Running            0          29m
      grc-policy-propagator-8488d9d58c-nqdsm                            2/2     Running            0          29m
      insights-client-69c669d995-jqhfx                                  1/1     Running            0          29m
      insights-metrics-684855d446-mkmnp                                 2/2     Running            0          29m
      klusterlet-addon-controller-v2-76965b8d84-5rwl2                   1/1     Running            0          29m
      klusterlet-addon-controller-v2-76965b8d84-96rpg                   1/1     Running            0          29m
      multicluster-integrations-75cfdc4c69-bdt8j                        3/3     Running            1          32m
      multicluster-observability-operator-79f5566cc9-b5c9p              1/1     Running            0          29m
      multicluster-operators-application-85cfdfb4f5-qzn7b               3/3     Running            2          32m
      multicluster-operators-channel-5496c6548b-nwxbr                   1/1     Running            1          32m
      multicluster-operators-hub-subscription-68944cc796-5gszd          1/1     Running            1          32m
      multicluster-operators-standalone-subscription-6d85f8f9ff-45trl   1/1     Running            0          32m
      multicluster-operators-subscription-report-84f45d9d4c-2hbxq       1/1     Running            0          32m
      multiclusterhub-operator-7c68c6b9c5-qwgjj                         1/1     Running            0          33m
      search-api-5855458b5d-tt2kx                                       1/1     Running            0          29m
      search-collector-5464f5fb9d-nnckl                                 1/1     Running            0          29m
      search-indexer-57698955d9-cvfbr                                   0/1     CrashLoopBackOff   11         29m
      search-postgres-c4846dc79-rd6bm                                   0/1     CrashLoopBackOff   9          29m
      search-v2-operator-controller-manager-c8dc4dfdf-r6t7f             2/2     Running            0          29m
      submariner-addon-db5c4cc9-2v57k                                   1/1     Running            0          29m
      volsync-addon-controller-5986676985-8vnhz                         1/1     Running            0          29m
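
The pod listing above can be filtered for crash-looping pods with a short awk helper. This is a minimal sketch, not part of the original report: the function name `crashlooping_pods` is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) shown above. In practice the output of `omc get pods -n open-cluster-management` would be piped into it.

```shell
# Hypothetical helper: read `get pods` table output on stdin and print
# "<pod name> <restart count>" for every pod in CrashLoopBackOff.
# Assumes STATUS is column 3 and RESTARTS is column 4.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Sample rows copied from the listing above.
crashlooping_pods <<'EOF'
NAME                                                              READY   STATUS             RESTARTS   AGE
search-api-5855458b5d-tt2kx                                       1/1     Running            0          29m
search-indexer-57698955d9-cvfbr                                   0/1     CrashLoopBackOff   11         29m
search-postgres-c4846dc79-rd6bm                                   0/1     CrashLoopBackOff   9          29m
EOF
```

For the sample rows this prints the two search pods with their restart counts, 11 and 9.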
      

       

      Actual results:

      Cluster must-gather: https://drive.google.com/drive/folders/1VeTa4kW7ItqiXQRPLKAfuwBhMCATSWLt?usp=sharing 

      Logs of the `search-postgres-c4846dc79-rd6bm` pod:

      $ omc logs -n open-cluster-management search-postgres-c4846dc79-rd6bm -c search-postgres
      2024-01-02T09:07:46.872623275Z The files belonging to this database system will be owned by user "postgres".
      2024-01-02T09:07:46.872623275Z This user must also own the server process.
      2024-01-02T09:07:46.872623275Z 
      2024-01-02T09:07:46.872719820Z The database cluster will be initialized with locale "en_US.utf8".
      2024-01-02T09:07:46.872719820Z The default database encoding has accordingly been set to "UTF8".
      2024-01-02T09:07:46.872719820Z The default text search configuration will be set to "english".
      2024-01-02T09:07:46.872719820Z 
      2024-01-02T09:07:46.872719820Z Data page checksums are disabled.
      2024-01-02T09:07:46.872719820Z 
      2024-01-02T09:07:46.872719820Z fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
      2024-01-02T09:07:46.872732106Z creating subdirectories ... ok
      2024-01-02T09:07:46.873182515Z selecting dynamic shared memory implementation ... posix
      2024-01-02T09:07:46.873218345Z selecting default max_connections ... sh: line 1:    22 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:48.833791125Z sh: line 1:    24 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:49.871619862Z sh: line 1:    26 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:50.887143490Z sh: line 1:    28 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:51.837179794Z sh: line 1:    30 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:51.837382918Z 20
      2024-01-02T09:07:51.837390275Z selecting default shared_buffers ... sh: line 1:    32 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=16384 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:53.873834686Z sh: line 1:    34 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=8192 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:54.830798901Z sh: line 1:    36 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=4096 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:55.848921479Z sh: line 1:    45 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3584 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:56.793309195Z sh: line 1:    47 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=3072 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:57.829888466Z sh: line 1:    49 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2560 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:58.784799176Z sh: line 1:    51 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=2048 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:07:59.786337672Z sh: line 1:    53 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1536 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:00.966317691Z sh: line 1:    55 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=1000 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:01.910592562Z sh: line 1:    57 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=900 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:02.875530817Z sh: line 1:    59 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=800 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:03.838026157Z sh: line 1:    61 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=700 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:04.892152297Z sh: line 1:    63 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=600 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:05.945332685Z sh: line 1:    72 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=500 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:06.937940743Z sh: line 1:    74 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=400 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:08.028957223Z sh: line 1:    76 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=300 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:08.971817245Z sh: line 1:    78 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:10.075753837Z sh: line 1:    80 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=100 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:11.120469585Z sh: line 1:    82 Bus error               (core dumped) "/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=50 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
      2024-01-02T09:08:11.120662785Z 400kB
      2024-01-02T09:08:11.120675663Z selecting default time zone ... Etc/UTC
      2024-01-02T09:08:11.143223795Z creating configuration files ... ok
      2024-01-02T09:08:11.144227310Z running bootstrap script ... child process was terminated by signal 7: Bus error
      2024-01-02T09:08:12.187073431Z initdb: removing contents of data directory "/var/lib/pgsql/data/userdata" 
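
Every probe in the log above fails with a bus error, down to `shared_buffers=50`, so the crash does not appear to depend on the buffer size. That pattern is consistent with, though not proof of, a problem mapping shared memory rather than a sizing problem. The sketch below summarizes such a log; the function names are hypothetical and it assumes only the line format shown above. To feed it real data, the `omc logs` command above would be piped into it.

```shell
# Hypothetical helpers for summarizing an initdb log read from stdin.

# Print the number of lines that mention "Bus error".
count_bus_errors() {
  grep -c 'Bus error'
}

# Print each distinct shared_buffers value that initdb probed, ascending.
probed_shared_buffers() {
  grep -o 'shared_buffers=[0-9]*' | cut -d= -f2 | sort -n -u
}

# Abbreviated sample lines in the format of the log above.
sample='sh: line 1: 55 Bus error (core dumped) "/usr/bin/postgres" --boot -c max_connections=20 -c shared_buffers=1000
sh: line 1: 82 Bus error (core dumped) "/usr/bin/postgres" --boot -c max_connections=20 -c shared_buffers=50
running bootstrap script ... child process was terminated by signal 7: Bus error'

printf '%s\n' "$sample" | count_bus_errors
printf '%s\n' "$sample" | probed_shared_buffers
```

For the three sample lines the first call prints 3 and the second prints 50 and 1000 on separate lines.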

      Expected results:

      The `search-indexer-` and `search-postgres-` pods should reach the Running state, and `multiclusterhub` should move from its permanent `Installing` state to `Running`.

      Additional info:

      With hugepages disabled on the SNO hub:

      HugePages_Total:       0
      HugePages_Free:        0
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:    1048576 kB
      Hugetlb:               0 kB 

      the `multiclusterhub` progressed to `Running` state:

      NAMESPACE                 NAME              STATUS       AGE
      open-cluster-management   multiclusterhub   Installing   5m54s
      
      NAMESPACE                 NAME              STATUS    AGE
      open-cluster-management   multiclusterhub   Running   6m4s 
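
The hugepages counters above use the field names of `/proc/meminfo` on Linux. As a minimal sketch, assuming that is where they came from, the hypothetical helper below reports whether any huge pages are reserved, which is the condition that separates the failing and working runs in this report.

```shell
# Hypothetical helper: exit 0 if the meminfo-style text on stdin reports
# HugePages_Total greater than zero, exit 1 otherwise.
hugepages_reserved() {
  awk '$1 == "HugePages_Total:" { total = $2 } END { exit !(total > 0) }'
}

# Values copied from the working (hugepages disabled) hub above.
if hugepages_reserved <<'EOF'
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:    1048576 kB
EOF
then
  echo "huge pages reserved"
else
  echo "no huge pages reserved"
fi
```

On a live node the same check could read the file directly with `hugepages_reserved < /proc/meminfo`.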

       

            jpadilla@redhat.com Jorge Padilla
            midu@redhat.com Mihai IDU
            Xiang Yin