Red Hat OpenStack Services on OpenShift / OSPRH-16802

openstack-operator-controller-manager in CrashLoopBackOff for oom-kill

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • rhos-18.0.9
    • rhos-18.0.7
    • openstack-operator
    • None
    • 1
    • False
    • False
    • ?
    • openstack-operator-bundle-container-1.0.11-8
    • rhos-conplat-core-operators
    • None
    • OSPK8S Sprint 1
    • 1
    • Important

      To Reproduce

      Steps to reproduce the behavior:

      1. Start a RHOSO deployment with OpenStack operator version 1.0.9.
      2. After the OpenStackControlPlane resource is deployed, the openstack-operator-controller-manager pod in the openstack-operators namespace goes into CrashLoopBackOff.
      3. This blocks any updates to the OpenStackControlPlane resource.

      Device Info

      • OpenShift: 4.16.37
      • openstack-operator: 1.0.9
      • RHOSO Version: 18.0.7-20250408.2
      • OS Version: Red Hat Enterprise Linux CoreOS 416.94.202502260030-0

      Bug impact

      • Deployment is blocked because the operator is in a crash loop.

      Known workaround

      Additional context

      • $ oc get pods -owide
        NAME                                                              READY   STATUS             RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
        openstack-operator-controller-manager-7ff8868cb4-kjxv6            1/2     CrashLoopBackOff   8          20h   10.x.0.23    ocp2-master0.x.com   <none>           <none>
      • ./0020-inspect.local.450548990925341877.openstack-operators.tar.gz/inspect.local.450548990925341877/namespaces/openstack-operators/pods/openstack-operator-controller-manager-7ff8868cb4-kjxv6/openstack-operator-controller-manager-7ff8868cb4-kjxv6.yaml
          containerStatuses:
          - containerID: cri-o://efbb862bae00057ac49b38ff17116d0e02a44db1ebdd0a284bd0831cc76a3889
            image: registry.redhat.io/rhoso-operators/openstack-rhel9-operator@sha256:f3620ecac70139e08f92e6a9444e1edd603cf7d1efa64dda19986defab87241b
            imageID: registry.redhat.io/rhoso-operators/openstack-rhel9-operator@sha256:a002ef12a5077ba2301b67a74bf6dedf1c80048241adb1eaffb4385aad5dddd8
            lastState:
              terminated:
                containerID: cri-o://efbb862bae00057ac49b38ff17116d0e02a44db1ebdd0a284bd0831cc76a3889
                exitCode: 137
                finishedAt: "2025-05-15T07:11:24Z"
                reason: Error
                startedAt: "2025-05-15T07:11:05Z"
            name: manager
            ready: false
            restartCount: 8
            started: false
            state:
              waiting:
                message: back-off 5m0s restarting failed container=manager pod=openstack-operator-controller-manager-7ff8868cb4-kjxv6_openstack-operators(66a28202-2d6d-478a-a857-f2f7cb7fd582)
                reason: CrashLoopBackOff
      • Kernel log from ocp2-master0.x.com showing the OOM kill of the manager process:
        May 19 13:26:05 ocp2-master0.x.com kernel: Hardware name: Dell Inc. PowerEdge R650/0TCW38, BIOS 1.13.2 12/19/2023
        May 19 13:26:05 ocp2-master0.x.com kernel: Call Trace:
        May 19 13:26:05 ocp2-master0.x.com kernel:  <TASK>
        May 19 13:26:05 ocp2-master0.x.com kernel:  dump_stack_lvl+0x34/0x48
        May 19 13:26:05 ocp2-master0.x.com kernel:  dump_header+0x4a/0x201
        May 19 13:26:05 ocp2-master0.x.com kernel:  oom_kill_process.cold+0xb/0x10
        May 19 13:26:05 ocp2-master0.x.com kernel:  out_of_memory+0xed/0x2e0
        May 19 13:26:05 ocp2-master0.x.com kernel:  mem_cgroup_out_of_memory+0x131/0x150
        May 19 13:26:05 ocp2-master0.x.com kernel:  try_charge_memcg+0x763/0x820
        May 19 13:26:05 ocp2-master0.x.com kernel:  charge_memcg+0x32/0xa0
        May 19 13:26:05 ocp2-master0.x.com kernel:  __mem_cgroup_charge+0x29/0x80
        May 19 13:26:05 ocp2-master0.x.com kernel:  do_anonymous_page+0x85/0x4c0
        May 19 13:26:05 ocp2-master0.x.com kernel:  __handle_mm_fault+0x32b/0x670
        May 19 13:26:05 ocp2-master0.x.com kernel:  ? set_next_entity+0xda/0x150
        May 19 13:26:05 ocp2-master0.x.com kernel:  handle_mm_fault+0xcd/0x290
        May 19 13:26:05 ocp2-master0.x.com kernel:  do_user_addr_fault+0x1b4/0x6a0
        May 19 13:26:05 ocp2-master0.x.com kernel:  exc_page_fault+0x62/0x150
        May 19 13:26:05 ocp2-master0.x.com kernel:  asm_exc_page_fault+0x22/0x30
        May 19 13:26:05 ocp2-master0.x.com kernel: RIP: 0033:0x475119
        May 19 13:26:05 ocp2-master0.x.com kernel: Code: fe 7f 44 1f 80 c5 f8 77 c3 80 3d 84 83 13 03 01 75 0d c5 f9 ef c0 48 81 fb 00 00 00 02 73 13 48 89 d9 48 c1 e9 03 48 83 e3 07 <f3> 48 ab e9 65 fe ff ff c5 fe 7f 07 48 89 fe 48 83 c7 20 48 83 e7
        May 19 13:26:05 ocp2-master0.x.com kernel: RSP: 002b:000000c006ad6518 EFLAGS: 00010246
        May 19 13:26:05 ocp2-master0.x.com kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000003f0
        May 19 13:26:05 ocp2-master0.x.com kernel: RDX: 00007f356e69e5b8 RSI: 0000000000000000 RDI: 000000c007aa2000
        May 19 13:26:05 ocp2-master0.x.com kernel: RBP: 000000c006ad6578 R08: 0000000000000000 R09: 0000000000004a80
        May 19 13:26:05 ocp2-master0.x.com kernel: R10: 00007f34ec26a350 R11: 000000c007a9f500 R12: 0000000000000001
        May 19 13:26:05 ocp2-master0.x.com kernel: R13: 0000000000000003 R14: 000000c0009fbba0 R15: 0000000000000002
        May 19 13:26:05 ocp2-master0.x.com kernel:  </TASK>
        May 19 13:26:05 ocp2-master0.x.com kernel: memory: usage 262144kB, limit 262144kB, failcnt 154
        May 19 13:26:05 ocp2-master0.x.com kernel: swap: usage 0kB, limit 0kB, failcnt 0
        May 19 13:26:05 ocp2-master0.x.com kernel: Tasks state (memory values in pages):
        May 19 13:26:05 ocp2-master0.x.com kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
        May 19 13:26:05 ocp2-master0.x.com kernel: [2320610] 65532 2320610  1515774    70611  1257472        0           999 manager
        May 19 13:26:05 ocp2-master0.x.com kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-1ec3fe9dc2550f16c1231a3b58e2e2e19e4fa8ab421ec4185bb060a2377c2ce8.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4d395444_e44c_46ba_9c07_ba6c56b6775a.slice/crio-1ec3fe9dc2550f16c1231a3b58e2e2e19e4fa8ab421ec4185bb060a2377c2ce8.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4d395444_e44c_46ba_9c07_ba6c56b6775a.slice/crio-1ec3fe9dc2550f16c1231a3b58e2e2e19e4fa8ab421ec4185bb060a2377c2ce8.scope,task=manager,pid=2320610,uid=65532
        May 19 13:26:05 ocp2-master0.x.com kernel: Memory cgroup out of memory: Killed process 2320610 (manager) total-vm:6063096kB, anon-rss:245580kB, file-rss:36864kB, shmem-rss:0kB, UID:65532 pgtables:1228kB oom_score_adj:999
        May 19 13:26:05 ocp2-master0.x.com kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4d395444_e44c_46ba_9c07_ba6c56b6775a.slice/crio-1ec3fe9dc2550f16c1231a3b58e2e2e19e4fa8ab421ec4185bb060a2377c2ce8.scope are going to be killed due to memory.oom.group set
        May 19 13:26:05 ocp2-master0.x.com kernel: Memory cgroup out of memory: Killed process 2320638 (manager) total-vm:6063096kB, anon-rss:245580kB, file-rss:36864kB, shmem-rss:0kB, UID:65532 pgtables:1228kB oom_score_adj:999
        May 19 13:26:05 ocp2-master0.x.com systemd[1]: crio-1ec3fe9dc2550f16c1231a3b58e2e2e19e4fa8ab421ec4185bb060a2377c2ce8.scope: A process of this unit has been killed by the OOM killer.
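
The figures in the kernel OOM report above can be cross-checked with a little arithmetic. The sketch below only reuses numbers copied from the log; the 4 KiB page size is an assumption (the usual x86_64 default), not something the log states. It suggests the manager container was running against a 256 MiB memory cgroup limit, and that the reported rss page count matches anon-rss plus file-rss.

```python
# Values copied from the kernel OOM report above.
PAGE_KB = 4  # assumption: 4 KiB pages (typical on x86_64)

limit_kb = 262144     # "memory: usage 262144kB, limit 262144kB"
usage_kb = 262144     # usage reported at the time of the kill
rss_pages = 70611     # rss column for the manager task (in pages)
anon_rss_kb = 245580  # anon-rss of the killed process
file_rss_kb = 36864   # file-rss of the killed process

limit_mib = limit_kb // 1024   # cgroup limit expressed in MiB
rss_kb = rss_pages * PAGE_KB   # task rss converted from pages to kB
at_limit = usage_kb >= limit_kb

print(f"cgroup limit: {limit_mib} MiB, usage at limit: {at_limit}")
print(f"task rss: {rss_kb} kB, anon + file: {anon_rss_kb + file_rss_kb} kB")
```

The usage equal to the limit, together with constraint=CONSTRAINT_MEMCG, indicates the kill was triggered by the container's own cgroup limit rather than by memory pressure on the node as a whole.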

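The containerStatuses excerpt above reports exitCode: 137 with reason: Error rather than OOMKilled. By the common 128 + signal-number convention, 137 corresponds to signal 9 (SIGKILL), which is consistent with the process having been killed by the kernel OOM killer. A minimal sketch of that decoding follows; the helper name describe_exit_code is hypothetical and is not part of any OpenShift tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name where the platform knows it.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


# exitCode taken from lastState.terminated in the pod status above.
print(describe_exit_code(137))
```

An exit code of 137 alone does not prove an OOM kill, since any SIGKILL produces it; the kernel log is what ties it to the memory cgroup limit.
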
              rhn-support-mschuppe Martin Schuppert
              rhn-support-ltamagno Luigi Dino Tamagnone
              rhos-dfg-ospk8s