OpenShift Bugs / OCPBUGS-36826

Intermittent panic in machine-config-daemon-firstboot service when creating a node using a 4.1 boot image


      Description of problem:

      When we scale up a machineset to create a new node using a 4.1 boot image, the machine-config-daemon-firstboot service intermittently reports a panic.

      The panic does not prevent the node from joining the cluster.
      
          

      Version-Release number of selected component (if applicable):

      IPI on AWS:
          
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-07-10-022831   True        False         5h50m   Cluster version is 4.16.0-0.nightly-2024-07-10-022831
      
          

      How reproducible:

      Intermittent
          

      Steps to Reproduce:

      1. Create a machineset that uses a 4.1 boot image
      2. Scale up the machineset to create a new worker node
      3. When the worker node is added, check the machine-config-daemon-firstboot service on it (example commands below)
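
      A rough sketch of the reproduction with the oc CLI; the machineset and node names are placeholders, adjust them to your cluster:

      # List the machinesets and scale up the one backed by the 4.1 boot image
      $ oc get machinesets -n openshift-machine-api
      $ oc scale machineset <41-bootimage-machineset> -n openshift-machine-api --replicas=1
      # Wait for the new worker node to register, then inspect the firstboot service on it
      $ oc get nodes -w
      $ oc debug node/<new-worker-node> -- chroot /host journalctl -u machine-config-daemon-firstboot.service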
          

      Actual results:

      Intermittently, the machine-config-daemon-firstboot service reports a panic like this one:
      
      
      
      Jul 10 10:26:44 ip-10-0-15-193 podman[1435]: I0710 10:26:44.601164    1472 update.go:2618] Running: systemd-run --unit machine-config-daemon-update-rpmostree-via-container -p EnvironmentFile=-/etc/mco/proxy.env --collect --wait -- podman run --env-file /etc/mco/proxy.env --privileged --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:77d490c385a99006dfa39460a2266b88a897572177fed886cdab1c3a1447f3ef rpm-ostree ex deploy-from-self /run/host
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: I0710 10:27:14.765820    1472 update.go:2618] Running: setenforce 1
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: time="2024-07-10T10:27:14Z" level=error msg="Error forwarding signal 15 to container f7493cc548e12405d132c677905a0bce68e1cc2377f2e28734e635d637479816: container has already been removed"
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: time="2024-07-10T10:27:14Z" level=error msg="Error forwarding signal 18 to container f7493cc548e12405d132c677905a0bce68e1cc2377f2e28734e635d637479816: container has already been removed"
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: panic: close of closed channel
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: goroutine 61 [running]:
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: panic(0x55d72326c8c0, 0x55d7233d96c0)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /usr/lib/golang/src/runtime/panic.go:556 +0x2cf fp=0xc000236ec0 sp=0xc000236e30 pc=0x55d721ebcfdf
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: runtime.closechan(0xc0002e80c0)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /usr/lib/golang/src/runtime/chan.go:335 +0x260 fp=0xc000236f10 sp=0xc000236ec0 pc=0x55d721e976f0
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: github.com/containers/libpod/vendor/github.com/docker/docker/pkg/signal.StopCatch(0xc0002e80c0)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /builddir/build/BUILD/libpod-96ccc2edf597a191fe03eff98b2905788a26553f/_build/src/github.com/containers/libpod/vendor/github.com/docker/docker/pkg/signal/signal.go:26 +0x3b fp=0xc000236f28 sp=0xc000236f10 pc=0x55d7227310fb
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: main.ProxySignals.func1(0xc0002e80c0, 0xc0001f1bc0)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /builddir/build/BUILD/libpod-96ccc2edf597a191fe03eff98b2905788a26553f/_build/src/github.com/containers/libpod/cmd/podman/sigproxy.go:28 +0x1fa fp=0xc000236fd0 sp=0xc000236f28 pc=0x55d722b4affa
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: runtime.goexit()
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /usr/lib/golang/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000236fd8 sp=0xc000236fd0 pc=0x55d721eeb871
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: created by main.ProxySignals
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /builddir/build/BUILD/libpod-96ccc2edf597a191fe03eff98b2905788a26553f/_build/src/github.com/containers/libpod/cmd/podman/sigproxy.go:18 +0xa5
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: goroutine 1 [syscall]:
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: syscall.Syscall(0xa6, 0xc000205a40, 0x0, 0x0, 0x0, 0x55d7220024ad, 0xc0002ee200)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /usr/lib/golang/src/syscall/asm_linux_amd64.s:18 +0x5 fp=0xc000174d38 sp=0xc000174d30 pc=0x55d721f05985
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: github.com/containers/libpod/vendor/golang.org/x/sys/unix.Unmount(0xc0003144b0, 0x23, 0x0, 0x55d72200197c, 0xc000205980)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /builddir/build/BUILD/libpod-96ccc2edf597a191fe03eff98b2905788a26553f/_build/src/github.com/containers/libpod/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:1299 +0x8c fp=0xc000174d98 sp=0xc000174d38 pc=0x55d721ffbd9c
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: github.com/containers/libpod/vendor/github.com/containers/storage/pkg/mount.unmount(0xc0003144b0, 0x23, 0x0, 0x22, 0x21)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /builddir/build/BUILD/libpod-96ccc2edf597a191fe03eff98b2905788a26553f/_build/src/github.com/containers/libpod/vendor/github.com/containers/storage/pkg/mount/mounter_linux.go:56 +0x41 fp=0xc000174dd0 sp=0xc000174d98 pc=0x55d722002261
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]: github.com/containers/libpod/vendor/github.com/containers/storage/pkg/mount.ForceUnmount(0xc0003144b0, 0x23, 0x1, 0x0)
      Jul 10 10:27:14 ip-10-0-15-193 podman[1435]:         /builddir/build/BUILD/libpod-96ccc2edf597a191fe03eff98b2905788a26553f/_build/src/github.com/containers/libpod/vendor/github.com/containers/storage/pkg/mount/mount.go:100 +0x64 fp=0xc000174e10 sp=0xc000174dd0 pc=0x55d722001fb4
      
          

      Expected results:

      No panic should happen
          

      Additional info:

      The panic does not break functionality: the node is rebooted and joins the cluster without problems.
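
      A quick way to confirm that the new node registered and became Ready despite the panic (a sketch; assumes the default worker role and pool names):

      $ oc get nodes -l node-role.kubernetes.io/worker
      $ oc get machineconfigpool worker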
          

              Team MCO (team-mco)
              Sergio Regidor de la Rosa (sregidor@redhat.com)