Red Hat Service Interconnect (Skupper) / SKUPPER-1418

After a Podman site "skupper init" failure, the server is left in a difficult-to-recover state


    • Type: Bug
    • Resolution: Done-Errata
    • 1.5.3
    • None
    • CLI, Container engine sites
    • None

      After a skupper init failure (see Jira SKUPPER-1416), the server leaves resources behind that prevent any later skupper init from working. Specifically, it leaves a non-running skupper-router container and several volumes in place.

      Suggested fix:

      • Skupper should delete any created resources if it fails to start.
      • "skupper delete" should also delete any stopped containers.
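
The first suggestion can be approximated today with a wrapper script. The following is a minimal sketch of a workaround, not Skupper's own behavior. The function names skupper_cleanup and skupper_init_or_cleanup are hypothetical, and the container and volume names are simply the ones observed in this report, so they are assumptions rather than an authoritative list. It also assumes a Podman version that provides "podman container exists" and "podman volume exists".

```shell
# Names observed in this report; not an authoritative list of Skupper resources.
SKUPPER_CONTAINER="skupper-router"
SKUPPER_VOLUMES="skupper-internal skupper-local-server skupper-router-certs skupper-site-server"

# Remove the leftover container and volumes, skipping any that are absent.
skupper_cleanup() {
    if podman container exists "$SKUPPER_CONTAINER"; then
        podman rm --force "$SKUPPER_CONTAINER"
    fi
    for vol in $SKUPPER_VOLUMES; do
        if podman volume exists "$vol"; then
            podman volume rm "$vol"
        fi
    done
}

# Run "skupper init" with the given arguments and clean up if it fails.
skupper_init_or_cleanup() {
    if skupper init "$@"; then
        return 0
    fi
    echo "skupper init failed; removing partially created resources" >&2
    skupper_cleanup
    return 1
}
```

Usage would look like "skupper_init_or_cleanup --site-name MASTER --platform podman --ingress-host 10.10.10.15". Because the cleanup runs on any init failure, this sketch is only safe on a host that has no working Skupper site for the user; on a host with an existing site it would remove that site's container and volumes.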

      Issue Details

      To reproduce the issue, first trigger the earlier failure (SKUPPER-1416):

      $ skupper switch podman 
      $ skupper init --site-name MASTER --platform podman --ingress-host 10.10.10.15
      Error: Error initializing Skupper - error pulling image registry.redhat.io/service-interconnect/skupper-router-rhel9:2.4.3: error reading response body: context deadline exceeded

      After pulling the image manually with podman pull, run skupper init again:

      $ skupper init --site-name MASTER --platform podman --ingress-host 10.10.10.15
      Error: Error initializing Skupper - skupper-router container already defined (See ISSUE #3)

      Try to clean up the resources using skupper delete:

      $ skupper delete
      Skupper is not enabled for user 'bryon'
      $ skupper init --site-name MASTER --platform podman --ingress-host 10.10.10.15
      Error: Error initializing Skupper - skupper-router container already defined

      $ podman ps -a
      CONTAINER ID  IMAGE                                                               COMMAND               CREATED        STATUS      PORTS                                               NAMES
      8eced922cefc  registry.redhat.io/service-interconnect/skupper-router-rhel9:2.4.3  /home/skrouterd/b...  5 minutes ago  Created     0.0.0.0:45671->45671/tcp, 0.0.0.0:55671->55671/tcp  skupper-router
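
The leftover state can also be listed in one step before retrying. This is a small sketch under the assumption that every resource of interest has "skupper" in its name, as is the case in this report. The function name skupper_leftovers is hypothetical.

```shell
# List leftover Skupper containers (in any state) and volumes for the current user.
# Assumption: all relevant resources have "skupper" in their name.
skupper_leftovers() {
    echo "containers:"
    podman ps --all --filter name=skupper --format '{{.Names}}'
    echo "volumes:"
    podman volume ls --filter name=skupper --format '{{.Name}}'
}
```

Calling skupper_leftovers after a failed init shows which containers and volumes remain, including containers that never started and therefore do not appear in a plain "podman ps".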

      $ podman system prune
      WARNING! This command removes:
        - all stopped containers
        - all networks not used by at least one container
        - all dangling images
        - all dangling build cache

      Are you sure you want to continue? [y/N] y
      Total reclaimed space: 0B

      skupper init still fails, now because of leftover volumes:

      $ skupper init --site-name MASTER --platform podman --ingress-host 10.10.10.15
      Error: Error initializing Skupper - required volume already exists skupper-local-server

      Delete the volumes (note that --all removes every Podman volume belonging to the user, not only the Skupper ones):

      $ podman volume rm --all
      skupper-internal
      skupper-local-server
      skupper-router-certs
      skupper-site-server

      skupper init now succeeds:

      $ skupper init --site-name MASTER --platform podman --ingress-host 10.10.10.15
      It is recommended to enable lingering for bryon, otherwise Skupper may not start on boot.
      Skupper is now installed for user 'bryon'.  Use 'skupper status' to get more information.
      $ podman ps
      CONTAINER ID  IMAGE                                                               COMMAND               CREATED         STATUS         PORTS                                               NAMES
      2b950c6bf600  registry.redhat.io/service-interconnect/skupper-router-rhel9:2.4.3  /home/skrouterd/b...  25 seconds ago  Up 24 seconds  0.0.0.0:45671->45671/tcp, 0.0.0.0:55671->55671/tcp  skupper-router

       

       

            fgiorget@redhat.com Fernando Giorgetti
            rhn-sa-brbaker Bryon Baker