Uploaded image for project: 'Satellite'
  1. Satellite
  2. SAT-40229

Starting logrotate & foreman-proxy altogether can lead to failed foreman-proxy.service

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Normal Normal
    • None
    • 6.17.0
    • Foreman Proxy
    • False
    • Moderate
    • Satellite Endeavour Refinement
    • sat-endeavour
    • None
    • None
    • None
    • None

      Description of problem:
      User scenario:

      • Satellite (or Capsule) is switched off for some time, including a regular time of log rotation
      • Starting the system up, `foreman-proxy` service fails after receiving SIGUSR1 signal from logrotate before the `smart-proxy` process sets proper trap for catching the signal.
      • `foreman-proxy` is down until a user intervention
         

      How reproducible:
      Rarely (in house) in the above scenario, 100% with bit artificial reproducer below.

      Is this issue a regression from an earlier version:
      No.
       

      Steps to Reproduce:
      The above generic scenario is tricky to reproduce, but it can be narrowed down to 95%-precise, 100% reliable reproducer:

      systemctl restart foreman-proxy & sleep 0.9; logrotate -f /etc/logrotate.d/foreman-proxy
      

      (the "sleep 0.9" might vary among systems, let tune it for you)

      This leads to the service start failure every time, though this reproducer is not fully true to the user scenario as it does not mimic system start up (including logrotate startup) precisely.

      Actual behavior:
      foreman-proxy service fails to start and is stopped forever.

      Customer scenario during system startup:

      Nov 06 09:24:27 satellite systemd[1]: Starting Rotate log files...
      Nov 06 09:24:29 satellite logrotate[1075]: Stopping logging:
      Nov 06 09:24:30 satellite logrotate[1074]: Redirecting start to /bin/systemctl start auditd.service
      Nov 06 09:24:32 satellite systemd[1]: Starting Foreman Proxy...
      Nov 06 09:24:36 satellite systemd[1]: foreman-proxy.service: Sent signal SIGUSR1 to main process 1181 (smart-proxy) on client request.
      Nov 06 09:24:36 satellite systemd[1]: foreman-proxy.service: Main process exited, code=killed, status=10/USR1
      Nov 06 09:24:36 satellite systemd[1]: foreman-proxy.service: Failed with result 'signal'.
      Nov 06 09:24:36 satellite systemd[1]: Failed to start Foreman Proxy.
      Nov 06 09:24:44 satellite systemd[1]: logrotate.service: Deactivated successfully.
      Nov 06 09:24:44 satellite systemd[1]: Finished Rotate log files.
      

      My own logs from the "simplified scenario":

      Nov 12 12:33:02 satellite systemd[1]: Stopping Foreman Proxy...
      Nov 12 12:33:02 satellite systemd[1]: foreman-proxy.service: Deactivated successfully.
      Nov 12 12:33:02 satellite systemd[1]: Stopped Foreman Proxy.
      Nov 12 12:33:02 satellite systemd[1]: foreman-proxy.service: Consumed 1.375s CPU time. 
      Nov 12 12:33:02 satellite systemd[1]: Starting Foreman Proxy...
      Nov 12 12:33:03 satellite systemd[1]: foreman-proxy.service: Sent signal SIGUSR1 to main process 9750 (smart-proxy) on client request. 
      Nov 12 12:33:03 satellite systemd[1]: foreman-proxy.service: Main process exited, code=killed, status=10/USR1
      Nov 12 12:33:03 satellite systemd[1]: foreman-proxy.service: Failed with result 'signal'.
      Nov 12 12:33:03 satellite systemd[1]: Failed to start Foreman Proxy.
      

      Expected behavior:
      The service is running at the end.

      Business Impact / Additional info:
      It can be tricky to fix the code. We can move the routine to catch the signal (`install_usr1_trap` method in `lib/proxy/signal_handler.rb`) as soon as possible, but there will be always a window of opportunity to hit this bug.

      We shall probably rather modify the systemd service to behave in a more clever way. Here two workarounds are possible:

      1) Wait nutil logrotate service isnt gone: By having `/etc/systemd/system/foreman-proxy.service.d/91-logrotate-wait.conf` :

      [Unit]
      Wants=logrotate.service
      After=logrotate.service
      

      (this idea does not prebvent the "simplified reproducer" that mimics logrotate restart by other means. But this idea is sufficient for the real use case, so..)

      2) Add restarts of failed startups: add

      Restart=on-failure
      RestartSec=5
      

      to `[Service]` section of `/usr/lib/systemd/system/foreman-proxy.service`.

              Unassigned Unassigned
              rhn-support-pmoravec Pavel Moravec
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated: