Satellite / SAT-22356 / SAT-22927

[QE] Registering host through load balancer causes REX not to know what capsule to choose for 'registered_through'


    • Type: Sub-task
    • Resolution: Done
    • Component: Capsule - Content
    • Sprint: Sprint 127

      +++ This bug was initially created as a clone of Bug #2240956 +++

      Description of problem:
      From a customer case with failing Remote Execution that started after an upgrade to 6.13. Troubleshooting found that the host's Content source gets set to null when re-registering, which left Remote Execution failing with the error message:
      Failed to initialize: RuntimeError - Could not use any Capsule for the ["SSH", "Script"] job. Consider configuring remote_execution_global_proxy, remote_execution_fallback_proxy in settings

      Due to network segmentation, setting anything other than:
      Fallback to Any Capsule: Yes
      Enable Global Capsule: No
      will fail for them.
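      For reference, these two settings can also be inspected and adjusted from the CLI; a minimal sketch using hammer (the internal setting names are the ones shown by `hammer settings list` later in this thread, and the values match the customer's constraints):

      ```shell
      # Show the current REX proxy-selection settings.
      hammer settings list --search 'name ~ remote_execution'

      # Match the customer's constraints: fall back to any capsule, but never
      # search outside the proxies assigned to the host.
      hammer settings set --name remote_execution_fallback_proxy --value true
      hammer settings set --name remote_execution_global_proxy --value false
      ```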

      Version-Release number of selected component (if applicable):
      [root@jb-rhel8-sat613 ~]# rpm -qa | grep satellite
      satellite-installer-6.13.0.7-1.el8sat.noarch
      satellite-common-6.13.4-1.el8sat.noarch
      ansible-collection-redhat-satellite-3.9.0-2.el8sat.noarch
      rubygem-foreman_theme_satellite-11.0.0.5-1.el8sat.noarch
      satellite-6.13.4-1.el8sat.noarch
      ansible-collection-redhat-satellite_operations-1.3.0-2.el8sat.noarch
      satellite-maintain-0.0.1-1.el8sat.noarch
      satellite-cli-6.13.4-1.el8sat.noarch

      How reproducible:
      Always

      Steps to Reproduce:
      Satellite server: jb-rhel8-sat613.jb.lab
      Client: jb-rhel8-test01.jb.lab
      Capsules: jb-rhel8-sat613-caps01.jb.lab, jb-rhel8-sat613-caps02.jb.lab
      Load balancer: jb-haproxy01.jb.lab

      1. Set up the Satellite and Capsules with load balancing according to the latest documentation.

      2. Registering or re-registering a host with --serverurl=<load balancer hostname or IP address> sets Content source to null:
      [root@jb-rhel8-test01 ~]# subscription-manager register --serverurl=https://jb-haproxy01.jb.lab --force

      3. Registering or re-registering a host with --serverurl=<capsule hostname> sets the correct Content source:
      [root@jb-rhel8-test01 ~]# subscription-manager register --serverurl=https://jb-rhel8-sat613-caps02.jb.lab --force

      Actual results:
      [root@jb-rhel8-sat613 ~]# hammer host info --name jb-rhel8-test01.jb.lab | grep -A2 'Content Source'
      Content Source:
      Id:
      Name:

      Expected results:
      [root@jb-rhel8-sat613 ~]# hammer host info --name jb-rhel8-test01.jb.lab | grep -A2 'Content Source'
      Content Source:
      Id: 2
      Name: jb-rhel8-sat613-caps01.jb.lab

      Additional info:
      Debug logs attached

      When reinstalling katello-ca-consumer-latest.rpm and registering with no --serverurl, the Content source is
      properly set from the value in the package.

      — Additional comment from on 2023-09-27T10:40:27Z

      Created attachment 1990798
      Debug when registering directly to capsule

      — Additional comment from on 2023-09-27T10:40:58Z

      Created attachment 1990799
      Debug when registering through load balancer

      — Additional comment from on 2023-10-03T11:35:53Z

      Are you also getting this error when re-registering the host using the global registration?

      — Additional comment from on 2023-10-03T12:06:49Z

      @nalfassi Yes, the behavior is the same when re-registering with global registration.

      — Additional comment from on 2023-10-30T11:29:29Z

      Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

      — Additional comment from on 2023-11-10T10:27:36Z

      Customer reached out asking about an update.
      What can I tell them to set the right expectations?
      Expected to be resolved in 6.14, 6.15?

      — Additional comment from on 2023-11-13T13:30:12Z

      This looks like a problem in Katello's RHSM endpoints & handling registration data,
      but I might be wrong, sorry in advance if I moved it to the wrong category.

      — Additional comment from on 2023-11-16T17:50:29Z

      Created redmine issue https://projects.theforeman.org/issues/36928 from this bug

      — Additional comment from on 2023-11-16T18:21:23Z

      Turns out the issue is not the null content source; instead it's the fact that the host's 'registered_through' value doesn't match any known capsule name, so Remote Execution doesn't know what capsule to use for the host. I'll update the title a bit.
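      The mismatch can be illustrated with a minimal sketch (hostnames taken from the reproducer above; the matching below is illustrative only, not the actual foreman_remote_execution code):

      ```shell
      # The host registered through the load balancer, so 'registered_through'
      # holds the LB FQDN, which appears in no capsule's name.
      capsules="jb-rhel8-sat613-caps01.jb.lab jb-rhel8-sat613-caps02.jb.lab"
      registered_through="jb-haproxy01.jb.lab"

      match=""
      for c in $capsules; do
        [ "$c" = "$registered_through" ] && match="$c"
      done

      # No capsule matches, so REX has no preferred proxy for the host and,
      # with the global proxy disabled, ends up with no candidate at all.
      echo "match=${match:-none}"
      ```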

      — Additional comment from on 2023-12-07T16:35:56Z

      Hi @jbjornel@redhat.com

      Are you able to try the upstream patch with the customer and see if it resolves the issue?

      https://github.com/Katello/katello/pull/10804

      (You will need to re-register the affected hosts after applying the patch.)

      — Additional comment from on 2023-12-08T07:54:39Z

      I tested this with a 6.15 LB setup

      Sat: vm205-223.uatlab.pnq2.redhat.com

      Cap1: vm207-129.uatlab.pnq2.redhat.com
      Cap2: vm205-222.uatlab.pnq2.redhat.com

      haproxy/LB: vm205-217.uatlab.pnq2.redhat.com

      rh7-client: dhcp130-248.gsslab.pnq2.redhat.com
      rh8-client: dhcp130-245.gsslab.pnq2.redhat.com

      Steps:

      • Installed the whole setup as expected
      • Configured the capsules in this way afterward:

      On each capsule:

      satellite-installer --scenario capsule \
      --enable-foreman-proxy-plugin-ansible \
      --foreman-proxy-registration true \
      --foreman-proxy-templates true \
      --foreman-proxy-registration-url "https://vm205-217.uatlab.pnq2.redhat.com:9090" \
      --foreman-proxy-template-url "https://vm205-217.uatlab.pnq2.redhat.com:8000"

      On satellite:

      for i in `hammer --csv --no-headers capsule list --fields Id`; do hammer capsule refresh-features --id $i; sleep 3; done

      • Synced satellite-client-6 repo on satellite and capsules
      • Created an AK
      • Generated a registration command from satellite by selecting an external capsule: [ It generates with LB FQDN as expected ]

      curl -sS --insecure 'https://vm205-217.uatlab.pnq2.redhat.com:9090/register?activation_keys=TEST_AK&force=true&location_id=2&organization_id=1&setup_insights=false&update_packages=false' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0LCJpYXQiOjE3MDIwMTkxMjAsImp0aSI6IjI5ZGRhMGQ0MmM1YzEyMTdjNTZiNzRkNzdmZmI0NjM1OWE4MDIyNmNmMmY3ZWE3Zjc4ODc0NGVjYjhjNDYwN2MiLCJzY29wZSI6InJlZ2lzdHJhdGlvbiNnbG9iYWwgcmVnaXN0cmF0aW9uI2hvc3QifQ.aWL7x0OfWpjmMF0NBSGdw0xF7cjEJMwMKLdOtwGjc4s' | bash

      • Registered both systems with Satellite. This is the status from Satellite w.r.t. content source and registered_through capsules (the clients are RECORD 3 and 4):

      su - postgres -c 'psql -x foreman -c "select h.id,h.name,o.title,k.registered_at,k.last_checkin,k.registered_through from hosts as h left join katello_subscription_facets k on k.host_id = h.id left join operatingsystems o on o.id = h.operatingsystem_id order by last_checkin;"'
        [ RECORD 1 ]----+----------------------------------
        id | 3
        name | vm205-222.uatlab.pnq2.redhat.com
        title | RHEL 8.9
        registered_at | 2023-12-07 18:44:30
        last_checkin | 2023-12-08 06:53:14.651602
        registered_through | vm205-223.uatlab.pnq2.redhat.com
        [ RECORD 2 ]----+----------------------------------
        id | 2
        name | vm207-129.uatlab.pnq2.redhat.com
        title | RHEL 8.9
        registered_at | 2023-12-07 18:36:54
        last_checkin | 2023-12-08 06:59:52.594299
        registered_through | vm205-223.uatlab.pnq2.redhat.com
        [ RECORD 3 ]----+----------------------------------
        id | 4
        name | dhcp130-248.gsslab.pnq2.redhat.com
        title | RedHat 7.9
        registered_at | 2023-12-08 07:08:03
        last_checkin | 2023-12-08 07:08:09.112907
        registered_through | vm205-217.uatlab.pnq2.redhat.com
        [ RECORD 4 ]----+----------------------------------
        id | 5
        name | dhcp130-245.gsslab.pnq2.redhat.com
        title | RedHat 8.8
        registered_at | 2023-12-08 07:08:09
        last_checkin | 2023-12-08 07:08:16.028888
        registered_through | vm205-217.uatlab.pnq2.redhat.com
        [ RECORD 5 ]----+----------------------------------
        id | 1
        name | vm205-223.uatlab.pnq2.redhat.com
        title | RHEL 8.9
        registered_at |
        last_checkin |
        registered_through |

      [root@vm205-223 ~]# hammer host info --name dhcp130-248.gsslab.pnq2.redhat.com --fields 'Content information/content source/name'
      Content Information:
      Content Source:
      Name:

      [root@vm205-223 ~]# hammer host info --name dhcp130-245.gsslab.pnq2.redhat.com --fields 'Content information/content source/name'
      Content Information:
      Content Source:
      Name:

      • In Satellite Settings:

      hammer --csv --csv-separator="|" --no-headers settings list | egrep "global_proxy|fallback|registered_through" | column -s"|" -t
      remote_execution_fallback_proxy                   Fallback to Any Proxy                                 true   Search the host for any proxy with Remote Execution, useful when the host has no subnet or the subnet does not have an execution proxy
      remote_execution_global_proxy                     Enable Global Proxy                                   false  Search for remote execution proxy outside of the proxies assigned to the host. The search will be limited to the host's organization and location.
      remote_execution_prefer_registered_through_proxy  Prefer registered through proxy for remote execution  true   Prefer using a proxy to which a host is registered when using remote execution

      Before using the patch:

      Remote execution as well as ansible role execution fails on both the systems with error

      Could not use any Capsule for the ["SSH", "Script"] job. Consider configuring remote_execution_global_proxy, remote_execution_fallback_proxy in settings (RuntimeError)
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/run_host_job.rb:259:in `determine_proxy!'
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/run_host_job.rb:50:in `inner_plan'
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/run_host_job.rb:24:in `block in plan'
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/template_invocation_progress_logging.rb:20:in `block in with_template_invocation_error_logging'
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/template_invocation_progress_logging.rb:20:in `catch'
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/template_invocation_progress_logging.rb:20:in `with_template_invocation_error_logging'
      /usr/share/gems/gems/foreman_remote_execution-11.1.1/app/lib/actions/remote_execution/run_host_job.rb:23:in `plan'
      /usr/share/gems/gems/dynflow-1.8.2/lib/dynflow/action.rb:532:in `block (3 levels) in execute_plan'
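      For context, the selection order behind this error can be sketched roughly as follows (a simplified local model of determine_proxy!; the variable names and exact precedence are assumptions, the real logic lives in foreman_remote_execution's run_host_job.rb):

      ```shell
      # State of this reproducer: the registered_through value matches no proxy,
      # the subnet has no REX proxy, fallback finds nothing on the host, and the
      # global search is disabled by the remote_execution_global_proxy setting.
      registered_through_proxy=""
      subnet_proxy=""
      fallback_enabled=true      # remote_execution_fallback_proxy
      global_enabled=false       # remote_execution_global_proxy
      fallback_candidates=""

      proxy="${registered_through_proxy:-$subnet_proxy}"
      if [ -z "$proxy" ] && [ "$fallback_enabled" = true ]; then
        proxy="$fallback_candidates"
      fi
      if [ -z "$proxy" ] && [ "$global_enabled" = true ]; then
        proxy="any-org-proxy.example.com"   # hypothetical org-wide proxy
      fi

      # With every source empty, determine_proxy! raises the RuntimeError above.
      echo "${proxy:-Could not use any Capsule}"
      ```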

      After sanitizing the patch (removing the parts related to test/models/concerns/host_managed_extensions_test.rb), applying it on the Satellite, and restarting services:

        • Content Source is still blank [ as expected ], and I haven't re-registered the systems yet.
        • Surprisingly, REX works fine on both systems, as do the Ansible roles. No errors are reported anywhere.
        • When both capsules are active, haproxy uses capsule1 to run the job on the RHEL 7 client and capsule2 on the RHEL 8 client.
        • If I stop the httpd and foreman-proxy services on capsule1, haproxy uses capsule2 to run the job on both systems.
        • If I stop the same services on capsule2 and turn them back on on capsule1, haproxy uses capsule1 instead for both systems.

      The Content Source is still blank for the hosts, but I think it is not causing any problems at all, so there is no need to re-register any systems.

      – here the main BZ testing ends –

      Now, just to check the content-source part, I re-registered the clients with the exact same curl command [ while the Satellite is in the patched state ].

      registered_through looks good:

      [ RECORD 3 ]----+----------------------------------
      id | 5
      name | dhcp130-245.gsslab.pnq2.redhat.com
      title | RedHat 8.8
      registered_at | 2023-12-08 07:49:41
      last_checkin | 2023-12-08 07:49:47.587887
      registered_through | vm205-217.uatlab.pnq2.redhat.com
      [ RECORD 4 ]----+----------------------------------
      id | 4
      name | dhcp130-248.gsslab.pnq2.redhat.com
      title | RedHat 7.9
      registered_at | 2023-12-08 07:49:42
      last_checkin | 2023-12-08 07:49:48.794278
      registered_through | vm205-217.uatlab.pnq2.redhat.com

      Content source has been populated with one of the capsules:

      hammer host info --name dhcp130-248.gsslab.pnq2.redhat.com --fields 'Content information/content source/name'
      Content Information:
      Content Source:
      Name: vm205-222.uatlab.pnq2.redhat.com

      hammer host info --name dhcp130-245.gsslab.pnq2.redhat.com --fields 'Content information/content source/name'
      Content Information:
      Content Source:
      Name: vm205-222.uatlab.pnq2.redhat.com

      Most importantly, REX and Ansible roles continue to work fine via LB+Capsules, just as they did before re-registration.

      Conclusion: The patch works as expected. Re-registration of client hosts is only needed if someone is worried about the Content source, and it makes no difference in the outcome.

      — Additional comment from on 2023-12-18T22:00:56Z

      Hi team,

      I'm not able to see this patch (https://patch-diff.githubusercontent.com/raw/Katello/katello/pull/10804.diff) included in snap_6.15.0_3.0.

      — Additional comment from on 2023-12-18T22:32:09Z

      Actually, the katello version shipped in snap_6.15.0_3.0 is katello-4.11.0-0.2.rc2.el8sat.noarch, but the Foreman issue is marked for "Katello 4.12.0".

      — Additional comment from on 2023-12-19T14:15:44Z

      Moved by mistake, sorry about that.

      QE Tracker for https://issues.redhat.com/browse/SAT-22356
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2257331
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2240956

              vsedmik@redhat.com Vladimír Sedmík
              satellite-focaccia-bot Focaccia Bot
              Vladimír Sedmík Vladimír Sedmík