Subscription Watch / SWATCH-4469

Fix flaky test for guest-to-hypervisor mapping updates

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • swatch-metrics-hbi
    • subs-swatch-thunder

      This ticket tracks a flaky test in which a guest-to-hypervisor mapping update sometimes fails to produce the expected outbox record. The test component/swatch_metrics_hbi/test_hbi_created_updated_events.py::test_unmapped_guest_to_mapped_guest_transition[updated-INSTANCE_UPDATED] intermittently times out while waiting for an INSTANCE_UPDATED outbox record that matches the guest inventory ID and timestamp. The lookup returns None on every attempt, the 10 retries (0.5 s apart) are exhausted, and the test fails even though the flow should be valid.

      Traces:

      func = <bound method SwatchMetricsHbiDBClient.get_outbox_record of <iqe_rhsm_subscriptions.tests.component.swatch_metrics_hbi.db.db_client.SwatchMetricsHbiDBClient object at 0x7f8f6be5a5d0>>
      validator = <function SwatchMetricsHbiTestHelper.wait_for_outbox_record.<locals>.<lambda> at 0x7f8f69bf5440>
      retries = 10, delay_seconds = 0.5
      args = ('INSTANCE_UPDATED', '3340851', 'c5b63f24-68bf-42b9-9ae8-b31de0a76459', '2026-01-15T07:00:00Z')
      kwargs = {}
      r = <Retrying object at 0x7f8f6bd978f0 (stop=<tenacity.stop.stop_after_attempt object at 0x7f8f6bd965a0>, wait=<tenacity.w...0x7f8f6bd94e60>, before=<function before_nothing at 0x7f8f72895a80>, after=<function after_nothing at 0x7f8f72894400>)>
      last = None
      
          def run_until_valid(
              func: Callable[..., T],
              *args: Any,
              validator: Callable[[T], bool] | None = None,
              retries: int = 10,
              delay_seconds: float = 0.5,
              **kwargs: Any,
          ) -> T:
              """
              Call `func(*args, **kwargs)` and retry only when `validator(result)` is False.
              If no validator is provided, truthiness of the result is used.
              Raises AssertionError if validation never passes after retries.
              """
              if validator is None:
          
                  def validator(value: T) -> bool:  # type: ignore[no-redef]
                      return bool(value)
          
              r = Retrying(
                  stop=stop_after_attempt(1 + retries),  # initial attempt + retries
                  wait=wait_fixed(delay_seconds),  # constant delay
                  retry=retry_if_result(lambda v: not validator(v)),
                  reraise=True,  # real exceptions bubble immediately
              )
              try:
      >           return r(func, *args, **kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^
      
      /iqe_venv/lib/python3.12/site-packages/iqe_rhsm_subscriptions/utils/retry.py:33: 
      ...
      E           AssertionError: Validation not satisfied after 10 retries; last result: None
      
      /iqe_venv/lib/python3.12/site-packages/iqe_rhsm_subscriptions/utils/retry.py:39: AssertionError
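
With retries = 10 and delay_seconds = 0.5, the helper in the trace makes 11 attempts separated by 10 fixed waits, so the record has roughly 5 seconds to appear. The sketch below is a hypothetical, stdlib-only variant that bounds the wait by a deadline instead of an attempt count. The name poll_until_valid and the parameters timeout_seconds, sleep, and clock are assumptions for illustration and are not part of iqe_rhsm_subscriptions. The ticket does not establish that a longer window resolves the flakiness.

```python
import time
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_valid(
    func: Callable[..., T],
    *args: Any,
    validator: Optional[Callable[[T], bool]] = None,
    timeout_seconds: float = 30.0,  # assumed budget, not taken from the ticket
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,    # injectable for tests
    clock: Callable[[], float] = time.monotonic,    # injectable for tests
    **kwargs: Any,
) -> T:
    """Call func until validator(result) is True or the deadline passes.

    Hypothetical deadline-based counterpart of the retry helper in the trace.
    If no validator is given, truthiness of the result is used.
    """
    check = validator if validator is not None else bool
    deadline = clock() + timeout_seconds
    attempts = 0
    while True:
        result = func(*args, **kwargs)
        attempts += 1
        if check(result):
            return result
        if clock() >= deadline:
            raise AssertionError(
                f"Validation not satisfied after {attempts} attempts "
                f"({timeout_seconds}s); last result: {result!r}"
            )
        sleep(delay_seconds)
```

Exceptions raised by func propagate immediately, as they do with reraise=True in the original helper. Only a failed validation triggers another attempt.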
      

      Output:

      2026-01-15 07:09:57 INFO Using random seed value for random generation: 5842
      2026-01-15 07:09:57 INFO select version()
      2026-01-15 07:09:57 INFO [raw sql] {}
      2026-01-15 07:09:57 INFO select current_schema()
      2026-01-15 07:09:57 INFO [raw sql] {}
      2026-01-15 07:09:57 INFO show standard_conforming_strings
      2026-01-15 07:09:57 INFO [raw sql] {}
      2026-01-15 07:09:57 INFO BEGIN (implicit)
      2026-01-15 07:09:57 INFO Searching messages in the topic: platform.rhsm-subscriptions.service-instance-ingress
      2026-01-15 07:10:00 INFO Consumer subscribed to: {'platform.rhsm-subscriptions.service-instance-ingress': {0: -1001, 1: -1001, 2: -1001}}
      2026-01-15 07:10:00 INFO Consumer subscribed to: {'platform.rhsm-subscriptions.service-instance-ingress': {0: -1001, 1: -1001, 2: -1001}}
      2026-01-15 07:10:02 INFO Topic scan time limit elapsed
      2026-01-15 07:10:02 INFO Hypervisor physical_mock_cca8a094qwwpxvax.example.ca mock created
      2026-01-15 07:10:02 INFO Guest-0 virtual_mock_d0588cbaoambhppn.example.app mock created
      2026-01-15 07:10:02 INFO Message key=None for insights_id=1037e339-0bad-4566-88b0-b8bc401eaae5 delivered to platform.inventory.events (P 0 O 18)
      2026-01-15 07:10:02 INFO SELECT hbi_event_outbox.id AS hbi_event_outbox_id, hbi_event_outbox.org_id AS hbi_event_outbox_org_id, hbi_event_outbox.created_on AS hbi_event_outbox_created_on, hbi_event_outbox.swatch_event_json AS hbi_event_outbox_swatch_event_json 
      FROM hbi_event_outbox 
      WHERE hbi_event_outbox.org_id = %(org_id_1)s AND (hbi_event_outbox.swatch_event_json ->> %(swatch_event_json_1)s) = %(param_1)s AND (hbi_event_outbox.swatch_event_json ->> %(swatch_event_json_2)s) = %(param_2)s AND (hbi_event_outbox.swatch_event_json ->> %(swatch_event_json_3)s) = %(param_3)s ORDER BY hbi_event_outbox.created_on DESC 
       LIMIT %(param_4)s
      2026-01-15 07:10:02 INFO [generated in 0.00035s] {'org_id_1': '3340851', 'swatch_event_json_1': 'event_type', 'param_1': 'INSTANCE_UPDATED', 'swatch_event_json_2': 'inventory_id', 'param_2': 'c5b63f24-68bf-42b9-9ae8-b31de0a76459', 'swatch_event_json_3': 'timestamp', 'param_3': '2026-01-15T07:00:00Z', 'param_4': 1}
      ...
      2026-01-15 07:10:07 INFO Requesting PUT http://swatch-metrics-hbi-service:8000/api/swatch-metrics-hbi/internal/rpc/outbox/flush
      2026-01-15 07:10:07 INFO PUT request response url: http://swatch-metrics-hbi-service:8000/api/swatch-metrics-hbi/internal/rpc/outbox/flush, status_code: 200
      2026-01-15 07:10:07 INFO Trace log using : None
      2026-01-15 07:10:07 INFO ROLLBACK
      2026-01-15 07:10:07 INFO Disabling feature flag swatch.swatch-metrics-hbi.emit-events
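
The SELECT in the output above returns a row only when org_id, event_type, inventory_id, and timestamp all match, and it keeps the most recent row by created_on. A minimal in-memory sketch of that filter follows. The function name find_latest_outbox_record and the sample rows are invented for illustration; the ticket does not show what the hbi_event_outbox table contained when the test failed.

```python
from typing import Optional


def find_latest_outbox_record(
    rows: list[dict],
    org_id: str,
    event_type: str,
    inventory_id: str,
    timestamp: str,
) -> Optional[dict]:
    """Mimic the logged query: exact match on all four fields, newest first, LIMIT 1."""
    matches = [
        row
        for row in rows
        if row["org_id"] == org_id
        and row["swatch_event_json"].get("event_type") == event_type
        and row["swatch_event_json"].get("inventory_id") == inventory_id
        and row["swatch_event_json"].get("timestamp") == timestamp
    ]
    if not matches:
        return None
    # ISO-8601 strings in the same format sort chronologically.
    return max(matches, key=lambda row: row["created_on"])


# Invented sample data: one row for the guest, with a different event type.
sample_rows = [
    {
        "org_id": "3340851",
        "created_on": "2026-01-15T07:10:01Z",
        "swatch_event_json": {
            "event_type": "INSTANCE_CREATED",
            "inventory_id": "c5b63f24-68bf-42b9-9ae8-b31de0a76459",
            "timestamp": "2026-01-15T07:00:00Z",
        },
    },
]

# Same arguments as in the trace; no INSTANCE_UPDATED row exists, so this is None.
result = find_latest_outbox_record(
    sample_rows,
    "3340851",
    "INSTANCE_UPDATED",
    "c5b63f24-68bf-42b9-9ae8-b31de0a76459",
    "2026-01-15T07:00:00Z",
)
```

Because every field is compared for equality, a record written with a different event type or a different timestamp is invisible to the lookup, and the helper keeps returning None.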
      

      Acceptance Criteria

      • Evaluate whether this test can be moved to component tests (and translate it if possible). If it can be moved, remove the IQE test. If not, remove the iqe_blocker from the IQE test and fix the flakiness.

              Assignee: Unassigned
              Reporter: Jose Carvajal Hilario (jcarvaja@redhat.com)