  Subscription Watch / SWATCH-4446

Fix flaky PR pipeline ephemeral tests. They all seem to rely on Kafka.


    • Type: Task
    • Resolution: Duplicate
    • Priority: Undefined
    • subs-swatch-lightning

      Each of the previous 4 PR pipeline runs had at least one failing test. Is a common timing issue affecting all of them?

      • Kafka may be slower than normal. See the "Skipping record for expired window" warning in the 13865 log below.
      • Three of the runs started within 7 minutes of each other. Could they be slowing each other down?

      https://ci.ext.devshift.net/job/RedHatInsights-rhsm-subscriptions-pr-check/13865/

      tests.component.swatch_billable_usage.test_swatch_billable_usage.test_multiple_tally_summaries_not_aggregated_for_different_metric_ids                   
      >       assert False, error_message
      E       AssertionError: The action 'wait for 2 kafka messages' was not completed within the timeout period. Last result was 'None'.
      E       assert False
      
      swatch-billable-usage-service-6cc44f4846-r58tl_swatch-billable-usage-service.log
      2026-01-08 18:03:36,041 WARN  [org.apache.kafka.streams.kstream.internals.KStreamWindowAggregate] (platform.rhsm-subscriptions.swatch-billable-usage-aggregator-670817a3-6dc4-4b59-b4ca-136d35225319-StreamThread-1) Skipping record for expired window. topic=[platform.rhsm-subscriptions.swatch-billable-usage-aggregator-billable-usage-store-repartition] partition=[0] offset=[55] timestamp=[1767895415932] window=[1767895413000,1767895416000) expiration=[1767895416008] streamTime=[1767895416008]
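The fields in this warning can be cross-checked against each other. A minimal sketch, assuming that Kafka Streams' windowed aggregation skips a record when its window end is at or before `expiration`, and that `expiration` is `streamTime` minus the window's grace period (this is our reading of `KStreamWindowAggregate` and should be confirmed against the Kafka version in use). The function name `analyze_expired_window` is hypothetical. Applied to the line above, it reads as a 3-second window with a 0 ms grace period, and a record whose timestamp was only 76 ms behind stream time but which arrived after stream time had already moved 8 ms past its window end. If that reading is right, any record that reaches the repartition topic after a later-timestamped record has pushed stream time past its window end is dropped, which could explain 'wait for 2 kafka messages' timing out.

```python
import re

# Field layout copied from the "Skipping record for expired window" WARN line.
_PATTERN = re.compile(
    r"timestamp=\[(\d+)\] window=\[(\d+),(\d+)\) "
    r"expiration=\[(\d+)\] streamTime=\[(\d+)\]"
)


def analyze_expired_window(log_line: str) -> dict:
    """Derive window size, grace period and lateness from an expired-window
    warning. Assumption: expiration = streamTime - grace period."""
    match = _PATTERN.search(log_line)
    if match is None:
        raise ValueError("not an expired-window warning")
    timestamp, start, end, expiration, stream_time = map(int, match.groups())
    return {
        "window_size_ms": end - start,
        "grace_ms": stream_time - expiration,
        # How far the record's own timestamp trailed the observed stream time.
        "record_lag_ms": stream_time - timestamp,
        # How far stream time had advanced past the end of the record's window.
        "past_window_end_ms": stream_time - end,
        # Assumed skip rule: the window is closed once end <= expiration.
        "dropped": end <= expiration,
    }


line = (
    "Skipping record for expired window. "
    "topic=[platform.rhsm-subscriptions.swatch-billable-usage-aggregator-billable-usage-store-repartition] "
    "partition=[0] offset=[55] timestamp=[1767895415932] "
    "window=[1767895413000,1767895416000) expiration=[1767895416008] "
    "streamTime=[1767895416008]"
)
print(analyze_expired_window(line))
# {'window_size_ms': 3000, 'grace_ms': 0, 'record_lag_ms': 76, 'past_window_end_ms': 8, 'dropped': True}
```

If the grace period really is 0, configuring a non-zero grace on the aggregator's window (for example with `TimeWindows.ofSizeAndGrace`) may be worth evaluating, at the cost of windows closing later.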
      

       
      https://ci.ext.devshift.net/job/RedHatInsights-rhsm-subscriptions-pr-check/13866/

      tests.component.swatch_metrics_hbi.test_hbi_delete_events.test_delete_hbi_host
      >       host_deleted_outbox_record = helpers.trigger_and_find_outbox_record(hbi_delete_event)
      
      tests.component.swatch_tally.test_tally_multiple_products_same_instance.test_tally_multiple_products_same_instance_conflicting_events                       
      >               assert tally_total == value, (
                          f"Total tally value for {metric} is {tally_total} when it should be {value}"
                      )
      E               AssertionError: Total tally value for vCPUs is 0 when it should be 40
      E               assert 0 == 40
      

       
       
      https://ci.ext.devshift.net/job/RedHatInsights-rhsm-subscriptions-pr-check/13864/

      tests.component.swatch_billable_usage.test_swatch_billable_usage.test_remittance_matches_tally_snapshot_data_for_non_contract_products_with_billing_factor_below_one
      # verify value is 20*0.25=5.0
      > verify_kafka_hourly_aggregate_data(application, billing_provider, billing_account_id, 5.0)
      
      tests.component.swatch_tally.test_tally_multiple_products_same_instance.test_tally_multiple_products_same_instance
                      tally_total = sum(record["value"] for record in tally.data)
      >               assert tally_total == value, (
                          f"Total tally value for {metric} is {tally_total} when it should be {value}"
                      )
      E               AssertionError: Total tally value for vCPUs is 0 when it should be 40
      E               assert 0 == 40
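Both tally failures report a total of 0 rather than a wrong non-zero value, which fits the timing hypothesis: the tally may simply not have been produced yet when the assertion ran. A minimal sketch of polling until the expected total appears, assuming a hypothetical `fetch_total` callable that re-runs the tally query and returns the summed value; the function name and the `timeout_s`/`interval_s` defaults are illustrative, not taken from the test suite. On timeout it reports the last total seen, so a total that stays at 0 (data never arrived) can be told apart from a wrong non-zero total.

```python
import time
from typing import Callable


def wait_for_tally_total(
    fetch_total: Callable[[], float],
    expected: float,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> float:
    """Poll a (hypothetical) tally query until it returns the expected total.

    Raises AssertionError with the last observed total if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        last = fetch_total()
        if last == expected:
            return last
        if time.monotonic() >= deadline:
            raise AssertionError(
                f"tally total was {last} after {timeout_s}s, expected {expected}"
            )
        time.sleep(interval_s)


# Example with a fake fetcher that reports 0 twice before the tally catches up.
readings = iter([0, 0, 40])
print(wait_for_tally_total(lambda: next(readings), 40, timeout_s=5, interval_s=0.01))
# 40
```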
      

       
      https://ci.ext.devshift.net/job/RedHatInsights-rhsm-subscriptions-pr-check/13863/

      tests.component.swatch_metrics_hbi.test_hbi_delete_events.test_guest_deleted
              helpers.swatch_events.assert_swatch_event(
                  adjusted_hypervisor_outbox_record.swatch_event_json,
                  hbi_physical_host,
                  umg_swatch_event_dt,
                  # Hypervisor should have modulo-2 sockets and no cores measurement
                  # since cores sent from HBI are 0 and are not normalized.
                  helpers.swatch_events.swatch_measurements(sockets=2),
                  sla=None,
                  usage=None,
                  hardware_type="Physical",
                  isHypervisor=True,
                  product_ids=[],
                  product_tag=[],
                  event_type=hypervisor_updated_event_type,
              )
      
              records_flushed = helpers.flush_outbox()
      >       assert records_flushed == 3
      E       assert 2 == 3
      

      Resources:

      • Logs are attached for each of these runs.
      • Zipped artifacts are attached.
      • A screenshot showing the run start times is attached.

      Acceptance Criteria:

      • Investigate and fix the failure for one of the tests.
      • Investigate whether that fix also resolves the other tests.
      • If some of the other tests are clearly unrelated, create new Jira issues for them.

      Attachments:

        1. subscriptions-pr-check-13864.txt (2.74 MB, Vanessa Busch)
        2. subscriptions-pr-check-13863.txt (13.14 MB, Vanessa Busch)
        3. subscriptions-pr-check-13866.txt (14.96 MB, Vanessa Busch)
        4. subscriptions-pr-check-13865.txt (16.06 MB, Vanessa Busch)
        5. Screenshot From 2026-01-08 15-14-34.png (398 kB, Vanessa Busch)

      Assignee: Unassigned
      Reporter: Vanessa Busch (buschv)
      Votes: 0
      Watchers: 1
