  1. Satellite
  2. SAT-14170

random failure of Inventory Sync


    • rh_cloud_44
    • Critical

      +++ This bug was initially created as a clone of Bug #2127180 +++

      Description of problem:

      Customer has a new 6.11.2 Satellite on which InventorySync::Async::InventoryScheduledSync fails on multiple occasions. The failures appear to be random. A manual sync upload seems to work every time.

      When the InventorySync::Async::InventoryScheduledSync fails, here is the error:

      Error: RestClient::Unauthorized

      401 Unauthorized

      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/abstract_response.rb:223:in
        `exception_with_response'"
      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/abstract_response.rb:103:in
        `return!'"
      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:809:in `process_result'"
      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:725:in `block
        in transmit'"
      • "/usr/share/ruby/net/http.rb:933:in `start'"
      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'"
      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'"
      • "/usr/share/gems/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'"
      • "/usr/share/gems/gems/foreman_rh_cloud-5.0.41/app/services/foreman_rh_cloud/cloud_request.rb:11:in
        `execute_cloud_request'"
        ...
        ...
        ...

      Task export for the life of the server is attached.

      Sosreport in case.

      — Additional comment from on 2022-09-15T15:14:26Z

      Created attachment 1912100
      task export

      — Additional comment from on 2022-09-15T15:21:35Z

      Task export added. Hopefully you can correlate the timestamps of the failures with events on the back end.
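
One way to do this correlation is to bucket the failure timestamps by UTC hour and look for clustering. The following is a minimal sketch, not part of the original report: the helper name `failures_by_utc_hour` is hypothetical, and it assumes the failure times have already been extracted from the task export as ISO 8601 strings (the export format itself is not described here). The sample timestamps are made up purely for illustration.

```ruby
require 'time'

# Hypothetical helper: count failures per UTC hour.
# Assumes each entry is an ISO 8601 timestamp string.
def failures_by_utc_hour(timestamps)
  counts = Hash.new(0)
  timestamps.each { |ts| counts[Time.iso8601(ts).utc.hour] += 1 }
  # Sort by hour so clustering around a given hour is easy to spot.
  counts.sort.to_h
end

# Made-up sample input, for illustration only.
sample = ['2022-09-15T00:00:05Z', '2022-09-15T02:00:00+02:00', '2022-09-16T13:30:00Z']
failures_by_utc_hour(sample).each do |hour, count|
  puts format('%02d:00 UTC  %d failure(s)', hour, count)
end
```

A strong concentration in a single hour would suggest a load-related cause rather than a truly random one.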

      — Additional comment from on 2022-09-18T13:17:11Z

      I think it's related to https://access.redhat.com/support/cases/#/case/03252998

      @kwalsh@redhat.com Can you correlate the dates?

      — Additional comment from on 2022-09-27T14:40:34Z

      Hello Shimon,

      Yes, looks related to the case.
      I also have a few more similar cases where customers are facing issues with Insights Sync in 6.11.
      I will attach my cases to the bug as well.

      Regards,
      Arvinder

      — Additional comment from on 2022-09-28T11:55:35Z

      Hi Keith, Shimon,

      As we have a couple of customer cases linked to the bugzilla, can we prioritize this bz?

      — Additional comment from on 2022-09-29T16:02:37Z

      Hello Team,

      It seems we have another bug opened for the same issue.
      ~~~ Bug 2126319 - [Sat6/InisghtsSync/Bug] Insights Sync failing frequently on Sat6 instances
      https://bugzilla.redhat.com/show_bug.cgi?id=2126319
      ~~~

      Can someone please review both bugs, consolidate them into one, and close the duplicate?

      Thanks and Regards,
      Arvinder

      — Additional comment from on 2022-09-29T16:05:50Z

      Hello,

      I also wanted to share the JIRA that was used to track this same issue.
      It was opened for one of my cases:
      ~~~
      https://issues.redhat.com/browse/RHCLOUD-20254
      ~~~

      Regards,
      Arvinder

      — Additional comment from on 2022-09-29T19:36:02Z

      Hi all. If the timeout issues are being experienced in 6.11, they should not be related to RBAC, as RBAC is not in the picture with cert-auth, so https://issues.redhat.com/browse/RHCLOUD-20254 may not be the cause here. We are continuing to investigate, and may need some involvement from the inventory team.

      — Additional comment from on 2022-10-03T10:32:19Z

      @kwalsh@panix.com Should I move the BZ to some other component, while you are investigating it?

      — Additional comment from on 2022-10-20T15:52:45Z

      Hi Keith,

      Do you have any update for us on this bz? So far, 5 customers have reported this issue.

      Regards,
      Ashish

      — Additional comment from on 2022-10-20T15:55:25Z

      *** Bug 2128115 has been marked as a duplicate of this bug. ***

      — Additional comment from on 2022-10-20T16:56:25Z

      Adding Drew Bomhof to help triage with his team.

      — Additional comment from on 2022-10-21T09:04:38Z

      Hi Team,

      Received the following update from the customer on case 03311089:

      ~~~
      First tests to move the uploads from 0:00UTC to 23:47UTC are very positive. So far already 3 days in a row of success, and also finishing in 2 seconds instead of the 20-30 seconds when started at 0:00UTC. I think this looks already like a 90% confirmation that the problem is that Satellite schedules the upload for every instance in the world at 0:00, which is frequently 0:00UTC.

      The cloud.redhat.com team must also be able to see this major spike.

      To me the solution looks pretty simple to have a better spread of the uploads throughout the day.
      ~~~
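
The spreading idea described by the customer can be sketched as follows. This is only an illustration under stated assumptions, not the foreman_rh_cloud implementation: the helper names `sync_offset_seconds` and `sync_start_time`, the example hostnames, and the choice of hashing an instance identifier are all hypothetical. The sketch derives a stable per-instance offset within the day so that instances do not all start their upload at 0:00 UTC.

```ruby
require 'digest'

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: map an instance identifier to a stable offset
# (seconds after 00:00 UTC), so each instance gets its own start time.
def sync_offset_seconds(instance_id)
  # Interpret the first 8 hex digits of a SHA-256 digest as an integer.
  Digest::SHA256.hexdigest(instance_id)[0, 8].to_i(16) % SECONDS_PER_DAY
end

# Hypothetical helper: format the offset as an HH:MM:SS start time (UTC).
def sync_start_time(instance_id)
  offset = sync_offset_seconds(instance_id)
  format('%02d:%02d:%02d', offset / 3600, (offset % 3600) / 60, offset % 60)
end

# Example hostnames are placeholders.
puts sync_start_time('satellite-a.example.com')
puts sync_start_time('satellite-b.example.com')
```

Hashing a stable identifier, rather than picking a random time on each run, keeps the start time the same from day to day for a given instance while still distributing instances across the 24 hours.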

      — Additional comment from on 2022-10-30T22:42:13Z

      Created attachment 1921267
      inventory-sync at 23:47 pm always-success

      Hi team,

      Following is the customer update on case 03311089:
      ~~~
      Another screenshot that proves that the 0:00UTC is the Root Cause of the failures

      What is preventing the Insights/Cloud team from responding to customer issues within 7 days when they raise a major issue with the SaaS availability (running into a timeout means not available)?
      ~~~

      — Additional comment from on 2022-10-31T07:37:24Z

      Created https://issues.redhat.com/browse/SAT-13626 to track Satellite side work on this.

      — Additional comment from on 2022-11-22T08:53:37Z

      *** Bug 2126319 has been marked as a duplicate of this bug. ***

            rhn-engineering-sshtein Shimon Shtein
            jira-bugzilla-migration RH Bugzilla Integration
            RH Bugzilla Integration RH Bugzilla Integration