  1. Red Hat 3scale API Management
  2. THREESCALE-2204

Fix efficiency/bugs to destroy tenants/services and enable the worker again


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • System
    • 3scale 2019-08-12, 3scale 2019-08-26, 3scale 2019-09-09, 3scale 2019-09-30, 3scale 2019-10-14, 3scale 2019-10-28, 3scale 2019-11-11, 3scale 2019-11-25, 3scale 2019-12-09

      Fix the root cause of what is explained in: Post Mortem 3scale elevated level of API errors in our 3scale Management API - 26/03/2019

      We tried to destroy thousands of tenants at once, and System couldn't handle that load.
      We thought this was because we do a table scan when running this query:

      SELECT  `mail_dispatch_rules`.* FROM `mail_dispatch_rules` WHERE `mail_dispatch_rules`.`system_operation_id` = X AND `mail_dispatch_rules`.`account_id` = Y;
      

      But apparently we don't; EXPLAIN shows that the query uses the composite index and examines a single row:

      mysql> EXPLAIN SELECT  `mail_dispatch_rules`.* FROM `mail_dispatch_rules` WHERE `mail_dispatch_rules`.`system_operation_id` = X AND `mail_dispatch_rules`.`account_id` = Y;
      +----+-------------+---------------------+------------+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+---------+-------------+------+----------+-------+
      | id | select_type | table               | partitions | type  | possible_keys                                                   | key                                                             | key_len | ref         | rows | filtered | Extra |
      +----+-------------+---------------------+------------+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+---------+-------------+------+----------+-------+
      |  1 | SIMPLE      | mail_dispatch_rules | NULL       | const | index_mail_dispatch_rules_on_system_operation_id_and_account_id | index_mail_dispatch_rules_on_system_operation_id_and_account_id | 18      | const,const |    1 |   100.00 | NULL  |
      +----+-------------+---------------------+------------+-------+-----------------------------------------------------------------+-----------------------------------------------------------------+---------+-------------+------+----------+-------+
      1 row in set, 1 warning (0.01 sec)
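
The same check can be reproduced locally. The following is a minimal sketch, not the production setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so the plan text differs from the EXPLAIN output above. The table, column, and index names come from the query above; the remaining schema is hypothetical. The index is declared UNIQUE on the assumption that MySQL's `const` access type, which applies to primary-key and unique-index lookups, means the real index is unique.

```python
import sqlite3

# Index name taken from the EXPLAIN output in this issue.
INDEX = "index_mail_dispatch_rules_on_system_operation_id_and_account_id"

QUERY = (
    "SELECT * FROM mail_dispatch_rules "
    "WHERE system_operation_id = ? AND account_id = ?"
)


def query_plan(conn):
    """Return the human-readable plan steps SQLite reports for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 1)).fetchall()
    # The last column of each plan row holds the description of the step.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: only the two filtered columns matter here.
conn.execute(
    "CREATE TABLE mail_dispatch_rules ("
    "id INTEGER PRIMARY KEY, account_id INTEGER, system_operation_id INTEGER)"
)

# Without the composite index the only option is a full table scan.
plan_without_index = query_plan(conn)

conn.execute(
    f"CREATE UNIQUE INDEX {INDEX} "
    "ON mail_dispatch_rules (system_operation_id, account_id)"
)

# With the composite index the planner can search the index instead.
plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of the table and the second a search that names the composite index, which is the SQLite counterpart of the `const` lookup shown in the MySQL output.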
      

      Before writing this Jira issue, mnoyabon asked mmacejko what had happened during the last 2 days, and he said:

      so, there are no missing indexes.. i went almost through all used queries.. (the whole table search test is still pending tho).
      + i finished the test for including associations for main object.. there are less queries after the change.
      but didn’t have time to finish all of that since i was helping guys on friday..

      Operations also suggested looking at the slow queries by running select * from mysql.slow_log; against the system-multitenant db.

      Once we have fixed the root cause, and maybe some of the other proposals in the Corrective and Preventative Measures section of the Post Mortem, we can enable this feature in SaaS again.
      It was disabled for both SaaS and on-prem in this PR, and enabled in on-prem only in this PR.

              Unassigned
              mnoyabon Marta Noya (Inactive)
              Jakub Smolár