- Bug
- Resolution: Done
- Critical
- Approved
Issue: On 4/16 on c.rh.c, an action to create a new Pulp RepositoryVersion for the `published` Repository did not copy content from previous RepositoryVersions, leaving the new version with 0 content. Content was manually copied back into the `published` Repository, but the root cause has not been determined.
Mitigations:
- We increased the `published` Repository `retain_repo_versions` from 1 to 10000 (edit: 2/22 we updated every AnsibleRepository, including `published`, to `retain_repo_versions` = 50, so they are all protected). If this happens again, we can point the Pulp `published` Distribution to an accurate RepositoryVersion.
- We temporarily reduced the use of tasks: we asked PE not to upload/approve/delete, and turned off the ability for users to click the synclist toggle (https://github.com/ansible/galaxy_ng/pull/1138).
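To make the retention mitigation concrete, here is a minimal pure-Python sketch of why `retain_repo_versions` = 1 leaves nothing to roll back to, while a larger value does. It is a toy model, not Pulp code: the function names, the version numbers, and the content counts are hypothetical, and it assumes retention simply keeps the newest N versions.

```python
from typing import Dict, List


def retained_versions(latest: int, retain: int) -> List[int]:
    """Version numbers that survive if only the newest `retain` are kept (assumed model)."""
    oldest_kept = max(0, latest - retain + 1)
    return list(range(oldest_kept, latest + 1))


def rollback_candidates(latest: int, retain: int, content_counts: Dict[int, int]) -> List[int]:
    """Retained versions that still hold content and could back a Distribution."""
    return [
        version
        for version in retained_versions(latest, retain)
        if content_counts.get(version, 0) > 0
    ]


# Hypothetical history: versions 10 and 11 were healthy, version 12 came out empty.
counts = {10: 500, 11: 500, 12: 0}

with_retain_1 = rollback_candidates(latest=12, retain=1, content_counts=counts)
with_retain_50 = rollback_candidates(latest=12, retain=50, content_counts=counts)
```

With a retention of 1, only the empty version 12 remains, so there is no candidate to repoint to. With a retention of 50, versions 10 and 11 are still available.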
Possible root causes:
- We updated our move task in the 4/16 deploy (found to be proper)
- https://issues.redhat.com/browse/AAH-1384 orphan_protection_time set to 0 can cause race conditions
- Pulp forum reported upgrade issue from 3.15 to 3.17, same upgrade we did on 4/16
- Stage env did migration upgrades incrementally, Prod did them all at once
- Curate task locks only the synclist repo and not the upstream_repo used as the base repo. “maybe in the meantime the repoversion were cleaned up from the upstream_repo and the base_version you're hoping to find is gone” (update 2/23: logs confirm this situation occurred and then 7 workers died at once; this happened ~2hr before the outage was reported. Prevention PR: https://github.com/ansible/galaxy_ng/pull/1141)
- Collection deletion endpoints worth reviewing
- cleanup_old_versions should not count unfinished versions. (This is usually protected by only manipulating a repo in a task. It should never encounter unfinished versions.)
- Error on GET to v3/namespaces: http://pastebin.test.redhat.com/1030923 (this may be caused by is_org_admin always being set to false in the mitigation PR, which may cause synclists_owned_by_group to come back false)
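The curate-task item above describes a race: the task remembers a base version of the upstream repo, but because it reserves only the synclist repo, another task can create a new upstream version and trigger cleanup in between. The sketch below models that interleaving deterministically in plain Python. Everything in it (`Repo`, `curate`, the `lock_upstream` flag) is hypothetical and is not the galaxy_ng or Pulp implementation. Reserving both repos is shown only as one possible prevention, not necessarily what the linked PR does.

```python
class Repo:
    """Toy repository that keeps only the newest `retain` versions."""

    def __init__(self, name, retain):
        self.name = name
        self.retain = retain
        self.versions = {0: set()}  # version number -> content

    def latest(self):
        return max(self.versions)

    def new_version(self, content):
        number = self.latest() + 1
        self.versions[number] = set(content)
        # Cleanup: drop everything except the newest `retain` versions.
        for old in sorted(self.versions)[:-self.retain]:
            del self.versions[old]
        return number


def curate(upstream, lock_upstream):
    """Return the base version's content, or None if it vanished mid-task."""
    base = upstream.latest()  # the task remembers the base version number
    reserved = {"synclist"}
    if lock_upstream:
        reserved.add(upstream.name)
    # A competing task wants to write to upstream; it runs only if the
    # repo is not reserved by the curate task.
    if upstream.name not in reserved:
        upstream.new_version({"a", "b", "c"})
    return upstream.versions.get(base)


unlocked = Repo("upstream", retain=1)
unlocked.new_version({"a", "b"})
lost = curate(unlocked, lock_upstream=False)

locked = Repo("upstream", retain=1)
locked.new_version({"a", "b"})
kept = curate(locked, lock_upstream=True)
```

In the unlocked run the base version is cleaned up before the curate task reads it, so the lookup returns `None`. In the locked run the competing write is held off and the base content is still there.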