Type: Epic
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Central
Labels:
- review

Epic Name:
Central HA Phase 1: Minimal downtime upgrades
Activity Type:
Product / Portfolio Work
Blocked:
False
Blocked Reason:

None
Ready:
False
Color Status:
Not Selected
Epic Status:
To Do
Feature Link:
ROX-32124 - [Discovery] High Availability Scanning Architecture Research
Hierarchy Progress Bar:

100% To Do, 0% In Progress, 0% Done
Status Summary:
2026-02-24:

Addressing some design review comments while waiting for the final technical review meeting on Feb 26th

2026-02-17:

Proposal for changes to the feature ROX-32124 was presented and signed off at the ACS ENG / PM sync

Technical design for minimal downtime upgrades under review: https://docs.google.com/document/d/1kRyn96HcL7O6Eje8ptYikKjnUfM3AxL3nxWhU-Q-Fj0
Intelligence Requested:
Market:
PX Impact Score:

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Overview:

A high level summary that describes the Epic in a clear, concise way. Complete during New status.

Part of ROX-32124.

One of the biggest issues with central's availability is that, for larger deployments, database migrations during an upgrade lead to unacceptably long downtime of the API. This blocks customers' CI and/or prevents them from putting ACS scans into critical build paths.

This Epic should address that issue by making sure old central pods can keep running while a new version rolls out and performs its database migrations.

Requirements:

A list of specific needs or objectives that an epic must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

Technical Scope:

High-level list of items that are in scope; usually completed by a staff engineer or a lead from the Feature Delivery Team. Initial completion during Refinement status.

Central's deployment strategy needs to change from Recreate to RollingUpdate
The DB migration process needs to be aware that other processes could start migrations, and must prevent conflicts between them
Background workers need to be aware that multiple central instances can be running during the upgrade period, so that they don't conflict or do the same work twice
DB migration and review guidelines should be updated
- Verify we don't exclusively lock tables
- Verify the migrations are implemented in a way that they can handle new data being added or existing data being modified or deleted during migration
There have to be CI tests to ensure:
- DB schema is backwards compatible (we already have this)
- Old central can keep running while new central is in migration without major conflicts or introducing data inconsistencies
Addressing the backfill / index creation migrations issue
- To prevent a central RollingUpdate from being perceived as a stuck rollout, migrations should finish in a reasonable time (under 10 minutes, maybe even less)
- Migrations that backfill data on large tables or create indices can take a long time, even up to hours
- We have the same issue today with the Recreate rollout, which is why this item is listed as optional
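
The requirement that the migration process must notice other processes starting migrations could be met in several ways. The sketch below shows one of them under stated assumptions: a single-row lease table that a central instance must claim before running migrations. It is an illustration, not the design under review. The table name `migration_lease`, the functions `init_lease`, `try_acquire` and `release`, and the holder names are all hypothetical, and `sqlite3` only stands in for central's real database so that the example runs on its own.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical single-row lease table; not central's actual schema.
LEASE_DDL = """
CREATE TABLE IF NOT EXISTS migration_lease (
    id         INTEGER PRIMARY KEY CHECK (id = 1),
    holder     TEXT,
    expires_at REAL NOT NULL DEFAULT 0
)
"""


def init_lease(conn: sqlite3.Connection) -> None:
    """Create the lease table and its only row if they do not exist yet."""
    conn.execute(LEASE_DDL)
    conn.execute("INSERT OR IGNORE INTO migration_lease (id) VALUES (1)")
    conn.commit()


def try_acquire(conn: sqlite3.Connection, holder: str, ttl_seconds: float,
                now: Optional[float] = None) -> bool:
    """Take the lease if it is free, expired, or already held by `holder`.

    The check and the write happen in one UPDATE statement, so two
    instances cannot both see the lease as free and both claim it.
    """
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE migration_lease SET holder = ?, expires_at = ? "
        "WHERE id = 1 AND (holder IS NULL OR holder = ? OR expires_at <= ?)",
        (holder, now + ttl_seconds, holder, now),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, holder: str) -> bool:
    """Give the lease back; only the current holder can release it."""
    cur = conn.execute(
        "UPDATE migration_lease SET holder = NULL, expires_at = 0 "
        "WHERE id = 1 AND holder = ?",
        (holder,),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_lease(conn)
    if try_acquire(conn, "central-new", ttl_seconds=600):
        try:
            pass  # run migrations here, renewing the lease via try_acquire
        finally:
            release(conn, "central-new")
```

In this sketch, an instance that fails to acquire the lease would wait and re-check the schema version instead of starting its own migration. The expiry time lets another instance take over if the migrating pod crashes without releasing the lease. On PostgreSQL, a session-level advisory lock such as `pg_try_advisory_lock` might serve the same purpose without an extra table.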

Out of Scope:

High-level list of items that are out of scope. Initial completion during Refinement status.

Having multiple central pods outside the upgrade process

Outstanding Questions (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

links to

Design: Central HA Phase 1: Minimal Downtime Upgrades

Assignee: Johannes Malsam

Reporter: Johannes Malsam

Team: ACS Cloud Service

Votes: 0

Watchers: 2

Created: 2026/02/05 7:42 AM

Updated: 2026/02/25 10:29 PM
