Type: Epic
Resolution: Obsolete
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Epic Name:
GHA Migration
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Status:
To Do
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done


Background

The Zuul instance on which we rely for nearly all of our CI has become largely unusable. We lack the resources and knowledge to address this problem, and as a result we are unable to maintain the collections for which we are responsible (for example, see https://github.com/ansible-collections/kubernetes.core/pull/575). This is not a new problem, but it has reached a state that can no longer be ignored. Even if we could find a way to fix the current problems, history strongly suggests we would eventually find ourselves back in a similar scenario.

Our current CI has grown organically over years in an environment that lacked any clear overarching responsibility for the service. It is the product of the hard work of many people trying to solve problems as they have arisen. Unfortunately, this kind of growth has led to the situation we now find ourselves in. Broadly speaking, the problems we face are:

A lack of knowledge and expertise - Zuul is a complicated tool. We do not have the expertise to address problems with it in a timely manner.
A high rate of change - The configuration for our CI jobs is frequently subject to change. This makes it harder to keep documentation up to date, and harder for people on the team to keep up with the current state of CI configuration.
Large variance across repos - The list of which jobs are run, how they are run, and the environment in which they are run varies greatly from repo to repo. This increases the complexity and maintenance burden of CI management.
Frequent instability - The hosted Zuul service we use frequently suffers from problems such as expired certs, node failures, networking outages, and general performance issues.

In an environment where CI maintenance is a burden shared across the whole team, we have to employ a strategy that makes it easy for everyone to take part. We cannot have a single CI expert, because that position does not exist. Our approach to CI should account for this reality by:

Offloading the burden of managing infrastructure
Reducing the rate of change, and better managing how changes are made and communicated
Minimizing the complexity by standardizing tooling and processes across collections
Balancing the features we choose to implement in our CI chain with the level of resources we have available
Writing and maintaining better documentation

In addition to moving our own repos off of Zuul, we need to account for the fact that a large number of community repos, and some supported repos, are also still on Zuul. The Cloud Content team is the de facto maintainer of Zuul at this point. While we cannot be responsible for migrating every repo that currently exists on Zuul, we do need to ensure that we are communicating with those who may be affected by this.

All work should follow the process outlined in the Cloud Content Handbook (https://github.com/ansible-collections/cloud-content-handbook). Specifically, changes to CI should be documented before they are implemented. The next steps document will serve as an overall guide for this CI work.

For reference, the list of repos that need to be migrated:

Definition of Done

CI processes for our current repos that run in Zuul are running on GHA
Our CI implementation is fully documented
Ongoing maintenance of Zuul is either transferred to someone else, or those who remain on Zuul have a path forward for also migrating off Zuul.

Assignee: Mike Graves

Reporter: Mike Graves

Votes: 0

Watchers: 4

Created: 2023/02/20 1:54 PM

Updated: 2024/10/24 8:13 PM

Resolved: 2024/10/24 8:13 PM
