Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.14
Component/s: Machine Config Operator
Labels:
- mco-triaged

Regression:
No
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

On ci/prow/e2e-gcp-ovn-techpreview jobs, the installation always fails. The same failure occurs when launching clusters from ClusterBot with https://github.com/openshift/cloud-provider-gcp/pull/35 and the TechPreview feature flag set.

A sample job is https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cloud-provider-gcp/35/pull-ci-openshift-cloud-provider-gcp-master-e2e-gcp-ovn-techpreview/1700229284032942080

In the above job, the worker machine config changes from "rendered-worker-d5ca314d6412a630a0df72eb28b88543" to "rendered-worker-3eb45ca2b54008d2d1d5a6701a84bd7e", but no node is ever given "rendered-worker-3eb45ca2b54008d2d1d5a6701a84bd7e" as its desired configuration. When diffing the two configs, the only notable change is that `/etc/mco/internal-registry-pull-secret.json` goes from being empty to being populated.
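One way to confirm this symptom from gathered artifacts is to compare each node's desired-config annotation against the config the pool is targeting. The following is a minimal Python sketch, assuming the node list is available as JSON in the shape produced by `oc get nodes -o json`. The function name and the sample node names are hypothetical; the annotation key `machineconfiguration.openshift.io/desiredConfig` is the one the MCO uses to record a node's desired rendered config.

```python
# Annotation the MCO sets on each node to record its desired rendered config.
DESIRED_ANNOTATION = "machineconfiguration.openshift.io/desiredConfig"


def nodes_missing_target(node_list, target_config):
    """Return the names of nodes whose desiredConfig is not target_config.

    node_list is a dict shaped like the output of `oc get nodes -o json`.
    Nodes with no annotations at all are reported as missing the target.
    """
    missing = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DESIRED_ANNOTATION) != target_config:
            missing.append(metadata.get("name", "<unnamed>"))
    return missing


# Hypothetical sample data: node names are made up for illustration only.
sample = {
    "items": [
        {
            "metadata": {
                "name": "worker-a",
                "annotations": {
                    DESIRED_ANNOTATION: "rendered-worker-d5ca314d6412a630a0df72eb28b88543"
                },
            }
        },
        {"metadata": {"name": "worker-b"}},
    ]
}

if __name__ == "__main__":
    target = "rendered-worker-3eb45ca2b54008d2d1d5a6701a84bd7e"
    print(nodes_missing_target(sample, target))
```

If every worker shows up in the result, the new rendered config was never handed to any node, which matches the behaviour described above.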


Within the Machine Config Operator controller logs of this job, we see the new machine config being generated and the worker pool targeting it, but it is never assigned to a node.

```
☸ ocp/api-ci-l2s4-p1-openshiftapps-com:6443/nbrubake (ocp) in Downloads/artifacts/pods
❯ rg rendered-worker-3
openshift-machine-config-operator_machine-config-controller-7c754bffdd-84k5s_machine-config-controller.log
184:I0908 19:59:48.796617       1 render_controller.go:510] Generated machineconfig rendered-worker-3eb45ca2b54008d2d1d5a6701a84bd7e from 7 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  97-worker-generated-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  98-worker-generated-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  machineconfiguration.openshift.io/v1  }]
185:I0908 19:59:48.797076       1 event.go:298] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"openshift-machine-config-operator", Name:"worker", UID:"3416d468-710b-4c19-a956-00dead3dec84", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"25805", FieldPath:""}): type: 'Normal' reason: 'RenderedConfigGenerated' rendered-worker-3eb45ca2b54008d2d1d5a6701a84bd7e successfully generated (release version: 4.15.0-0.ci.test-2023-09-08-193239-ci-op-h57nt20x-latest, controller version: 5b821a279c88fee1cc1886a6cf1ec774891a2258)
187:I0908 19:59:48.872593       1 render_controller.go:536] Pool worker: now targeting: rendered-worker-3eb45ca2b54008d2d1d5a6701a84bd7e
```
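The same search can be scripted when many job logs need checking. Below is a rough Python sketch that pulls the "Generated machineconfig" and "now targeting" events out of controller log lines. The regular expressions are derived only from the log lines quoted above, and the function name is hypothetical; other MCO versions may word these messages differently.

```python
import re

# Patterns based solely on the render_controller.go lines quoted in this report.
GENERATED = re.compile(r"Generated machineconfig (\S+) from")
TARGETING = re.compile(r"Pool (\S+): now targeting: (\S+)")


def summarize_render_events(lines):
    """Collect rendered configs that were generated and what each pool targets.

    Returns (generated, targeted), where generated is a list of rendered
    config names in log order and targeted maps pool name to the most
    recent config that pool was reported as targeting.
    """
    generated = []
    targeted = {}
    for line in lines:
        match = GENERATED.search(line)
        if match:
            generated.append(match.group(1))
        match = TARGETING.search(line)
        if match:
            targeted[match.group(1)] = match.group(2)
    return generated, targeted
```

A pool that targets a config which no node ever lists as desired is the pattern this report describes.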

We have _not_ seen this when deploying onto GCP manually, however.

A similar ClusterBot failure is here: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1702401781788577792

Version-Release number of selected component (if applicable):

How reproducible:

So far, 100% of the time with prow-based deployments

Steps to Reproduce:

1. Launch ci/prow/e2e-gcp-ovn-techpreview or create a GCP cluster with TechPreviewNoUpgrade from ClusterBot
2. Wait for the launch to fail

Actual results:

Worker nodes get restarted, which causes services such as OLM, ingress, the image registry, and others to become unavailable.

Expected results:

The image pull secret doesn't change during an install, or if it does, it doesn't result in stuck workers.

Additional info:

No Machines or MachineSets are populated in the gather-extra artifacts, but that's likely because this is meant to be a ClusterAPI-managed cluster, which uses a different API group than the one the gather scripts query.

Also, there are worker nodes in the gather-extra artifacts, but they are marked as unschedulable due to missing network routes.

blocks

OCPBUGS-5755 GCP XPN private cluster install attempts to add masters to k8s-ig-xxxx instance groups

Closed

relates to

OCPBUGS-18572 [gcp] installation with "featureSet: TechPreviewNoUpgrade" failed, possibly due to nodes getting taint - "node.kubernetes.io/network-unavailable"

Closed

Assignee: David Joshy

Reporter: Nolan Brubaker

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 4

Created: 2023/09/15 8:02 PM

Updated: 2023/10/04 6:07 PM

Resolved: 2023/10/04 6:07 PM
