Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: 4.12.z
Affects Version/s: 4.12
Component/s: Cloud Compute / Unknown
Labels: None
Regression: None
Blocked: False
Blocked Reason: None
Release Note Text: N/A per dev
Release Note Status: Set a Value
Target Version: 4.12.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The agent-based installation fails for a 3+1 deployment (three control-plane nodes and one worker node). The machine-api operator becomes degraded because it requires a minimum of 2 running worker replicas, whereas a 3+1 deployment defines only one worker node.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Create agent.iso (openshift-install agent create image) using install-config.yaml and agent-config.yaml (sample files attached).
2. Deploy a 3+1 cluster using agent.iso.
3. Run "openshift-install agent wait-for install-complete" to wait for the installation to complete.

Actual results:

The wait-for command reports the following errors:
ERROR Cluster operator kube-controller-manager Degraded is True with GarbageCollector_Error: GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host 
INFO Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.12.0-0.nightly-2022-10-05-053337 
ERROR Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.12.0-0.nightly-2022-10-05-053337 because minimum worker replica count (2) not yet met: current running replicas 1, waiting for [] 
INFO Cluster operator machine-api Available is False with Initializing: Operator is initializing 
INFO Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error. 
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas 
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack. 
INFO Cluster operator network ManagementStateDegraded is False with :  
ERROR Cluster initialization failed because one or more operators are not functioning properly. 
ERROR 				The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 
ERROR 				https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html

Expected results:

The 3+1 deployment should complete successfully.

Additional info:

The machine-api-operator contains a check that requires at least 2 running worker replicas, which prevents a 3+1 deployment from completing:
https://github.com/openshift/machine-api-operator/blob/master/pkg/operator/sync.go#L322
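
To make the failure mode concrete, here is a minimal Go sketch of the kind of check described above. It is not the operator's actual code: the function names `strictCheck` and `relaxedCheck`, their parameters, and the idea of capping the minimum at the expected worker count are all assumptions made for illustration. Only the constant 2 and the error message format are taken from the log output in this report.

```go
package main

import "fmt"

// minimumWorkerReplicas mirrors the value (2) reported in the error message
// in this issue. The name is an assumption for this sketch.
const minimumWorkerReplicas = 2

// strictCheck models the behaviour described in this report (hypothetical
// function): it fails whenever fewer than 2 workers are running, no matter
// how many workers the cluster was configured to have.
func strictCheck(running int, waiting []string) error {
	if running < minimumWorkerReplicas {
		return fmt.Errorf(
			"minimum worker replica count (%d) not yet met: current running replicas %d, waiting for %v",
			minimumWorkerReplicas, running, waiting)
	}
	return nil
}

// relaxedCheck is one hypothetical way to avoid the problem: the required
// count is capped at the number of workers the cluster expects, so a
// cluster that defines a single worker is satisfied by a single worker.
func relaxedCheck(running, expected int, waiting []string) error {
	required := minimumWorkerReplicas
	if expected < required {
		required = expected
	}
	if running < required {
		return fmt.Errorf(
			"minimum worker replica count (%d) not yet met: current running replicas %d, waiting for %v",
			required, running, waiting)
	}
	return nil
}

func main() {
	// 3+1 deployment: one worker expected, one worker running.
	fmt.Println(strictCheck(1, nil))     // returns an error
	fmt.Println(relaxedCheck(1, 1, nil)) // returns nil
}
```

With one worker running, the strict variant keeps returning an error indefinitely because no second worker will ever appear, which matches the "waiting for []" part of the logged message. The linked pull request may resolve this differently; the relaxed variant only shows that the minimum has to account for the configured worker count.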

Attachments:
agent-config.yaml (0.1 kB, 2022/10/10 11:06 AM)
install-config.yaml (0.8 kB, 2022/10/10 11:16 AM)

blocks: AGENT-373 Manually test 3+1 clusters with agent (Closed)

links to: openshift/machine-api-operator#1074: OCPBUGS-2151: Don't degrade when workers not expected

Assignee: Zane Bitter

Reporter: Manoj Hans

QA Contact: Manoj Hans

Votes: 0

Watchers: 10

Created: 2022/10/10 11:06 AM

Updated: 2024/02/15 3:31 PM

Resolved: 2023/01/17 7:37 PM
