Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.13.0
Component/s: Networking / ovn-kubernetes
Labels:
- TestBlocker

Severity:
Critical
Regression:
No
Sprint:
SDN Sprint 233, SDN Sprint 234
sprint_count:
2
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Blocked by Bugzilla Bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2180460
Target Version:

4.14.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Installation with custom instance types fails in some regions because the network operator becomes degraded.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-14-053612

How reproducible:

Always

Steps to Reproduce:

1. Run "openshift-install create install-config".
2. Edit "install-config.yaml": set compute[0].platform.gcp.type to t2d-standard-2, controlPlane.platform.gcp.type to t2d-standard-4, and compute[0].replicas to 2.
3. Run "openshift-install create cluster".
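
For reference, the edit in step 2 corresponds to an install-config.yaml fragment along these lines. This is a sketch, not the reporter's actual file: the pool names "master" and "worker" are assumed to be the installer defaults, and every other field (base domain, GCP project and region, pull secret, and so on) is omitted.

```yaml
# Hypothetical fragment of install-config.yaml; only the fields named in step 2 are shown.
controlPlane:
  name: master              # assumed default pool name
  platform:
    gcp:
      type: t2d-standard-4  # controlPlane.platform.gcp.type
compute:
- name: worker              # assumed default pool name
  replicas: 2               # compute[0].replicas
  platform:
    gcp:
      type: t2d-standard-2  # compute[0].platform.gcp.type
```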

Actual results:

The installation failed, with some worker nodes NotReady and some operators unavailable.

Expected results:

The installation should succeed.

Additional info:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          46m     Unable to apply 4.13.0-0.nightly-2023-03-14-053612: some cluster operators are not available
$ oc get nodes
NAME                                                           STATUS     ROLES                  AGE   VERSION
jiwei-24402-0-0-v7xp7-master-0.c.openshift-qe.internal         Ready      control-plane,master   41m   v1.26.2+bc894ae
jiwei-24402-0-0-v7xp7-master-1.c.openshift-qe.internal         Ready      control-plane,master   41m   v1.26.2+bc894ae
jiwei-24402-0-0-v7xp7-master-2.c.openshift-qe.internal         Ready      control-plane,master   41m   v1.26.2+bc894ae
jiwei-24402-0-0-v7xp7-worker-a-x5l5z.c.openshift-qe.internal   NotReady   worker                 22m   v1.26.2+bc894ae
jiwei-24402-0-0-v7xp7-worker-b-qmk9w.c.openshift-qe.internal   NotReady   worker                 22m   v1.26.2+bc894ae
$ oc get machines -n openshift-machine-api
NAME                                   PHASE     TYPE             REGION        ZONE            AGE
jiwei-24402-0-0-v7xp7-master-0         Running   t2d-standard-4   us-central1   us-central1-a   45m
jiwei-24402-0-0-v7xp7-master-1         Running   t2d-standard-4   us-central1   us-central1-b   45m
jiwei-24402-0-0-v7xp7-master-2         Running   t2d-standard-4   us-central1   us-central1-c   45m
jiwei-24402-0-0-v7xp7-worker-a-x5l5z   Running   t2d-standard-2   us-central1   us-central1-a   37m
jiwei-24402-0-0-v7xp7-worker-b-qmk9w   Running   t2d-standard-2   us-central1   us-central1-b   37m
$ oc get co | grep -v 'True        False         False'
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-03-14-053612   False       False         True       39m     OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.88.86:443/healthz": dial tcp 172.30.88.86:443: connect: connection refused...
console                                    4.13.0-0.nightly-2023-03-14-053612   False       False         True       32m     RouteHealthAvailable: console route is not admitted
image-registry                                                                  False       True          True       34m     Available: The deployment does not have available replicas...
ingress                                                                         False       True          True       32m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
kube-controller-manager                    4.13.0-0.nightly-2023-03-14-053612   True        False         True       36m     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
monitoring                                                                      False       True          True       28m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
network                                    4.13.0-0.nightly-2023-03-14-053612   True        True          True       41m     DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-twsln is in CrashLoopBackOff State...
node-tuning                                4.13.0-0.nightly-2023-03-14-053612   True        True          False      38m     Waiting for 2/5 Profiles to be applied
$

is related to

OCPBUGS-10775 OCP 4.13.0-rc.0 on Nutanix - ovs-configuration.service got ERROR: Cannot bring up connection ovs-if-br-ex after 10 attempts

Closed

links to

openshift/ovn-kubernetes#1613: OCPBUGS-10485: Bump OVS to 3.1.0-10

RHEA-2023:5006 rpm

Assignee: Patryk Diak

Reporter: Jianli Wei

QA Contact: Jianli Wei

Votes: 0

Watchers: 11

Created: 2023/03/17 10:29 AM

Updated: 2023/10/31 1:34 PM

Resolved: 2023/10/31 1:16 PM
