OpenShift Bugs / OCPBUGS-10485

Installation with custom instance types fails in some regions because the network operator is degraded


      Description of problem:

      Installing a cluster with custom instance types fails in some regions because the network operator becomes degraded.

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-14-053612

      How reproducible:

      Always

      Steps to Reproduce:

      1. "create install-config"
      2. edit "install-config.yaml" to set compute[0].platform.gcp.type to t2d-standard-2 and controlPlane.platform.gcp.type to t2d-standard-4, with compute[0].replicas set to 2
      3. "create cluster" 
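For reference, the fields edited in step 2 would sit in install-config.yaml roughly as sketched below. The snippet only writes an illustrative fragment to a scratch file; the file name `install-config-fragment.yaml` is an assumption made for this example, and `name: worker` and `name: master` are the usual installer defaults rather than values quoted in this report. All other required fields are omitted.

```shell
# Write an illustrative fragment showing only the fields changed in step 2.
# This is not a complete install-config.yaml.
cat > install-config-fragment.yaml <<'EOF'
compute:
- name: worker              # assumed default pool name
  replicas: 2
  platform:
    gcp:
      type: t2d-standard-2
controlPlane:
  name: master              # assumed default pool name
  platform:
    gcp:
      type: t2d-standard-4
EOF
```

Comparing this fragment against the edited install-config.yaml before running "create cluster" is a quick way to confirm that both machine pools request the intended T2D types.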

      Actual results:

      The installation failed: both worker nodes are NotReady and several cluster operators are unavailable or degraded.

      Expected results:

      The installation should succeed.

      Additional info:

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          46m     Unable to apply 4.13.0-0.nightly-2023-03-14-053612: some cluster operators are not available
      $ oc get nodes
      NAME                                                           STATUS     ROLES                  AGE   VERSION
      jiwei-24402-0-0-v7xp7-master-0.c.openshift-qe.internal         Ready      control-plane,master   41m   v1.26.2+bc894ae
      jiwei-24402-0-0-v7xp7-master-1.c.openshift-qe.internal         Ready      control-plane,master   41m   v1.26.2+bc894ae
      jiwei-24402-0-0-v7xp7-master-2.c.openshift-qe.internal         Ready      control-plane,master   41m   v1.26.2+bc894ae
      jiwei-24402-0-0-v7xp7-worker-a-x5l5z.c.openshift-qe.internal   NotReady   worker                 22m   v1.26.2+bc894ae
      jiwei-24402-0-0-v7xp7-worker-b-qmk9w.c.openshift-qe.internal   NotReady   worker                 22m   v1.26.2+bc894ae
      $ oc get machines -n openshift-machine-api
      NAME                                   PHASE     TYPE             REGION        ZONE            AGE
      jiwei-24402-0-0-v7xp7-master-0         Running   t2d-standard-4   us-central1   us-central1-a   45m
      jiwei-24402-0-0-v7xp7-master-1         Running   t2d-standard-4   us-central1   us-central1-b   45m
      jiwei-24402-0-0-v7xp7-master-2         Running   t2d-standard-4   us-central1   us-central1-c   45m
      jiwei-24402-0-0-v7xp7-worker-a-x5l5z   Running   t2d-standard-2   us-central1   us-central1-a   37m
      jiwei-24402-0-0-v7xp7-worker-b-qmk9w   Running   t2d-standard-2   us-central1   us-central1-b   37m
      $ oc get co | grep -v 'True        False         False'
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-0.nightly-2023-03-14-053612   False       False         True       39m     OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.88.86:443/healthz": dial tcp 172.30.88.86:443: connect: connection refused...
      console                                    4.13.0-0.nightly-2023-03-14-053612   False       False         True       32m     RouteHealthAvailable: console route is not admitted
      image-registry                                                                  False       True          True       34m     Available: The deployment does not have available replicas...
      ingress                                                                         False       True          True       32m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
      kube-controller-manager                    4.13.0-0.nightly-2023-03-14-053612   True        False         True       36m     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
      monitoring                                                                      False       True          True       28m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
      network                                    4.13.0-0.nightly-2023-03-14-053612   True        True          True       41m     DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-twsln is in CrashLoopBackOff State...
      node-tuning                                4.13.0-0.nightly-2023-03-14-053612   True        True          False      38m     Waiting for 2/5 Profiles to be applied
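The `grep -v` filter above depends on the exact spacing between columns. A minimal sketch of a less spacing-sensitive alternative follows, assuming the table layout shown in this report (left-aligned columns under an AVAILABLE / PROGRESSING / DEGRADED / SINCE header). The function name `unhealthy_operators` is hypothetical and not part of `oc`.

```shell
# Print the names of cluster operators whose AVAILABLE/PROGRESSING/DEGRADED
# values are anything other than True/False/False.
# Reads `oc get co` table output on stdin and locates each column by its
# header offset, so rows with an empty VERSION column are still parsed.
unhealthy_operators() {
  awk '
    NR == 1 {
      a = index($0, "AVAILABLE")
      p = index($0, "PROGRESSING")
      d = index($0, "DEGRADED")
      s = index($0, "SINCE")
      next
    }
    NF == 0 { next }                      # skip blank lines
    {
      avail = substr($0, a, p - a)
      prog  = substr($0, p, d - p)
      degr  = substr($0, d, s - d)
      gsub(/[ \t]/, "", avail)
      gsub(/[ \t]/, "", prog)
      gsub(/[ \t]/, "", degr)
      if (avail != "True" || prog != "False" || degr != "False") print $1
    }
  '
}
```

Usage would be `oc get co | unhealthy_operators`. Slicing by header offset rather than splitting on whitespace matters here because operators such as image-registry and ingress report an empty VERSION, which shifts whitespace-delimited fields by one.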

            pdiak@redhat.com Patryk Diak
            rhn-support-jiwei Jianli Wei
            Jianli Wei Jianli Wei