AGENT-721 (project: Agent-based deployment for OpenShift Installer)

Agent-based install on vSphere with multiple workers fails

    • Bug
    • Resolution: Duplicate
    • Undefined
    • None
    • openshift-4.13
    • None
    • False
    • False

      OCP 4.13.4 cluster deployment hangs with storage operator reporting "VSphereCSIDriverOperatorCRProgressing: Waiting for VSphere operator to report status" and multiple cluster operators degraded:

      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.4    False       True          True       94m     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found...
      baremetal                                  4.13.4    True        False         False      93m     
      cloud-controller-manager                   4.13.4    True        False         False      100m    
      cloud-credential                           4.13.4    True        False         False      106m    
      cluster-autoscaler                         4.13.4    True        False         False      93m     
      config-operator                            4.13.4    True        False         False      94m     
      console                                    4.13.4    False       False         True       84m     RouteHealthAvailable: console route is not admitted
      control-plane-machine-set                  4.13.4    True        False         False      93m     
      csi-snapshot-controller                    4.13.4    True        False         False      94m     
      dns                                        4.13.4    True        False         False      93m     
      etcd                                       4.13.4    True        False         False      93m     
      image-registry                             4.13.4    True        False         False      88m     
      ingress                                              False       True          True       94m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
      insights                                   4.13.4    True        False         False      88m     
      kube-apiserver                             4.13.4    True        False         False      91m     
      kube-controller-manager                    4.13.4    True        False         True       91m     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
      kube-scheduler                             4.13.4    True        False         False      91m     
      kube-storage-version-migrator              4.13.4    True        False         False      94m     
      machine-api                                4.13.4    True        False         False      90m     
      machine-approver                           4.13.4    True        False         False      94m     
      machine-config                             4.13.4    True        False         False      93m     
      marketplace                                4.13.4    True        False         False      93m     
      monitoring                                           False       True          True       84m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
      network                                    4.13.4    True        True          False      94m     Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
      node-tuning                                4.13.4    True        False         False      93m     
      openshift-apiserver                        4.13.4    True        True          True       89m     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
      openshift-controller-manager               4.13.4    True        False         False      90m     
      openshift-samples                          4.13.4    True        False         False      88m     
      operator-lifecycle-manager                 4.13.4    True        False         False      94m     
      operator-lifecycle-manager-catalog         4.13.4    True        False         False      94m     
      operator-lifecycle-manager-packageserver   4.13.4    True        False         False      89m     
      service-ca                                 4.13.4    True        False         False      94m     
      storage                                    4.13.4    True        True          False      94m     VSphereCSIDriverOperatorCRProgressing: Waiting for VSphere operator to report status
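The unhealthy operators can be picked out of a listing like the one above with a short script. The following is only a sketch: the helper names `parse_cluster_operators` and `unhealthy` are hypothetical, the sample rows are abbreviated from this report, and it assumes the text layout shown here, where the VERSION column may be empty (as it is for ingress and monitoring).

```python
from typing import NamedTuple


class OperatorStatus(NamedTuple):
    name: str
    available: bool
    progressing: bool
    degraded: bool


def parse_cluster_operators(text: str) -> list[OperatorStatus]:
    """Parse cluster operator rows; tolerate an empty VERSION column."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "NAME":
            continue  # skip blank lines and the header
        # The first True/False token marks the AVAILABLE column, whether or
        # not a version string precedes it.
        start = next(
            (i for i, t in enumerate(tokens) if t in ("True", "False")), None
        )
        if start is None:
            continue
        flags = tokens[start:start + 3]
        if len(flags) < 3 or any(f not in ("True", "False") for f in flags):
            continue  # not a well-formed status row
        available, progressing, degraded = (f == "True" for f in flags)
        rows.append(OperatorStatus(tokens[0], available, progressing, degraded))
    return rows


def unhealthy(rows: list[OperatorStatus]) -> list[str]:
    """Names of operators that are unavailable or degraded."""
    return [r.name for r in rows if not r.available or r.degraded]


# Abbreviated sample taken from the listing in this report.
SAMPLE = """\
NAME  VERSION  AVAILABLE  PROGRESSING  DEGRADED  SINCE  MESSAGE
authentication  4.13.4  False  True  True  94m  OAuthServerRouteEndpointAccessibleControllerAvailable: ...
etcd  4.13.4  True  False  False  93m
ingress  False  True  True  94m  The "default" ingress controller reports Available=False
kube-controller-manager  4.13.4  True  False  True  91m  GarbageCollectorDegraded: ...
storage  4.13.4  True  True  False  94m  VSphereCSIDriverOperatorCRProgressing: Waiting for VSphere operator to report status
"""

if __name__ == "__main__":
    print(unhealthy(parse_cluster_operators(SAMPLE)))
```

Note that storage is not flagged by this filter: it is Available and not Degraded, only Progressing, which matches the report that the deployment hangs rather than fails outright.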

       

      Multiple pods cannot start due to tainted nodes:

      "4 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}"

      [master0, [{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"}]]
      [master1, [{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]]
      [master2, [{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]]
      [worker0, [{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"}]]
      [worker1, [{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"}]]
      [worker2, [{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"}]]
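To check which nodes still carry the uninitialized taint, the node list can be filtered programmatically. This is a minimal sketch, assuming the input has the shape of a Kubernetes node list as returned by `oc get nodes -o json` (`items[].metadata.name` and `items[].spec.taints`). The helper `nodes_with_taint` is a hypothetical name, and the sample data is rebuilt by hand from the taints listed above.

```python
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def nodes_with_taint(node_list: dict, taint_key: str = UNINITIALIZED_TAINT) -> list[str]:
    """Return the names of nodes that carry a taint with the given key."""
    names = []
    for node in node_list.get("items", []):
        # spec.taints is absent (or null) on nodes without taints.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == taint_key for t in taints):
            names.append(node["metadata"]["name"])
    return names


def _node(name: str, *keys: str) -> dict:
    """Build a minimal node object for the sample (illustration only)."""
    taints = [{"effect": "NoSchedule", "key": k} for k in keys]
    return {"metadata": {"name": name}, "spec": {"taints": taints}}


MASTER = "node-role.kubernetes.io/master"

# Sample reconstructed from the taints reported in this issue.
SAMPLE_NODES = {
    "items": [
        _node("master0", MASTER, UNINITIALIZED_TAINT),
        _node("master1", MASTER),
        _node("master2", MASTER),
        _node("worker0", UNINITIALIZED_TAINT),
        _node("worker1", UNINITIALIZED_TAINT),
        _node("worker2", UNINITIALIZED_TAINT),
    ]
}

if __name__ == "__main__":
    print(nodes_with_taint(SAMPLE_NODES))
```

On the sample this yields master0 and the three workers, which agrees with the "4 node(s) had untolerated taint" scheduler message quoted above.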

            Assignee: Unassigned
            Reporter: Thomas Bonds (rhn-gps-tbonds)
            Votes: 0
            Watchers: 2
