OCPBUGS-10768 (OpenShift Bugs)

[alibabacloud] IPI installation failed with one master node NotReady


Details

    • Critical
    • Yes
    • Rejected
    • False

      The issue affects a very basic IPI installation on Alibaba Cloud.


    Description

      Description of problem:

      The IPI installation failed, with one master node stuck in NotReady.

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-22-165711

      How reproducible:

      Not every time, but in at least 60% of attempts.

      Steps to Reproduce:

      1. Run "openshift-install create install-config", then add "credentialsMode: Manual" to install-config.yaml.
      2. Run "openshift-install create manifests".
      3. Manually create the required credentials.
      4. Run "openshift-install create cluster".
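      The four steps above can be sketched as a shell script. This is a hypothetical sketch, not the reporter's actual procedure: the directory names, REGION (inferred from the us-east-1a/1b zones in the node names), and the release image placeholder are assumptions, and step 3 assumes the credentials are created with ccoctl, which the report does not state. Verify the ccoctl flags against the version you use.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the reproduction steps; not the reporter's script.
set -euo pipefail

# Hypothetical defaults; override them through the environment.
INSTALL_DIR="${INSTALL_DIR:-./alibaba-ipi}"
CCOCTL_DIR="${CCOCTL_DIR:-./ccoctl-output}"
RELEASE_IMAGE="${RELEASE_IMAGE:-<release-image-pull-spec>}"  # hypothetical placeholder
REGION="${REGION:-us-east-1}"  # assumption, inferred from the node names

# Print each command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

reproduce() {
  # Step 1: generate install-config.yaml (interactive), then request manual credentials.
  run openshift-install create install-config --dir "$INSTALL_DIR"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ append "credentialsMode: Manual" to %s/install-config.yaml\n' "$INSTALL_DIR"
  else
    # credentialsMode is a top-level key, so appending it at column 0 works
    # as long as the file does not already set it.
    printf 'credentialsMode: Manual\n' >> "$INSTALL_DIR/install-config.yaml"
  fi

  # Step 2: render the manifests (this consumes install-config.yaml).
  run openshift-install create manifests --dir "$INSTALL_DIR"

  # Step 3 (assumption): create the RAM users with ccoctl and copy the
  # generated credential secrets into the manifests directory.
  run oc adm release extract --credentials-requests --cloud=alibabacloud \
    --to="$CCOCTL_DIR/credrequests" "$RELEASE_IMAGE"
  run ccoctl alibabacloud create-ram-users --name "$(basename "$INSTALL_DIR")" \
    --region="$REGION" --credentials-requests-dir="$CCOCTL_DIR/credrequests" \
    --output-dir="$CCOCTL_DIR"
  run cp -v "$CCOCTL_DIR"/manifests/*credentials.yaml "$INSTALL_DIR/manifests/"

  # Step 4: run the installation.
  run openshift-install create cluster --dir "$INSTALL_DIR"
}
```

      After sourcing the script, "DRY_RUN=1 reproduce" prints the commands for review, and "reproduce" executes them.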

      Actual results:

      1. The installation failed with one master node stuck in NotReady. In addition, kube-apiserver appears to be available only on the NotReady master node.
      2. "oc adm must-gather" cannot finish because of the following error:
      [must-gather-kz68s] POD 2023-03-23T08:05:28.094281416Z E0323 08:05:28.094237     559 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
      

      Expected results:

      Installation should succeed.

      Additional info:

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          3h44m   Unable to apply 4.13.0-0.nightly-2023-03-22-165711: some cluster operators are not available
      $ oc get nodes
      NAME                                        STATUS     ROLES                  AGE     VERSION
      jiwei-0323b-sbfhb-master-0                  Ready      control-plane,master   3h38m   v1.26.2+dc93b13
      jiwei-0323b-sbfhb-master-1                  NotReady   control-plane,master   3h39m   v1.26.2+dc93b13
      jiwei-0323b-sbfhb-master-2                  Ready      control-plane,master   3h38m   v1.26.2+dc93b13
      jiwei-0323b-sbfhb-worker-us-east-1a-rm6gz   Ready      worker                 3h14m   v1.26.2+dc93b13
      jiwei-0323b-sbfhb-worker-us-east-1b-d2vlp   Ready      worker                 3h16m   v1.26.2+dc93b13
      $ oc get co | grep -v 'True        False         False'
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-0.nightly-2023-03-22-165711   False       True          True       3h34m   WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.161.254:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)
      console                                    4.13.0-0.nightly-2023-03-22-165711   False       True          False      3h17m   DeploymentAvailable: 0 replicas available for console deployment...
      dns                                        4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h31m   DNS "default" reports Progressing=True: "Have 4 available node-resolver pods, want 5."
      etcd                                       4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h26m   NodeControllerDegraded: The master nodes not ready: node "jiwei-0323b-sbfhb-master-1" not ready since 2023-03-23 04:39:07 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      image-registry                             4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h13m   Progressing: The registry is ready...
      kube-apiserver                             4.13.0-0.nightly-2023-03-22-165711   False       True          True       3h31m   StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 8
      kube-controller-manager                    4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h27m   NodeControllerDegraded: The master nodes not ready: node "jiwei-0323b-sbfhb-master-1" not ready since 2023-03-23 04:39:07 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-scheduler                             4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h27m   InstallerPodContainerWaitingDegraded: Pod "installer-8-jiwei-0323b-sbfhb-master-1" on node "jiwei-0323b-sbfhb-master-1" container "installer" is waiting since 2023-03-23 04:34:55 +0000 UTC because ContainerCreating...
      machine-config                             4.13.0-0.nightly-2023-03-22-165711   False       False         True       3h1m    Cluster not available for [{operator 4.13.0-0.nightly-2023-03-22-165711}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 5, updated: 5, ready: 4, unavailable: 1)]
      network                                    4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h27m   DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)...
      openshift-apiserver                        4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h21m   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
      openshift-controller-manager                                                    True        True          False      3h22m   Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3...
      storage                                    4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h30m   AlibabaDiskCSIDriverOperatorCRProgressing: AlibabaCloudDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
      $ oc get mc
      NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
      00-master                                          40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      00-worker                                          40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      01-master-container-runtime                        40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      01-master-kubelet                                  40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      01-worker-container-runtime                        40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      01-worker-kubelet                                  40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      99-master-generated-registries                     40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      99-master-ssh                                                                                 3.2.0             3h46m
      99-worker-generated-registries                     40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      99-worker-ssh                                                                                 3.2.0             3h46m
      rendered-master-9e0818c061f7631f68edf9b2ba5e99a3   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h23m
      rendered-master-cae5598b9b13fb23fcd137194dd792a2   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      rendered-worker-00835171ebcd7e1659f374a933dec318   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
      rendered-worker-26e6e9e3ac817c53ec3e6fa304c93334   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h23m
      $ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-cae5598b9b13fb23fcd137194dd792a2   False     True       False      3              1                   1                     0                      3h36m
      worker   rendered-worker-26e6e9e3ac817c53ec3e6fa304c93334   True      False      False      2              2                   2                     0                      3h36m
      $ 
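      The state above can also be re-checked without depending on column spacing: the grep -v pattern relies on exact whitespace, and openshift-controller-manager has an empty VERSION column, which shifts any field-based parsing of "oc get co". The helpers below are a hypothetical sketch; the function names are illustrative, and the operator check assumes its input comes from the jsonpath query shown in its comment rather than from the default table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for re-checking node, operator, and pool state.
set -euo pipefail

# Print nodes whose STATUS is not Ready (for example NotReady).
# Input: `oc get nodes --no-headers` on stdin.
not_ready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Print operators that are not Available=True, Progressing=False, Degraded=False.
# Input: one line per operator, "name Type=Status ...", for example from
#   oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{range .status.conditions[*]}{" "}{.type}={.status}{end}{"\n"}{end}'
unhealthy_operators() {
  awk '{
    avail = "Unknown"; prog = "Unknown"; deg = "Unknown"
    for (i = 2; i <= NF; i++) {
      split($i, kv, "=")
      if (kv[1] == "Available") avail = kv[2]
      else if (kv[1] == "Progressing") prog = kv[2]
      else if (kv[1] == "Degraded") deg = kv[2]
    }
    if (avail != "True" || prog != "False" || deg != "False")
      print $1, "Available=" avail, "Progressing=" prog, "Degraded=" deg
  }'
}

# Print pools whose READYMACHINECOUNT (field 7) is below MACHINECOUNT (field 6).
# Input: `oc get mcp --no-headers` on stdin.
pools_not_ready() {
  awk '$7 < $6 { print $1, $7 "/" $6 }'
}
```

      For example, "oc get nodes --no-headers | not_ready_nodes" should report jiwei-0323b-sbfhb-master-1 for the cluster above, and "oc get node jiwei-0323b-sbfhb-master-1 -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'" shows why that node is NotReady.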
      

            People

              bteng@redhat.com Bo Teng
              rhn-support-jiwei Jianli Wei
              Jianli Wei Jianli Wei