OpenShift Bugs / OCPBUGS-43721

[Nutanix] Cluster install fails; nodes stuck with node.cloudprovider.kubernetes.io/uninitialized taint

    • Quality / Stability / Reliability
    • Important

      Description of problem:

          OCP cluster installation fails on a Nutanix Flow environment. The nodes remain tainted with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule and never become Ready.

      Version-Release number of selected component (if applicable):

          4.17.0-0.nightly-2024-10-22-215145
      

      How reproducible:

          Always

      Steps to Reproduce:

          1. Install OCP on a Nutanix Flow environment. The installation fails with the following errors:
      
      # oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          4m29s   Unable to apply 4.17.0-0.nightly-2024-10-22-215145: an unknown error has occurred: MultipleErrors
      
      # oc get nodes
      NAME                       STATUS     ROLES                  AGE     VERSION
      sgao-flow-n9xdl-master-0   NotReady   control-plane,master   2m22s   v1.30.5
      sgao-flow-n9xdl-master-1   NotReady   control-plane,master   2m28s   v1.30.5
      sgao-flow-n9xdl-master-2   NotReady   control-plane,master   2m23s   v1.30.5
      
      # oc describe node sgao-flow-n9xdl-master-0
      Name:               sgao-flow-n9xdl-master-0
      Roles:              control-plane,master
      Labels:             beta.kubernetes.io/arch=amd64
                          beta.kubernetes.io/os=linux
                          kubernetes.io/arch=amd64
                          kubernetes.io/hostname=sgao-flow-n9xdl-master-0
                          kubernetes.io/os=linux
                          node-role.kubernetes.io/control-plane=
                          node-role.kubernetes.io/master=
                          node.openshift.io/os_id=rhcos
      Annotations:        alpha.kubernetes.io/provided-node-ip: 10.10.1.37
                          volumes.kubernetes.io/controller-managed-attach-detach: true
      CreationTimestamp:  Wed, 23 Oct 2024 02:48:16 -0400
      Taints:             node-role.kubernetes.io/master:NoSchedule
                          node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                          node.kubernetes.io/not-ready:NoSchedule
      Unschedulable:      false
      Lease:
        HolderIdentity:  sgao-flow-n9xdl-master-0
        AcquireTime:     <unset>
        RenewTime:       Wed, 23 Oct 2024 02:53:31 -0400
      Conditions:
        Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
        ----             ------  -----------------                 ------------------                ------                       -------
        MemoryPressure   False   Wed, 23 Oct 2024 02:48:57 -0400   Wed, 23 Oct 2024 02:48:16 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
        DiskPressure     False   Wed, 23 Oct 2024 02:48:57 -0400   Wed, 23 Oct 2024 02:48:16 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
        PIDPressure      False   Wed, 23 Oct 2024 02:48:57 -0400   Wed, 23 Oct 2024 02:48:16 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
        Ready            False   Wed, 23 Oct 2024 02:48:57 -0400   Wed, 23 Oct 2024 02:48:16 -0400   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
      Addresses:
        InternalIP:  10.10.1.37
        Hostname:    sgao-flow-n9xdl-master-0
      Capacity:
        cpu:                8
        ephemeral-storage:  125238252Ki
        hugepages-1Gi:      0
        hugepages-2Mi:      0
        memory:             16306408Ki
        pods:               250
      Allocatable:
        cpu:                7500m
        ephemeral-storage:  114345831029
        hugepages-1Gi:      0
        hugepages-2Mi:      0
        memory:             15155432Ki
        pods:               250
      System Info:
        Machine ID:                                  07aef2cad3a74b51bd5f9123d6b8d7c0
        System UUID:                                 07aef2ca-d3a7-4b51-bd5f-9123d6b8d7c0
        Boot ID:                                     bf73ad9c-4f58-4a3e-96c3-a745c8b337af
        Kernel Version:                              5.14.0-427.42.1.el9_4.x86_64
        OS Image:                                    Red Hat Enterprise Linux CoreOS 417.94.202410211619-0
        Operating System:                            linux
        Architecture:                                amd64
        Container Runtime Version:                   cri-o://1.30.6-6.rhaos4.17.git6ac6e96.el9
        Kubelet Version:                             v1.30.5
        Kube-Proxy Version:                          v1.30.5
      Non-terminated Pods:                           (5 in total)
        Namespace                                    Name                                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
        ---------                                    ----                                                          ------------  ----------  ---------------  -------------  ---
        openshift-cloud-controller-manager-operator  cluster-cloud-controller-manager-operator-6669679c46-xh55m    30m (0%)      0 (0%)      95Mi (0%)        0 (0%)         4m36s
        openshift-machine-config-operator            kube-rbac-proxy-crio-sgao-flow-n9xdl-master-0                 20m (0%)      0 (0%)      50Mi (0%)        0 (0%)         5m
        openshift-nutanix-infra                      coredns-sgao-flow-n9xdl-master-0                              200m (2%)     0 (0%)      400Mi (2%)       0 (0%)         4m34s
        openshift-nutanix-infra                      haproxy-sgao-flow-n9xdl-master-0                              200m (2%)     0 (0%)      400Mi (2%)       0 (0%)         3m56s
        openshift-nutanix-infra                      keepalived-sgao-flow-n9xdl-master-0                           200m (2%)     0 (0%)      400Mi (2%)       0 (0%)         4m33s
      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        Resource           Requests     Limits
        --------           --------     ------
        cpu                650m (8%)    0 (0%)
        memory             1345Mi (9%)  0 (0%)
        ephemeral-storage  0 (0%)       0 (0%)
        hugepages-1Gi      0 (0%)       0 (0%)
        hugepages-2Mi      0 (0%)       0 (0%)
      Events:
        Type    Reason          Age    From             Message
        ----    ------          ----   ----             -------
        Normal  RegisteredNode  5m16s  node-controller  Node sgao-flow-n9xdl-master-0 event: Registered Node sgao-flow-n9xdl-master-0 in Controller

      Actual results:

          The installation fails.

      Expected results:

          The installation succeeds.

      Additional info:

          For the discussion thread, see https://redhat-external.slack.com/archives/C0211848DBN/p1728357973803949?thread_ts=1727883651.537649&cid=C0211848DBN

              yanhli@redhat.com Yanhua Li (Inactive)
              sgaoshang Shang Gao
              Gaoyun Pei