OpenShift Bugs / OCPBUGS-59270

NNCP causing node instability after shutting down and restarting the cluster


    • Quality / Stability / Reliability
    • Important

      Env:
      3 node cluster

      Cluster Version: 4.16.38 (unverified)
      Desired Version: 4.16.38
      Channel: stable-4.16
      Previous Version(s):

      Infrastructure
      --------------
      Platform: BareMetal
      Control Plane Topology: HighlyAvailable
      apiServerInternalIP: xxx.xxx.xxx.xxx
      apiServerInternalIPs: xxx.xxx.xxx.xxx
      ingressIP: xxx.xxx.xxx.xxx
      ingressIPs: xxx.xxx.xxx.xxx
      loadBalancer: None
      machineNetworks: xxx.xxx.xxx.xxx/26
      Install Type: agent-installer

      Network
      -------
      Network Type: OVNKubernetes

      Issue
      When the cluster is shut down and then restarted, it comes back unstable.

      Observations:
      The ingress VIP is not assigned to any node.
      The API VIP is assigned to a node.
      The NNCPs remain stuck in provisioning.

      Recovering the cluster is difficult.
      To get the ingress VIP assigned to a node, the keepalived static pod manifest on each node has to be moved to a temporary location; once the keepalived pods have terminated, the manifests are moved back.
      This usually has to be repeated a couple of times, and it results in the API VIP being assigned to multiple nodes.
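The manual keepalived workaround described above (park the static pod manifest, wait for the pod to stop, restore the manifest) can be sketched as below. This is a minimal, hypothetical helper, not part of OpenShift. It assumes the manifest lives at a path such as `/etc/kubernetes/manifests/keepalived.yaml` (verify on your nodes) and that the caller supplies `pod_is_running`, a check of whether the keepalived container is still up.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def bounce_static_pod(
    manifest: Path,
    holding_dir: Path,
    pod_is_running: Callable[[], bool],
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
) -> bool:
    """Hypothetical helper: move a static pod manifest aside so the kubelet
    stops the pod, wait for it to terminate, then move the manifest back.

    Returns True if the pod stopped before the timeout, False otherwise.
    The manifest is restored in every case, including on timeout.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    parked = holding_dir / manifest.name
    # Removing the manifest from the static pod directory makes the kubelet stop the pod.
    shutil.move(str(manifest), str(parked))
    try:
        deadline = time.monotonic() + timeout_s
        while pod_is_running():
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
        return True
    finally:
        # Put the manifest back so the kubelet recreates the keepalived pod.
        shutil.move(str(parked), str(manifest))
```

On a node, `pod_is_running` might wrap something like `crictl ps -q --name keepalived` and report whether it printed any container IDs (an assumption; adapt to your environment). Always restoring the manifest in `finally` avoids leaving a node without keepalived if the wait times out.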

      The NNCPs have to be deleted and recreated, and even that sometimes fails on the first attempt.
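To decide which NNCPs need the delete-and-recreate treatment, one option is to inspect their status conditions. The sketch below is a hypothetical helper that parses the JSON from `oc get nncp -o json` and lists policies that do not report a condition of type `Available` with status `True`. It assumes that is how a healthy policy is reported by kubernetes-nmstate; check the condition types your operator version actually emits.

```python
import json
from typing import List


def stuck_policies(nncp_list_json: str) -> List[str]:
    """Hypothetical helper: return names of NNCPs that are not Available.

    Assumes the input is the List object printed by `oc get nncp -o json`,
    with each item's conditions under status.conditions.
    """
    data = json.loads(nncp_list_json)
    stuck = []
    for item in data.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        # Find the 'Available' condition, if the policy reports one at all.
        available = next(
            (c for c in conditions if c.get("type") == "Available"), None
        )
        # A missing or non-True Available condition marks the policy as stuck.
        if available is None or available.get("status") != "True":
            stuck.append(item["metadata"]["name"])
    return stuck
```

A policy with no status yet is treated as stuck too, since after a restart that is indistinguishable from one that never progressed.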

              bnemec@redhat.com Benjamin Nemec
              rhn-support-dseals Daniel Seals
              Anurag Saxena
