OpenShift Bugs / OCPBUGS-5865

oc commands fail against the OpenShift cluster after bringing down one of the master nodes to test high availability


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version: 4.10
    • Component: kube-apiserver
    • Quality / Stability / Reliability
    • Rejected

      Description of problem:

      Clone of https://bugzilla.redhat.com/show_bug.cgi?id=2101290

      oc commands fail during the node replacement procedure on a 3-node master+slave cluster deployed via UPI. The oc commands fail after draining the powered-down node and deleting it from the cluster.
      Exact steps followed:

      One of the nodes was brought down by powering it off.
      $ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'
      $ oc get nodes -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}{" "}' | grep unreachable
      $ oc rsh -n openshift-etcd etcd-layton.ocp2.sl.sdp.hop.lab.emc.com

      etcdctl member list -w table
      etcdctl member remove 252f3666c23ebe80
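      (A follow-up check sketch, not taken from the case: from the same etcd pod, confirm the member was removed and the remaining members are healthy.)
      etcdctl member list -w table
      etcdctl endpoint health --cluster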
      $ for i in $(oc get secrets -n openshift-etcd | grep ogden.ocp2.sl.sdp.hop.lab.emc.com); do oc delete secret $i -n openshift-etcd; done
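      (A hypothetical, slightly more robust variant of the loop above that matches only the secret names rather than whole grep lines; it is not the command that was actually run.)
      $ for s in $(oc get secrets -n openshift-etcd -o name | grep ogden.ocp2.sl.sdp.hop.lab.emc.com); do oc delete -n openshift-etcd "$s"; done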
      $ oc get nodes --show-labels | grep ogden.ocp2.sl.sdp.hop.lab.emc.com
      $ oc get pods -n openshift-storage -o wide | grep ogden.ocp2.sl.sdp.hop.lab.emc.com
      $ oc scale deployment rook-ceph-mon-c --replicas=0 -n openshift-storage
      $ oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage
      $ oc scale deployment rook-ceph-osd-1 --replicas=0 -n openshift-storage
      $ oc scale deployment rook-ceph-osd-2 --replicas=0 -n openshift-storage
      $ oc scale deployment --selector=app=rook-ceph-crashcollector,node_name=ogden.ocp2.sl.sdp.hop.lab.emc.com --replicas=0 -n openshift-storage
      $ oc adm drain ogden.ocp2.sl.sdp.hop.lab.emc.com --force --delete-local-data --ignore-daemonsets
      $ oc delete node ogden.ocp2.sl.sdp.hop.lab.emc.com
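      (In a successful replacement, checks like the following would be expected to still succeed at this point; this is a sketch for context, not output captured from the case.)
      $ oc get nodes
      $ oc get co etcd kube-apiserver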

      After deleting the node in the step above, when we try to edit the localvolume resource, oc commands fail as shown below.

      $ oc get nodes
      Unable to connect to the server: EOF
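      (A diagnostic sketch, assuming access to the surviving masters; the host placeholders below are not from the case. Since the client sees EOF, a first step is to check whether a kube-apiserver instance is still reachable on each remaining master.)
      $ curl -k https://<master-ip>:6443/healthz
      $ ssh core@<surviving-master> 'sudo crictl ps --name kube-apiserver'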

      Please note that we had deployed our product, which uses Kubernetes resources, on the OpenShift cluster prior to performing the node replacement.

      Version-Release number of selected component (if applicable):
      OpenShift 4.10.9

      This bug has been created from support case
      https://access.redhat.com/support/cases/#/case/03228703.

              Unassigned
              Michal Fojtik (mfojtik@redhat.com) (Inactive)
              Ke Wang
              Votes: 0
              Watchers: 3