RH436-13: Troubleshooting installation


    • Type: Story
    • Resolution: Done
    • Priority: Minor
    • Versions: RH436 - RHEL 7.1, RH436 - RHEL 6.2
    • Course: RH436


      Description: Here are some troubleshooting tips from Phil for surviving a cluster install where one or more nodes fail. These tips can expand the "Troubleshooting Installation" section (p89), but some portions may need to wait until later in the unit, or live only in the IG, to help instructors manage the situation.

      A common source of the situation mentioned on p89 is a student checking the "reboot before joining" box.

      From a good node (one that joined the cluster, usually node1), copy the cluster.conf to the failed node(s), then use Luci to "join cluster" [instead of "add node"]. Be patient.
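
      A minimal sketch of the copy step, assuming node2 is the failed node and that root can scp between nodes (on these systems the file lives in /etc/cluster/):
      [root@node1 ~]# scp /etc/cluster/cluster.conf node2:/etc/cluster/cluster.conf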

      Verify the chkconfig --list output for these nodes and start services as needed. Perhaps use ccs:
      [root@node1 ~]# yum install -y ccs
      [root@node1 ~]# ccs -h node1 --startall
      [root@node1 ~]# chkconfig --list
      [root@node2 ~]# chkconfig --list

      A successful installation procedure results in these service settings on all nodes (this looks nice as a table in both the SG and IG):
      Service        Setting  Notes
      clvmd          on
      cman           on       may also start fenced, qdiskd, groupd, dlm_controld, gfs_controld
      corosync       off      OK to be off, since this is started by cman when cman starts
      gfs2           on
      messagebus     on       also started by ricci
      modclusterd    off      OK to be off, since this is started on demand by ricci/oddjob
      oddjobd        off      OK to be off, since this is started by ricci when ricci starts
      rgmanager      on
      ricci          on
      saslauthd      off      also started by ricci
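
      To compare a node against this table in one pass, a hedged sketch (the service names are taken from the table above; chkconfig --list <service> prints that service's per-runlevel settings):
      [root@node1 ~]# for svc in clvmd cman corosync gfs2 messagebus \
      > modclusterd oddjobd rgmanager ricci saslauthd; do chkconfig --list $svc; done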

      Certain commands, introduced in later units, help troubleshoot unfinished configurations. After ensuring that the ccs package is installed on all nodes (including node4), each of the following is useful for a specific purpose (a combined sequence follows the list):
      To copy the latest configuration file to all nodes (ricci must be running on each) without taking any further action:
      [root@node4 ~]# cman_tool version -r
      To copy the configuration file to the other nodes when all nodes are already cluster members:
      [root@node4 ~]# ccs -h node1 --sync
      To copy the configuration file to the other nodes and load the new configuration now. Use --check to see if activation succeeded or is needed:
      [root@node4 ~]# ccs -h node1 --sync --activate
      [root@node4 ~]# ccs -h node1 --check
      To start cluster services after manually confirming that cluster.conf is on all nodes and at the same version. Do not combine this option with --sync --activate in the same command invocation:
      [root@node4 ~]# ccs -h node1 --startall
      To stop all cluster services before attempting to redistribute the configuration:
      [root@node4 ~]# ccs -h node1 --stopall
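
      Taken together, these commands form a recovery sequence; the order below is a sketch inferred from the descriptions above (run from node4, as in the examples):
      [root@node4 ~]# ccs -h node1 --stopall           # stop cluster services on all nodes
      [root@node4 ~]# ccs -h node1 --sync --activate   # push cluster.conf and load it
      [root@node4 ~]# ccs -h node1 --check             # confirm activation succeeded
      [root@node4 ~]# ccs -h node1 --startall          # start cluster services again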

      Assignee: Wander Boessenkool (Inactive)
      Reporter: Susan Lauber