  1. OpenShift Bugs
  2. OCPBUGS-35863

SSH to the nodes is failing after RHOCP cluster upgrade 4.12.31 -> 4.13.41 -> 4.14.25

    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • 4.14.z
    • RHCOS
    • Moderate
    • No
    • False

      Description of problem:

      SSH to the OCP nodes fails with the error below after upgrading the RHOCP cluster from 4.12.31 -> 4.13.41 -> 4.14.25:
      ~~~
      [root@MBIAZPRDOCPBN01 ~]# ssh core@10.129.51.119
      core@10.129.51.119: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
      ~~~

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create OCP 4.12 cluster
      2. Check if SSH to the OCP node is possible
      3. Create the 50-sshd-crypto-worker and 50-sshd-crypto-master MachineConfigs so that the SSH server uses a specific crypto configuration
      -  See the KCS article: https://access.redhat.com/solutions/5138901
      4. Check if SSH to the OCP node is possible
      5. Upgrade OCP cluster to 4.13
      6. Check if SSH to the OCP node is possible
      7. Upgrade OCP cluster to 4.14
      8. Check if SSH to the OCP node is possible
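The "Check if SSH to the OCP node is possible" steps above can be run non-interactively after each upgrade. The following is a minimal sketch, not part of the original report: the helper names (`build_ssh_probe`, `classify_probe`, `probe`) are hypothetical, and it assumes an OpenSSH client is on the PATH and that the `core` user is the login used in the report.

```python
import subprocess


def build_ssh_probe(host, user="core", timeout_s=10):
    """Return an ssh argv that fails fast instead of prompting for input."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt for a password or passphrase
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts
        f"{user}@{host}",
        "true",                               # trivial remote command
    ]


def classify_probe(returncode, stderr):
    """Map an ssh exit status and stderr text to a coarse result label."""
    if returncode == 0:
        return "ok"
    if "Permission denied" in stderr:
        return "auth-denied"                  # the failure mode shown in this report
    return "other-failure"                    # timeout, refused connection, etc.


def probe(host):
    """Run the probe against one node and classify the outcome."""
    result = subprocess.run(build_ssh_probe(host), capture_output=True, text=True)
    return classify_probe(result.returncode, result.stderr)
```

Calling `probe("10.129.51.119")` after steps 2, 4, 6 and 8 would show at which point the result changes from `ok` to `auth-denied`.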

      Actual results:

      The customer was unable to SSH to the nodes until they deleted the 50-sshd-crypto-worker and 50-sshd-crypto-master MachineConfigs and added the line "Include /etc/ssh/sshd_config.d/*.conf" to /etc/ssh/sshd_config.
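Whether a node's /etc/ssh/sshd_config carries the Include line can be checked mechanically. The sketch below is illustrative only: `has_dropin_include` is a hypothetical helper, it treats keywords as case-insensitive (as sshd does), and it deliberately ignores the `keyword=value` form, quoted arguments, and `Match` blocks.

```python
EXPECTED_PATTERN = "/etc/ssh/sshd_config.d/*.conf"


def has_dropin_include(config_text, pattern=EXPECTED_PATTERN):
    """Return True if an uncommented Include directive names the drop-in pattern."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = line.split()
        # Keyword match is case-insensitive; the path argument is compared exactly.
        if parts[0].lower() == "include" and pattern in parts[1:]:
            return True
    return False
```

The text of the file can be read from a debug session on the node and passed to the function as a string.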

      Expected results:

      SSH to the nodes should work once the sshd customization has been removed.

      Additional info:

      - After upgrading the OCP cluster from 4.12 to 4.14, the file /etc/ssh/sshd_config is expected to include the line "Include /etc/ssh/sshd_config.d/*.conf".
      
      - Opened a debug session on one of the worker nodes and added the expected line to /etc/ssh/sshd_config. SSH to that worker node was then successful.
      - Making the same change in the existing MachineConfigs did not help.
       
      - After deleting the 50-sshd-config-master and 50-sshd-config-worker MachineConfigs, the line "Include /etc/ssh/sshd_config.d/*.conf" was still not present in /etc/ssh/sshd_config.
      - Created a new MachineConfig that adds the same line to /etc/ssh/sshd_config. SSH to the nodes now works and the issue is resolved.
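For reference, a MachineConfig that manages a file such as /etc/ssh/sshd_config has roughly the shape sketched below. This is an illustration under assumptions, not the MachineConfig the customer applied: the helper names, the example object name `99-worker-sshd-include`, the Ignition version `3.2.0`, and the `0600` file mode are all assumed. Note that a file entry carries the complete file contents, so `contents` would need to be the full desired sshd_config, including the Include line.

```python
from urllib.parse import quote, unquote


def build_machineconfig(name, role, path, contents, mode=0o600):
    """Build a MachineConfig dict that writes `contents` to `path` on nodes of `role`."""
    source = "data:," + quote(contents)   # URL-encoded data URL, as used by Ignition
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},   # assumed spec version
                "storage": {
                    "files": [
                        {
                            "path": path,
                            "mode": mode,           # serialized as a decimal integer
                            "overwrite": True,
                            "contents": {"source": source},
                        }
                    ]
                },
            }
        },
    }


def decode_data_url(source):
    """Recover the plain text from a 'data:,' source, e.g. to inspect an existing MachineConfig."""
    prefix = "data:,"
    if not source.startswith(prefix):
        raise ValueError("not a plain data URL")
    return unquote(source[len(prefix):])
```

Serializing the result with `json.dumps` gives a manifest that could be reviewed before applying it. `decode_data_url` can also be used on the `source` field of the existing 50-sshd-* MachineConfigs to see exactly which file contents they deliver.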
      
      The ask from this bug is a root cause analysis (RCA):
      - Why was SSH to the nodes not working in the customer's cluster while the 50-sshd-config-master and 50-sshd-config-worker MachineConfigs were present?
      - Why, even after the customized MachineConfigs were removed, were the sshd configuration files not restored to their original/default state, leaving SSH to the nodes still broken?
      - Why is the line "Include /etc/ssh/sshd_config.d/*.conf" not present in /etc/ssh/sshd_config?

            Unassigned
            rhn-support-sdharma Suruchi Dharma
            Michael Nguyen