OpenShift Bugs / OCPBUGS-31478: Urgent Agent installer issue

    Description

      Description of problem:

       The customer is seeing intermittent network issues on some of their nodes when installing with the agent-based installer.

      Version-Release number of selected component (if applicable):

       OCP 4.15

      How reproducible:

          Intermittent; the network issues appear on some of the customer's nodes during agent-based installs.

      Steps to Reproduce:

      Engineering had the customer complete the following steps to make the install successful.
      
      This was done because the customer needs to be able to define the machine network with an API VIP and an ingress VIP that sit outside of that machine network.
      Please note that the agent installer and assisted installer teams are working to officially patch these workarounds into the installer, so some of these steps will not be needed at a later time.
      Edit the Ignition file by hand before creating the ISO image that will be used for installation. This eliminates the need to ssh into the node and bounce any daemons associated with the installation process. Please note that we have not officially tested this, but wanted to provide it early in an effort to keep everything moving forward.
      
      coreos-installer iso ignition show agent.x86_64.iso >agent.ign
      vi agent.ign
      coreos-installer iso ignition embed -f -i agent.ign -o agent-custom.x86_64.iso agent.x86_64.iso
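      
      To see which embedded file would need the hand edit, the Ignition JSON can be inspected before it is modified. A minimal sketch, assuming the assisted-service settings are carried as a regular file under .storage.files (the exact file name inside agent.ign is an assumption, not something verified here):
      
      # List the files carried in the extracted Ignition config and look for the one
      # holding the assisted-service settings (path is assumed; check your agent.ign).
      jq -r '.storage.files[]?.path' agent.ign | grep -i assisted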
      
      Set a supernet instead of multiple machine networks. Ford used 19.0.0.0/8 (see the illustrative snippet below).
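      
      For illustration only, the supernet takes the place of several smaller machineNetwork entries; the stanza below follows the baremetal install-config.yaml schema and the VIP values are made up, so treat it as a sketch rather than a tested configuration:
      
      networking:
        machineNetwork:
        - cidr: 19.0.0.0/8      # single supernet instead of multiple machine networks
      platform:
        baremetal:
          apiVIPs:
          - 19.1.1.10           # illustrative VIP values only
          ingressVIPs:
          - 19.1.1.11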
      
      Disable the following step and host validations:
      
      DISABLED_STEPS=domain-resolution
      DISABLED_HOST_VALIDATIONS=dns-wildcard-not-configured,api-domain-name-resolved-correctly,api-int-domain-name-resolved-correctly,apps-domain-name-resolved-correctly,belongs-to-majority-group
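      
      A minimal sketch of carrying those overrides back into the ISO, assuming they are appended to an environment file extracted from agent.ign (the name assisted-service.env is illustrative). Ignition stores file contents as base64 data URLs, so the edited file has to be re-encoded before it is pasted back:
      
      # Append the overrides to the (assumed) environment file taken out of agent.ign.
      echo 'DISABLED_STEPS=domain-resolution' >> assisted-service.env
      echo 'DISABLED_HOST_VALIDATIONS=dns-wildcard-not-configured,api-domain-name-resolved-correctly,api-int-domain-name-resolved-correctly,apps-domain-name-resolved-correctly,belongs-to-majority-group' >> assisted-service.env
      # Re-encode and paste the output into the matching .storage.files[].contents.source
      # field in agent.ign as data:text/plain;base64,<output>
      base64 -w0 assisted-service.env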
      
      - During the install a few hosts failed to register. Rebooting those hosts resolved the issue.
      - After the cluster came up, there was a DNS error in the authentication operator. Deleting those pods and allowing them to be recreated fixed it (see the oc commands after this list). They have apparently seen this occasionally before.
      - After the auth operator was working, the CVO didn't recognize it promptly, so they ended up restarting that pod as well.
      - Installation completed successfully. This was with 4.15.0; apparently they haven't pinned the version and just pick up the latest.
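      
      For reference, recreating the affected pods can be done with standard oc commands along these lines (the namespaces are the usual ones for those components; adjust to whatever is actually failing):
      
      # Delete the authentication operator pods that hit the DNS error so they are recreated.
      oc delete pod --all -n openshift-authentication-operator
      # Restart the cluster-version-operator pod so it re-evaluates the operator status.
      oc delete pod --all -n openshift-cluster-version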

      Actual results:

      The cluster is not created without the workarounds described above.

      Expected results:

      The cluster is created.

      Additional info:

        The workaround and steps can be found in support case 03721522.

          People

            Unassigned
            Brandon Smitley (rhn-support-bsmitley)
            Manoj Hans
