OCPBUGS-36593

Disconnected ARO clusters fail to add new nodes after upgrading to 4.14

    • MCO Sprint 256
      * Previously, a change of dependency targets was introduced in {product-title} 4.14 that prevented disconnected ARO installs from scaling up new nodes after they upgraded to affected versions. With this release, disconnected ARO installs can scale up new nodes after upgrading to {product-title} 4.14. (link:https://issues.redhat.com/browse/OCPBUGS-36593[*OCPBUGS-36593*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-36550. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-36536. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-35300. The following is the description of the original issue:

      Description of problem:

      An ARO cluster fails to install with disconnected networking.
      Master nodes hang during boot on machine-config-daemon-pull.service. Logs from the service indicate that it cannot reach the public IP of the image registry. In ARO, traffic to image registries must go through a proxy. Dnsmasq is used to inject proxy DNS answers, but machine-config-daemon-pull.service starts before ARO's dnsmasq.service.
      

      Version-Release number of selected component (if applicable):

      4.14.16
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      For Fresh Install:
      1. Create the required ARO vnet and subnets
      2. Attach a route table to the subnets with a blackhole route 0.0.0.0/0
      3. Create a 4.14 ARO cluster with --apiserver-visibility=Private --ingress-visibility=Private --outbound-type=UserDefinedRouting
      
      [OR]
      
      Post Upgrade to 4.14:
      1. Create an ARO 4.13 UDR cluster.
      2. Upgrade the cluster from 4.13 to 4.14 (ClusterUpgrade). The upgrade succeeds.
      3. Create a new node (scale up). The same issue occurs.

      Actual results:

      For Fresh Install of 4.14:
      ERROR: (InternalServerError) Deployment failed.
      
      [OR]
      
      Post Upgrade to 4.14:
      The node never reaches the Ready state, and the Machine is stuck in Provisioned status.

      Expected results:

      The fresh install succeeds, and after an upgrade new nodes reach the Ready state.

      Additional info:
      The node logs show that machine-config-daemon-pull.service cannot reach the image registry because ARO's dnsmasq had not yet started.
      Previously, systemd ordering was set so that ovs-configuration.service starts after ARO's dnsmasq.service. Perhaps that ordering should have gone on machine-config-daemon-pull.service.
      See https://issues.redhat.com/browse/OCPBUGS-25406.
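
The ordering problem described above can be checked on a node by looking at a unit's `After=` property. The following is a minimal sketch, assuming the text printed by `systemctl show --property=After machine-config-daemon-pull.service` has been captured from the node. The function name `is_ordered_after` and the sample strings are hypothetical and are not taken from the original report.

```python
def is_ordered_after(show_output: str, dependency: str) -> bool:
    """Return True if `dependency` is listed in the After= property.

    `show_output` is assumed to be the text printed by
    `systemctl show --property=After <unit>`, for example
    "After=basic.target dnsmasq.service".
    """
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "After":
            # Unit names in the property value are separated by whitespace.
            return dependency in value.split()
    return False


# Hypothetical sample outputs, for illustration only.
without_ordering = "After=network-online.target basic.target"
with_ordering = "After=network-online.target dnsmasq.service"

print(is_ordered_after(without_ordering, "dnsmasq.service"))  # False
print(is_ordered_after(with_ordering, "dnsmasq.service"))     # True
```

If the check returns False for machine-config-daemon-pull.service on a node, that would be consistent with the service starting before ARO's dnsmasq.service, as reported here.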

            team-mco Team MCO
            openshift-crt-jira-prow OpenShift Prow Bot
            Hilliary Lipsig Hilliary Lipsig