
[4.16] dev-scripts fails bootstrapping OCP 4.16 and greater with MIRROR_IMAGES=true AND INSTALLER_PROXY=true

    • No
    • Metal Platform 255, Metal Platform 256
    • 2
    • Rejected
    • False
    • None
    • * Previously, a regression in 4.16.0 caused new baremetal installer-provisioned infrastructure (IPI) installations to fail when proxies were used. This was caused by one of the services in the bootstrap virtual machine (VM) trying to access IP address 0.0.0.0 through the proxy. With this release, this service no longer accesses 0.0.0.0. (link:https://issues.redhat.com/browse/OCPBUGS-35818[*OCPBUGS-35818*])
      ______________
      A regression in 4.16.0 caused new baremetal IPI installations to fail when a proxy is used. This was caused by one of the services in the bootstrap VM trying to access IP address 0.0.0.0 through the proxy. Now this service no longer accesses 0.0.0.0.
    • Bug Fix
    • In Progress

      Description of problem:

          When running https://github.com/openshift-metal3/dev-scripts to deploy an OCP 4.16 or 4.17 cluster (the same configuration works with OCP 4.14 and 4.15) with:
       MIRROR_IMAGES=true
       INSTALLER_PROXY=true
      
      the bootstrap process fails with:
      
       level=debug msg=    baremetalhost resource not yet available, will retry
      level=debug msg=    baremetalhost resource not yet available, will retry
      level=info msg=  baremetalhost: ostest-master-0: uninitialized
      level=info msg=  baremetalhost: ostest-master-0: registering
      level=info msg=  baremetalhost: ostest-master-1: uninitialized
      level=info msg=  baremetalhost: ostest-master-1: registering
      level=info msg=  baremetalhost: ostest-master-2: uninitialized
      level=info msg=  baremetalhost: ostest-master-2: registering
      level=info msg=  baremetalhost: ostest-master-1: inspecting
      level=info msg=  baremetalhost: ostest-master-2: inspecting
      level=info msg=  baremetalhost: ostest-master-0: inspecting
      E0514 12:16:51.985417   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?allowWatchBookmarks=true&resourceVersion=5466&timeoutSeconds=547&watch=true": Service Unavailable
      W0514 12:16:52.979254   89709 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=5466": Service Unavailable
      E0514 12:16:52.979293   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=5466": Service Unavailable
      E0514 12:37:01.927140   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?allowWatchBookmarks=true&resourceVersion=7800&timeoutSeconds=383&watch=true": Service Unavailable
      W0514 12:37:03.173425   89709 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=7800": Service Unavailable
      E0514 12:37:03.173473   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=7800": Service Unavailable
      level=debug msg=Fetching Bootstrap SSH Key Pair...
      level=debug msg=Loading Bootstrap SSH Key Pair...
      
      It looks like https://api.ostest.test.metalkube.org:6443 was reachable up to a certain point and then started failing for some reason: either it is not going through the proxy, or it is and it shouldn't be (???)
      
      The 3 master nodes are reported as:
      [root@ipi-ci-op-0qigcrln-b54ee-1790684582253694976 home]# oc get baremetalhosts -A
      NAMESPACE               NAME              STATE        CONSUMER                ONLINE   ERROR              AGE
      openshift-machine-api   ostest-master-0   inspecting   ostest-bbhxb-master-0   true     inspection error   24m
      openshift-machine-api   ostest-master-1   inspecting   ostest-bbhxb-master-1   true     inspection error   24m
      openshift-machine-api   ostest-master-2   inspecting   ostest-bbhxb-master-2   true     inspection error   24m
      
      With something like:
      
       status:
        errorCount: 5
        errorMessage: 'Failed to inspect hardware. Reason: unable to start inspection: Validation
          of image href http://0.0.0.0:8084/34427934-f1a6-48d6-9666-66872eec9ba2 failed,
          reason: Got HTTP code 503 instead of 200 in response to HEAD request.'
        errorType: inspection error
      
      on their status
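
      The href in the error above points at 0.0.0.0, the IPv4 unspecified address, which is only meaningful on the machine that issues the request, so a proxy on another host has nothing useful to forward it to. A minimal Python sketch of that check, using only the standard library; the helper name targets_unspecified_address is hypothetical, and this is an illustration rather than the code the bootstrap services actually run:

```python
import ipaddress
from urllib.parse import urlsplit


def targets_unspecified_address(url: str) -> bool:
    """Return True if the URL's host is an unspecified IP literal (0.0.0.0 or ::)."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # The host is a DNS name, not an IP literal.
        return False


# The image href from the inspection error, and the API URL from the logs.
image_href = "http://0.0.0.0:8084/34427934-f1a6-48d6-9666-66872eec9ba2"
api_url = "https://api.ostest.test.metalkube.org:6443"
href_is_unspecified = targets_unspecified_address(image_href)
api_is_unspecified = targets_unspecified_address(api_url)
```

      A check like this flags the image href but not the API URL, which is one way to spot requests that should never be sent through a proxy.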

      Version-Release number of selected component (if applicable):

          4.16, 4.17

      How reproducible:

          100%

      Steps to Reproduce:

          1. Try to create an OCP 4.16 cluster with dev-scripts with IP_STACK=v4, MIRROR_IMAGES=true and INSTALLER_PROXY=true
          

      Actual results:

          level=info msg=  baremetalhost: ostest-master-0: inspecting
      E0514 12:16:51.985417   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?allowWatchBookmarks=true&resourceVersion=5466&timeoutSeconds=547&watch=true": Service Unavailable

      Expected results:

          Successful deployment

      Additional info:

      I'm using IP_STACK=v4, MIRROR_IMAGES=true and INSTALLER_PROXY=true.
      With the same configuration (MIRROR_IMAGES=true and INSTALLER_PROXY=true), OCP 4.14 and OCP 4.15 work.
      
      When removing INSTALLER_PROXY=true, OCP 4.16 is also working.
      
      I'm going to attach bootstrap gather logs

            [OCPBUGS-35818] [4.16] dev-scripts fails bootstrapping OCP 4.16 and greater with MIRROR_IMAGES=true AND INSTALLER_PROXY=true

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: OpenShift Container Platform 4.16.2 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:4316

            Zane Bitter added a comment - OK, I can't understand the mechanism for that, but we're still not going to block the release

            Elai Shalev added a comment -

            Hey zabitter,
            just want to update that the noProxy workaround only works for IPv4 installations. We cannot use it for IPv6, as the deployments keep failing with the same inspection error.
            rh-ee-ddmitrie

            Zane Bitter added a comment - vkolodny@redhat.com documentation is sparse, but I don't see any evidence of that: https://cs.opensource.google/go/x/net/+/refs/tags/v0.26.0:http/httpproxy/proxy.go;l=38-50

            Zane Bitter added a comment - Release blocker was rejected because we believe there is a simple workaround.

            Elai Shalev added a comment -

            Hey zabitter,
            I believe I successfully installed an IPv4 cluster with a proxy using the noProxy: "0.0.0.0" workaround.

            For the job:
            https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-baremetal-ipi-deployment/30671/consoleFull

            Steeve Goveas added a comment - Premerge tested successfully:
            https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-baremetalds-ipi-ovn-dualstack-primaryv6-f7/1803385073857204224
            https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-baremetalds-ipi-ovn-ipv6-fips-f14/1803384994194788352

            Zane Bitter added a comment -

            I suspect that this can be worked around by setting:

            proxy:
              noProxy: "0.0.0.0"

            in the install-config.

            If this proves to be the case then this would not be a release blocker.
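
            To illustrate why a noProxy entry of "0.0.0.0" could exempt the failing image href, here is a deliberately simplified Python sketch of no-proxy matching (exact host match or domain-suffix match only). The function name bypasses_proxy is hypothetical, and real proxy libraries also handle CIDR ranges, ports and wildcards, so treat this as an approximation under those assumptions rather than the logic the installer or the bootstrap services use:

```python
from urllib.parse import urlsplit


def bypasses_proxy(url: str, no_proxy: str) -> bool:
    """Simplified no-proxy check: exact host match or domain-suffix match."""
    host = (urlsplit(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        # Normalise each entry; a leading dot is treated as a suffix marker.
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if host == entry or host.endswith("." + entry):
            return True
    return False


image_href = "http://0.0.0.0:8084/34427934-f1a6-48d6-9666-66872eec9ba2"
without_workaround = bypasses_proxy(image_href, "")
with_workaround = bypasses_proxy(image_href, "0.0.0.0")
```

            In this sketch the href is sent to the proxy when the list is empty and is exempted once "0.0.0.0" is listed, which is consistent with the IPv4 result reported in this thread. It does not explain the IPv6 failures reported above.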

              rhn-engineering-hpokorny Honza Pokorny
              stirabos Simone Tiraboschi
              Jad Haj Yahya Jad Haj Yahya