Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-22453

OKD: ABI is broken for OKD/FCOS when disconnected registry is a subdomain of cluster domain

XMLWordPrintable

    • No
    • False
    • Hide

      None

      Show
      None
    • N/A
    • Release Note Not Required
    • In Progress

      When using a disconnected image registry which is hosted at a subdomain of the cluster domain, then Agent-based Installer fails to install a OKD/FCOS cluster. The rendezvous host starts bootkube.sh but fails because it cannot resolve the registry DNS name:

      Oct 25 12:47:03 master-0 bootkube.sh[6462]: error: unable to read image virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": dial tcp: lookup virthost.ostest.test.metalkube.org: no such host
      Oct 25 12:47:03 master-0 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
      Oct 25 12:47:03 master-0 systemd[1]: bootkube.service: Failed with result 'exit-code'.
      

      This hit OpenShift CI jobs 'okd-e2e-agent-compact-ipv4' and 'okd-e2e-agent-sno-ipv6' based on openshift-metal3/dev-scripts. An example would be a OCP cluster domain (which contains the cluster name) of `ostest.test.metalkube.org` and a disconnected image registry at `virthost.ostest.test.metalkube.org`.

      Other diagnosis from the rendezvous host:

      [core@master-0 ~]$ sudo podman pull virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5
      Trying to pull virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5...
      Error: initializing source docker://virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:76562238a20f2f4dd45770f00730e20425edd376d30d58d7dafb5d6f02b208c5: pinging container registry virthost.ostest.test.metalkube.org:5000: Get "https://virthost.ostest.test.metalkube.org:5000/v2/": dial tcp: lookup virthost.ostest.test.metalkube.org: no such host
      
      curl -u ocp-user:ocp-pass https://virthost.ostest.test.metalkube.org:5000/v2/_catalog 
      curl: (6) Could not resolve host: virthost.ostest.test.metalkube.org
      
      core@master-0 ~]$ dig +noall +answer virthost.ostest.test.metalkube.org
      ;; communications error to 127.0.0.1#53: connection refused
      ;; communications error to 127.0.0.1#53: connection refused
      ;; communications error to 127.0.0.1#53: connection refused
      virthost.ostest.test.metalkube.org. 0 IN A      192.168.111.1
      

      After stopping systemd-resolved:

      [core@master-0 ~]$ curl -u ocp-user:ocp-pass https://virthost.ostest.test.metalkube.org:5000/v2/_catalog 
      {"repositories":["localimages/installer","localimages/local-release-image"]}
      

      Report and diagnosis output above from afasano@redhat.com.

              jmeng@redhat.com Jakob Meng
              jmeng@redhat.com Jakob Meng
              Gaoyun Pei Gaoyun Pei
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: