  1. OpenShift Specialist Platform Team
  2. SPLAT-187

Docs: Expose `arm01` build farm API and Console with LetsEncrypt certs

    • Type: Story
    • Resolution: Done
    • Priority: Minor

      As an OpenShift Prow build farm maintainer,

      I'd like the API and Web Console to be exposed with Let's Encrypt certs.

      The certs have already been created.

      Configuring the API server to use the LE cert (per https://docs.openshift.com/container-platform/4.7/security/certificates/api-server.html) broke Prow:

      $ oc patch apiserver cluster --type=merge --patch \
          '{"spec":{"servingCerts":{"namedCertificates":[{"names":["api.arm-build01.arm-build.devcluster.openshift.com"],"servingCertificate":{"name":"apiserver-arm01-tls"}}]}}}'
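
To check which certificate an endpoint actually serves after a patch like this, one option is to inspect the TLS handshake with `openssl`. This is a minimal sketch, not part of the original report: the helper name `served_cert` is hypothetical, and the default port 6443 assumes the standard OpenShift API port (use 443 for the console route).

```shell
# Hypothetical helper: print issuer, subject and expiry of the certificate
# served for HOST. Usage: served_cert HOST [PORT]
served_cert() {
  host="$1"
  port="${2:-6443}"   # assumption: API server port; pass 443 for the console route
  # -servername sets SNI so the server can select the named certificate.
  openssl s_client -connect "${host}:${port}" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject -enddate
}

# Example invocations (not run automatically):
# served_cert api.arm-build01.arm-build.devcluster.openshift.com
# served_cert console.arm-build01.arm-build.devcluster.openshift.com 443
```

If the issuer still shows the cluster's internal CA, the new serving certificate has not been picked up yet (or SNI does not match the configured name).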
      

      Patching the console config to use the cert (per https://docs.openshift.com/container-platform/4.7/web_console/customizing-the-web-console.html#customizing-the-web-console-url_customizing-web-console) seems to have worked only partly:

      $ oc patch consoles.operator.openshift.io cluster --namespace=openshift-console --type=merge --patch \
          '{"spec":{"route":{"hostname":"console.arm-build01.arm-build.devcluster.openshift.com","secret":{"name":"console-arm01-tls"}}}}'
      

      The console is accessible on the public IP and works as expected there, but the console operator becomes degraded when it tries to access the route on the private network. One symptom of this is that pod logs can't be accessed through the web console:

      RouteHealthDegraded: failed to GET route (https://console.arm-build01.arm-build.devcluster.openshift.com/health): Get "https://console.arm-build01.arm-build.devcluster.openshift.com/health": dial tcp: lookup console.arm-build01.arm-build.devcluster.openshift.com on 172.30.0.10:53: no such host
      

      Also, dns-default pods started crashlooping, which might be related.
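
A first look at the DNS side could be sketched as follows. The function names are hypothetical, and the sketch assumes the default layout in which the `dns-default` pods live in the `openshift-dns` namespace and their CoreDNS container is named `dns`.

```shell
# Hypothetical helper: show operator health and the state of the DNS pods.
dns_status() {
  oc get clusteroperators dns console
  oc -n openshift-dns get pods -o wide
}

# Hypothetical helper: logs of the previous (crashed) instance of one DNS pod.
# Usage: dns_previous_logs POD_NAME
dns_previous_logs() {
  # assumption: the CoreDNS container in dns-default pods is named "dns"
  oc -n openshift-dns logs "$1" -c dns --previous
}

# Example (not run automatically):
# dns_status
# dns_previous_logs <name of a crashlooping dns-default pod>
```

Comparing the crash timestamps with the time of the two patches would help decide whether the crashloop is actually related.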

      Reverting both changes has fixed the issues at hand for now.
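
The report does not say how the changes were reverted. One possible way, sketched here as an assumption, is a JSON merge patch that sets the previously added fields to `null`, which removes them. The function names are hypothetical, and the sketch assumes no other named certificates or route settings were configured beforehand.

```shell
# Prints a merge patch that removes spec.servingCerts.namedCertificates.
apiserver_revert_patch() {
  printf '%s\n' '{"spec":{"servingCerts":{"namedCertificates":null}}}'
}

# Prints a merge patch that removes the custom console route settings.
console_revert_patch() {
  printf '%s\n' '{"spec":{"route":null}}'
}

# Example (not run automatically):
# oc patch apiserver cluster --type=merge --patch "$(apiserver_revert_patch)"
# oc patch consoles.operator.openshift.io cluster --type=merge --patch "$(console_revert_patch)"
```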

      Investigate whether exposing the API and console with Let's Encrypt certs is really required, why the breakages happened in the first place, and whether the Prow failure regarding API access to that cluster might have been caused by delays in propagation of the config (undoing the apiserver change did fix it, but not immediately).
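
To test the propagation-delay theory, it may help to wait for the relevant cluster operator to finish rolling out before judging whether Prow is broken. This is a sketch only: the helper name `wait_for_rollout` and the 20 minute default timeout are assumptions, not values from the report.

```shell
# Hypothetical helper: block until a cluster operator stops progressing.
# Usage: wait_for_rollout OPERATOR [TIMEOUT]
wait_for_rollout() {
  # assumption: 20m is long enough for the API server pods to be redeployed
  oc wait "clusteroperator/$1" --for=condition=Progressing=False --timeout="${2:-20m}"
}

# Example (not run automatically):
# oc get clusteroperators kube-apiserver
# wait_for_rollout kube-apiserver
```

If Prow recovers only after the operator reports `Progressing=False`, the delayed fix seen after the revert would be consistent with rollout time rather than a separate fault.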

      This is currently not a blocker, and won't be as long as no jobs running on `arm01` block the release repo (per https://coreos.slack.com/archives/CBN38N3MW/p1620403237189900?thread_ts=1620402553.187000&cid=CBN38N3MW).

              cglombek Christian Glombek