OpenShift Bugs: OCPBUGS-43406

Upgrade from 4.14 to 4.16: race condition between the infra coredns static pod and rpm-ostree

    • Quality / Stability / Reliability
    • False
    • 1
    • Moderate
    • None
    • Done
    • Bug Fix
      * Before this update, OCP updates that shipped a change to `coredns` templates would restart the `coredns` pod before the image pull for the updated base operating system (OS) image. As a consequence, a race occurred in which `rpm-ostree`, the operating system update manager, failed the image pull because of network errors, causing the update to stall. With this release, a retry of the update operation is added to the Machine Config Operator (MCO) to work around this race condition. https://issues.redhat.com/browse/OCPBUGS-43406[OCPBUGS-43406]
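
The retry described in the release note can be illustrated with a minimal Python sketch. This is not the MCO implementation (the MCO is written in Go, and its actual retry count and delays are not stated in this issue). The function name `retry_with_backoff` and all of its parameters are hypothetical and chosen only for illustration. The idea is that an operation that fails while `coredns` is restarting gets another chance once DNS is back.

```python
import time
from typing import Callable


def retry_with_backoff(
    operation: Callable[[], None],
    attempts: int = 5,            # hypothetical default, not taken from the MCO
    initial_delay: float = 1.0,   # seconds before the first retry (assumed)
    factor: float = 2.0,          # delay multiplier between retries (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `operation` until it succeeds or `attempts` is exhausted.

    Returns the number of attempts that were needed. The last exception is
    re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            operation()
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(delay)     # give the restarted DNS pod time to come back
            delay *= factor  # wait longer before each subsequent retry
    raise AssertionError("unreachable")
```

In this sketch the `sleep` parameter is injectable so that the backoff schedule can be exercised without waiting. A real caller would pass a function that runs the OS update step as `operation`.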

      When the Machine Config Daemon (MCD) applies the 4.16 manifests, the coredns static pod YAML is updated. This causes the pod to redeploy, and rpm-ostree subsequently fails to perform DNS lookups, which halts all upgrades indefinitely. (Requires an IPI install on a platform that uses the MCD-templated coredns static pods.)

      Occurs intermittently when upgrading to 4.16 on VMware infrastructure, on roughly 50% of nodes. The customer has experienced this issue on 10 clusters.

      E1011 08:09:28.338992 29910 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e: error: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: (Mirrors also failed: [xxx.xxx.xxx.xxx:5009/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e: pinging container registry xxx.xxx.xxx.xxx:5009: Get "https://xxx.xxx.xxx.xxx:5009/v2/": dial tcp: lookup xxx.xxx.xxx.xxx on [::1]:53: read udp [::1]:46362->[::1]:53: read: connection refused]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e: pinging contai...
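
The log line above shows the signature of this failure: a lookup sent to the node-local resolver on port 53 is refused. As a rough triage aid, a short Python sketch can extract the resolver address from such a message. The helper name `refused_resolver` and the regular expression are assumptions based only on the single log line in this issue. They are not part of any OpenShift tooling.

```python
import re
from typing import Optional

# Matches Go-style lookup errors such as:
#   dial tcp: lookup <host> on [::1]:53: read udp ...: read: connection refused
# The pattern is derived from the one log excerpt in this issue (assumption).
_REFUSED_LOOKUP = re.compile(
    r"dial tcp: lookup \S+ on (\S+):53:.*connection refused"
)


def refused_resolver(message: str) -> Optional[str]:
    """Return the resolver address whose port 53 refused the lookup, or None."""
    match = _REFUSED_LOOKUP.search(message)
    return match.group(1) if match else None
```

A result of `[::1]` or `127.0.0.1` would point at the node-local DNS service, which is consistent with the coredns static pod being mid-restart. Any other failure text returns `None` and needs separate investigation.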
      

              jerzhang@redhat.com Yu Qi Zhang
              rhn-support-tidawson Tim Dawson
              None
              None
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa
              None
              Votes: 14
              Watchers: 24
