  OpenShift Bugs: OCPBUGS-60239

Upgrade from 4.14 to 4.16: race condition between the infra coredns static pod and rpm-ostree


    • Quality / Stability / Reliability
    • False
    • 1
    • Moderate
    • In Progress
    • Bug Fix
      * Before this update, OCP updates that shipped a change to the `coredns` templates restarted the `coredns` static pod before the node rebooted. This restart occurred before the image pull for the base operating system image update. As a consequence, a race condition occurred in which `rpm-ostree`, the operating system (OS) update manager, failed the image pull because of network errors and stalled. With this release, a retry is added to the Machine Config Operator (MCO) OS update operation to work around the race condition caused by the `coredns` pod restarts.

      This is a clone of issue OCPBUGS-60034. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-59899. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-43406. The following is the description of the original issue:

      When the Machine Config Daemon (MCD) applies the 4.16 manifests, it updates the coredns static pod YAML. This causes the pod to redeploy, and rpm-ostree then fails to perform DNS lookups, which halts all upgrades indefinitely. (This requires an IPI installation on a platform that uses the MCD-templated coredns static pods.)

      The issue occurs intermittently when upgrading to 4.16 on VMware infrastructure and affects roughly 50% of nodes. The customer has experienced this issue on 10 clusters.

      E1011 08:09:28.338992 29910 writer.go:226] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e: error: Creating importer: Failed to invoke skopeo proxy method OpenImage: remote error: (Mirrors also failed: [xxx.xxx.xxx.xxx:5009/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e: pinging container registry xxx.xxx.xxx.xxx:5009: Get "https://xxx.xxx.xxx.xxx:5009/v2/": dial tcp: lookup xxx.xxx.xxx.xxx on [::1]:53: read udp [::1]:46362->[::1]:53: read: connection refused]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02e1321a6afc7edcfe476869816af39e598762ea125caf16fa5c1d3a536aac4e: pinging contai...
      

              jerzhang@redhat.com Yu Qi Zhang
              rhn-support-tidawson Tim Dawson
              Sergio Regidor de la Rosa
              Votes: 0
              Watchers: 5
