OpenShift Bugs / OCPBUGS-44987

OCP 4.14.31: SDN - OVN migration stalled, rendered machine-config failing to replace currentconfig during migration staging

    • Incidents & Support
    • Critical

      Description of problem:

      I am filing this bug under Networking/OVN, but it may need to move to the MCO team.

      During an SDN to OVN migration on 4.14.31, we followed the documented steps to put the cluster into the migration state and patched it to define the options for the OVNKubernetes network. New rendered-worker and rendered-master machine-configs were generated. The new rendered config triggered a rollout on all nodes. Every node restarted and then immediately reverted to the SDN configuration (the previous rendered machine-config):
      
      I1125 18:04:24.834012    1945 update.go:1987] Disk currentConfig "rendered-worker-60f99140fb406cb9f28b9c4f5da34a2d" overrides node's currentConfig annotation "rendered-worker-8e711d6ed7a3e84d1b309ef5f9f4476b"
      I1125 18:04:24.835987    1945 daemon.go:1841] Validating against current config rendered-worker-60f99140fb406cb9f28b9c4f5da34a2d
      
      As a result, the migration has stalled and cannot proceed.
      
      We attempted to force a rollout to the latest rendered config using the steps below, without success:
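
The first log line above shows the daemon preferring the on-disk currentconfig over the node annotation. A minimal sketch for pulling both config names out of machine-config-daemon logs, assuming the message format quoted in this report; `parse_override` is a hypothetical helper name and not part of any OpenShift tooling:

```shell
# Hypothetical helper: reads machine-config-daemon log lines on stdin and,
# for each "Disk currentConfig ... overrides ..." message, prints the
# on-disk config name and the annotation config name on one line.
# Assumes the log message format quoted in this report.
parse_override() {
  sed -n 's/.*Disk currentConfig "\([^"]*\)" overrides node.s currentConfig annotation "\([^"]*\)".*/disk=\1 annotation=\2/p'
}
```

It could be fed from the daemon pod logs, for example `oc logs -n openshift-machine-config-operator <machine-config-daemon-pod> -c machine-config-daemon | parse_override`, where the pod name is a placeholder.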
      
      node_name=<name>
      new_value=rendered-worker-8e711d6ed7a3e84d1b309ef5f9f4476b

      # Point the node at the new rendered config, clear the reason, and mark the state as Done.
      oc patch node "$node_name" --type merge --patch "{\"metadata\": {\"annotations\": {\"machineconfiguration.openshift.io/desiredConfig\": \"${new_value}\"}}}"
      oc patch node "$node_name" --type merge --patch '{"metadata": {"annotations": {"machineconfiguration.openshift.io/reason": ""}}}'
      oc patch node "$node_name" --type merge --patch '{"metadata": {"annotations": {"machineconfiguration.openshift.io/state": "Done"}}}'

      # Also tested moving the on-disk currentconfig aside: no change. The file is rebuilt and then overrides the annotation again.
      oc debug "node/$node_name" -- chroot /host sh -c "mv /etc/machine-config-daemon/currentconfig /etc/machine-config-daemon/oldconfig"

      # Create the force file for the machine-config-daemon.
      oc debug "node/$node_name" -- chroot /host sh -c "touch /run/machine-config-daemon-force"
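
The three annotation patches above differ only in key and value, and the nested quoting is easy to get wrong. A small sketch that builds the merge patch in one place; `mco_annotation_patch` is a hypothetical helper name, and it assumes the key and value need no JSON escaping:

```shell
# Hypothetical helper: prints a JSON merge patch that sets one
# machineconfiguration.openshift.io/<key> annotation to <value>.
# Assumes key and value contain no characters that need JSON escaping
# (true for rendered config names and the state values used in this report).
mco_annotation_patch() {
  printf '{"metadata":{"annotations":{"machineconfiguration.openshift.io/%s":"%s"}}}\n' "$1" "$2"
}

# Example use against a live cluster (not run here):
#   oc patch node "$node_name" --type merge \
#     --patch "$(mco_annotation_patch desiredConfig "$new_value")"
```

This only tidies the quoting. It does not change the outcome described in this report, where the on-disk currentconfig still overrides the annotation.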

       

      Version-Release number of selected component (if applicable):

      4.14.31

      How reproducible:

      Every time on this customer cluster.

      Steps to Reproduce:

      1. Run a cluster on 4.14.31 on vSphere.

      2. Attempt the SDN to OVN migration using the steps below.

      3. Observe that the CNO reverts the machine-config template.
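
One way to observe the revert in step 3 is to compare each node's currentConfig and desiredConfig annotations. A minimal sketch, assuming whitespace-separated rows with a header line; `nodes_not_settled` is a hypothetical helper name:

```shell
# Hypothetical helper: reads "NAME CURRENT DESIRED STATE" rows on stdin
# (first line is a header) and prints every node whose current and desired
# configs differ or whose state is not "Done".
nodes_not_settled() {
  awk 'NR > 1 && ($2 != $3 || $4 != "Done") { print $1, $2, $3, $4 }'
}

# Example input source on a live cluster (not run here):
#   oc get nodes -o custom-columns='NAME:.metadata.name,CURRENT:.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig,DESIRED:.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig,STATE:.metadata.annotations.machineconfiguration\.openshift\.io/state' \
#     | nodes_not_settled
```

An empty result would mean every node reports the same current and desired config with state Done. Any row printed is a node that has not settled on its desired rendered config.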

      Actual results:

      The SDN to OVN migration fails.

      Expected results:

      The MCP rollout should raise alerts or warnings when a rendered config is rejected, and it should prefer the newest rendered config over previous builds.
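
To illustrate what "newest rendered config" means here, the sketch below picks the most recently created rendered config for a pool from a name and timestamp listing. It only illustrates the requested behavior and is not how the MCO selects configs; `newest_rendered` is a hypothetical helper name:

```shell
# Hypothetical helper: reads "NAME CREATED" rows on stdin (no header) and
# prints the name of the most recently created rendered config for the
# pool given as $1. Relies on UTC creation timestamps in RFC 3339 form
# sorting correctly as plain strings.
newest_rendered() {
  awk -v p="rendered-$1-" 'index($1, p) == 1 { print $2, $1 }' \
    | LC_ALL=C sort \
    | tail -n 1 \
    | awk '{ print $2 }'
}

# Example input source on a live cluster (not run here):
#   oc get machineconfig --no-headers \
#     -o custom-columns='NAME:.metadata.name,CREATED:.metadata.creationTimestamp' \
#     | newest_rendered worker
```

The result could then be compared with the pool's target config, for example the output of `oc get mcp worker -o jsonpath='{.spec.configuration.name}'`, to see whether the pool is pointing at the newest build.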

      More data in the Jira comments below.

      Affected Platforms: vsphere

              pliurh Peng Liu
              rhn-support-wrussell Will Russell
              Anurag Saxena Anurag Saxena