OCPBUGS-6723

[RHOCP] 4.11 Node reboots due to new MC render without any changes

    • Important
    • Rejected
    • False

      Description of problem:

      Multiple node reboots are observed when the MCO creates a new rendered MachineConfig due to changes in the kubeletconfig.

      Version-Release number of selected component (if applicable):

      4.11

      How reproducible:

      N/A

      Steps to Reproduce:

      1. Create a cluster on a version older than 4.11 with a kubeletconfig applied.
      2. Upgrade the cluster to 4.11.
      3. The MCO frequently creates new rendered configs because it observes changes in the kubeletconfig.

      Actual results:

      New rendered configs are generated frequently for the MCP, and each one causes node reboots.

      Expected results:

      Nodes do not reboot unless a valid change is applied.

      Additional info:

      These are the rendered configs generated from 3rd January onward:
      rendered-worker-799bf3a523c62fdd8a95785270dd31bc        60746a843e7ef8855ae00f2ffcb655c53e0e8296   3.2.0             20d -- due to 4.11.20 upgrade
      rendered-worker-c714a3d642eafd6e7f7a5b8d60551354        60746a843e7ef8855ae00f2ffcb655c53e0e8296   3.2.0             18d -- auto created (unexpected)
      rendered-worker-c91e9ce840d56bb14797c42566b486b5        92012a837e2ed0ed3c9e61c715579ac82ad0a464   3.2.0             6d16h -- due to upgrade to 4.11.22
      rendered-worker-d59194fbcdd6bb7523448854ab0dfc6a        92012a837e2ed0ed3c9e61c715579ac82ad0a464   3.2.0             6d16h -- auto created (unexpected)

      Changes between rendered-worker-799bf3a523c62fdd8a95785270dd31bc (3rd Jan) ---> rendered-worker-c714a3d642eafd6e7f7a5b8d60551354 (5th Jan):
      <   "nodeStatusUpdateFrequency": "0s",
      ---
      >   "nodeStatusUpdateFrequency": "10s",

      Changes between rendered-worker-c714a3d642eafd6e7f7a5b8d60551354 (5th Jan) ---> rendered-worker-c91e9ce840d56bb14797c42566b486b5 (17th Jan, during the upgrade):
      <   "nodeStatusUpdateFrequency": "10s",
      ---
      >   "nodeStatusUpdateFrequency": "0s",
      
      <           --quiet ''quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a461df0dad2d1e6ba8611a1d08985398460cf834b26a0e097ae847b8861569b'';
      ---
      >           --quiet ''quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a21cfea2dbabfc0fab02dae4d4419c60868afd0346bea729fa9b5f67ea84ee1d'';
      394c394
      <           --entrypoint=cp ''quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6a461df0dad2d1e6ba8611a1d08985398460cf834b26a0e097ae847b8861569b''
      ---
      >           --entrypoint=cp ''quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a21cfea2dbabfc0fab02dae4d4419c60868afd0346bea729fa9b5f67ea84ee1d''

      Changes between rendered-worker-c91e9ce840d56bb14797c42566b486b5 (17th Jan, during the upgrade) ---> rendered-worker-d59194fbcdd6bb7523448854ab0dfc6a (created a few seconds later on the same day):
      <   "nodeStatusUpdateFrequency": "0s",
      ---
      >   "nodeStatusUpdateFrequency": "10s",

      Node reboots happened on the dates below:
      reboot   system boot  4.18.0-372.36.1. Wed Jan 18 20:31   still running - unexpected
      reboot   system boot  4.18.0-372.36.1. Tue Jan 17 22:11 - 20:31  (22:19) - due to upgrade 4.11.22
      reboot   system boot  4.18.0-372.36.1. Thu Jan 12 07:36 - 22:11 (5+14:34) - unexpected
      reboot   system boot  4.18.0-372.36.1. Thu Jan  5 16:07 - 07:36 (6+15:28) - unexpected
      reboot   system boot  4.18.0-372.36.1. Tue Jan  3 21:58 - 16:07 (1+18:08) - due to upgrade 4.11.20

      The unexpected reboots were caused by MachineConfig rollouts, and the only change in those MCs is to "nodeStatusUpdateFrequency".

      3rd January: during the upgrade to 4.11.20, rendered-worker-799bf3a523c62fdd8a95785270dd31bc set "nodeStatusUpdateFrequency" to "0s".
      5th January: a rendered MachineConfig, rendered-worker-c714a3d642eafd6e7f7a5b8d60551354, was created automatically and set "nodeStatusUpdateFrequency" to "10s".
      12th January: seven days later, the MCO rolled the nodes back to rendered-worker-799bf3a523c62fdd8a95785270dd31bc, the MC created on the 3rd during the upgrade, so "nodeStatusUpdateFrequency" went back to "0s".
      17th January: the customer upgraded the cluster to 4.11.22, which generated two rendered configs within seconds of each other. The nodes were updated to the latest one, rendered-worker-d59194fbcdd6bb7523448854ab0dfc6a, which, per the diff above, set "nodeStatusUpdateFrequency" to "10s" again.
      18th January: the MCO rolled the nodes back to the second-to-last MC, rendered-worker-c91e9ce840d56bb14797c42566b486b5, and the nodes are currently running with "nodeStatusUpdateFrequency" equal to "0s":
             machineconfiguration.openshift.io/currentConfig: rendered-worker-c91e9ce840d56bb14797c42566b486b5
             machineconfiguration.openshift.io/desiredConfig: rendered-worker-c91e9ce840d56bb14797c42566b486b5
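
The back-and-forth described in this report can be checked mechanically. The following is a minimal Python sketch, not part of the MCO or any OpenShift tooling: the function names (`value_transitions`, `is_flapping`, `in_sync`) are hypothetical, and the `history` list is assembled by hand from the diffs and reboot dates in this report (the "10s" value for rendered-worker-d59194... is taken from the diff). It assumes the rollout order and the per-config value have already been collected by some other means.

```python
# Sketch only: all function names here are hypothetical helpers for illustration.

def value_transitions(history):
    """Return (from_config, to_config, old_value, new_value) for every
    consecutive pair of rollouts where the tracked value changed."""
    transitions = []
    for (prev_name, prev_val), (name, val) in zip(history, history[1:]):
        if prev_val != val:
            transitions.append((prev_name, name, prev_val, val))
    return transitions


def is_flapping(history):
    """True if the value changes away from a setting and later returns to it."""
    values = [value for _, value in history]
    # Collapse consecutive duplicates so only genuine changes remain.
    runs = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    return len(runs) > len(set(runs))


def in_sync(annotations):
    """True if the node's currentConfig annotation is set and equals desiredConfig."""
    prefix = "machineconfiguration.openshift.io/"
    current = annotations.get(prefix + "currentConfig")
    return current is not None and current == annotations.get(prefix + "desiredConfig")


# Rollout order and nodeStatusUpdateFrequency values as described in this report.
history = [
    ("rendered-worker-799bf3a523c62fdd8a95785270dd31bc", "0s"),   # 3rd Jan, upgrade to 4.11.20
    ("rendered-worker-c714a3d642eafd6e7f7a5b8d60551354", "10s"),  # 5th Jan, unexpected
    ("rendered-worker-799bf3a523c62fdd8a95785270dd31bc", "0s"),   # 12th Jan, unexpected
    ("rendered-worker-d59194fbcdd6bb7523448854ab0dfc6a", "10s"),  # 17th Jan, upgrade to 4.11.22
    ("rendered-worker-c91e9ce840d56bb14797c42566b486b5", "0s"),   # 18th Jan, unexpected
]

# Node annotations as quoted in this report.
node_annotations = {
    "machineconfiguration.openshift.io/currentConfig": "rendered-worker-c91e9ce840d56bb14797c42566b486b5",
    "machineconfiguration.openshift.io/desiredConfig": "rendered-worker-c91e9ce840d56bb14797c42566b486b5",
}

for old_name, new_name, old_val, new_val in value_transitions(history):
    print(f"{old_name} -> {new_name}: {old_val} -> {new_val}")
print("flapping:", is_flapping(history))
print("node in sync:", in_sync(node_annotations))
```

Every rollout in the list changes the value, and the value keeps returning to a setting it had before, which matches the symptom reported here: each flip produces a new rollout and therefore a reboot, even though the node annotations show the node itself is in sync with its desired config.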

            team-mco Team MCO
            rhn-support-akanekar Ankita Kanekar
            Rio Liu Rio Liu