OpenShift Bugs / OCPBUGS-3909

Node is degraded when a machine config deploys a unit with content and mask=true


      Description of problem:

      When a MachineConfig (MC) deploys a systemd unit that has both contents and mask: true, the node becomes degraded with a config drift error like this one:
      
      E1118 16:41:42.485314    1900 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-e701d8c471184e3a66756b26b4b7dd33: mode mismatch for file: "/etc/systemd/system/maks-and-contents.service"; expected: -rw-r--r--/420/0644; received: Lrwxrwxrwx/134218239/01000000777
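
The two modes in the message can be reproduced by hand. The following is a minimal sketch, assuming that the MCD prints modes in the string/decimal/octal form of Go's fs.FileMode and that a masked unit is written to disk as a symlink to /dev/null (as `systemctl mask` does). The temporary directory and the file names in it are made up for the illustration.

```shell
#!/bin/sh
# Report whether a path is a symlink, a regular file, or something else.
# The symlink test comes first because -f would follow the link.
file_kind() {
    if [ -L "$1" ]; then
        echo symlink
    elif [ -f "$1" ]; then
        echo regular
    else
        echo other
    fi
}

workdir=$(mktemp -d)

# What "mask: true" is assumed to produce: a symlink to /dev/null.
ln -s /dev/null "$workdir/masked.service"

# What "contents" alone would produce: a regular file with mode 0644.
printf '[Unit]\nDescription=Just random content\n' > "$workdir/plain.service"
chmod 0644 "$workdir/plain.service"

masked_kind=$(file_kind "$workdir/masked.service")
plain_kind=$(file_kind "$workdir/plain.service")
echo "masked unit is a $masked_kind"
echo "plain unit is a $plain_kind"

# Decode the numbers from the log line. In Go, the symlink type bit is 1<<27,
# so a symlink with permissions 0777 has the decimal value below.
symlink_mode=$(( (1 << 27) | 0777 ))
regular_mode=$(( 0644 ))
echo "symlink mode: decimal $symlink_mode, octal 0$(printf '%o' "$symlink_mode")"
echo "regular mode: decimal $regular_mode, octal 0$(printf '%o' "$regular_mode")"

rm -rf "$workdir"
```

Under these assumptions the validation expects the regular file described by the contents field but finds the symlink created by the mask, which would explain the mismatch.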
      

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-11-19-191518

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create this MachineConfig resource:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: mask-and-content
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - name: maks-and-contents.service
              mask: true
              contents: |
                [Unit]
                Description=Just random content
      
      

      Actual results:

      The worker MachineConfigPool (MCP) becomes degraded, and the machine-config-daemon (MCD) reports this error:
      
      E1118 16:41:42.485314    1900 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-e701d8c471184e3a66756b26b4b7dd33: mode mismatch for file: "/etc/systemd/system/maks-and-contents.service"; expected: -rw-r--r--/420/0644; received: Lrwxrwxrwx/134218239/01000000777
       

      Expected results:

      Before the config drift functionality was added, the contents of a masked unit were ignored.
      
      If this configuration is no longer allowed, the error should say so with a more descriptive message.
       

      Additional info:

      Restoring the desiredConfig value on the degraded nodes is not enough. These are the steps to recover a node:
      
      1. Edit the node's annotations so that desiredConfig = currentConfig.
      2. Remove the file /etc/machine-config-daemon/currentconfig on the node.
      3. Flush the journal on the node:
      $ journalctl --rotate; journalctl --vacuum-time=1s
      
      4. Create the force file on the node:
      $ touch /run/machine-config-daemon-force
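
The node-side steps (2 to 4) can be collected into one script. This is only a sketch: it assumes it runs as root on the degraded node after step 1 has been done, and the `run` helper, the `recover_node` function and the `DRY_RUN` switch are names invented for this example. `DRY_RUN` defaults to 1, so the commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical dry-run switch: 1 prints the commands, 0 executes them.
DRY_RUN=${DRY_RUN:-1}

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 2 to 4 of the recovery procedure, in order.
recover_node() {
    # Step 2: remove the MCD's on-disk record of the current config.
    run rm -f /etc/machine-config-daemon/currentconfig
    # Step 3: flush the journal.
    run journalctl --rotate
    run journalctl --vacuum-time=1s
    # Step 4: create the force file.
    run touch /run/machine-config-daemon-force
}

plan=$(recover_node)
echo "$plan"
```

Reviewing the printed plan first and then rerunning with `DRY_RUN=0` keeps the destructive steps explicit.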
      
       

       

              zzlotnik@redhat.com Zack Zlotnik
              sregidor@redhat.com Sergio Regidor de la Rosa