OpenShift Bugs: OCPBUGS-44747

Default chrony.conf is not restored after removing the custom chrony.conf machine config

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.12.z
    • Important

      Description of problem:

      The configuration file was not restored to its original state after the custom machine config was removed.
      
      The customer configured custom NTP servers in chrony.conf through machine configs and later removed those machine configs. After the removal, the worker and infra nodes kept using the old custom chrony configuration, and the backup file expected at /etc/machine-config-daemon/orig/etc/chrony.conf.mcdorig had also been deleted. All nodes report the same latest, desired rendered machine configuration, which does not contain the custom chrony configuration. Only the master nodes had chrony.conf restored to the default, and they raised no alerts.
      
      The NodeClockNotSynchronising alerts are still firing on the worker and infra nodes, except for nodes created after the custom chrony machine configs were removed.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      We tried to reproduce the issue by applying the customer's chrony configuration on two test clusters (versions 4.12.25 and 4.14.38), without success.
      We applied the custom chrony NTP configuration to the workers via a machine config. The MCO wrote the custom configuration to /etc/chrony.conf and backed up the default configuration as chrony.conf.mcdorig under /etc/machine-config-daemon/orig/etc. After we deleted the custom chrony machine config, the MCO restored chrony.conf to its default configuration and removed the backup file.
      We repeated these steps several times on both clusters and could not reproduce the issue.
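
The backup and restore behaviour described above can be sketched as a small helper for triaging nodes. This is only an illustration under stated assumptions: the backup root and the `.mcdorig` suffix come from this report, while the function names and the state labels (`custom-applied`, `default`, `stuck`, `default-with-stale-backup`) are hypothetical and are not MCO terminology.

```python
from pathlib import PurePosixPath

# Backup root as observed in this report; treated here as a fixed prefix.
MCD_ORIG_ROOT = PurePosixPath("/etc/machine-config-daemon/orig")


def mcd_backup_path(managed_file: str) -> str:
    """Return the backup location this report describes for a managed file."""
    relative = PurePosixPath(managed_file).relative_to("/")
    return str(MCD_ORIG_ROOT / relative) + ".mcdorig"


def classify_node(has_custom_config: bool, backup_exists: bool) -> str:
    """Label a node's state (hypothetical labels, for triage only)."""
    if has_custom_config and backup_exists:
        return "custom-applied"  # machine config active, default backed up
    if not has_custom_config and not backup_exists:
        return "default"  # never customised, or restored and backup removed
    if has_custom_config and not backup_exists:
        return "stuck"  # the state reported on the customer's worker/infra nodes
    return "default-with-stale-backup"


print(mcd_backup_path("/etc/chrony.conf"))
print(classify_node(has_custom_config=True, backup_exists=False))
```

The `stuck` case is the interesting one here: the custom content is still on disk but the original copy is gone, so there is nothing left on the node to restore from.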
      
      Custom Chrony.conf used in machine config:
      server 172.19.1.110 iburst
      server 172.19.1.111 iburst
      driftfile /var/lib/chrony/drift
      makestep 1.0 3
      rtcsync
      logdir /var/log/chrony
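
To check quickly which time sources a given chrony.conf points at, the active `server` and `pool` directives can be extracted with a few lines of Python. This is a minimal sketch: `time_sources` is a hypothetical helper name, and it only looks at `server` and `pool` lines, ignoring other ways of supplying sources such as `sourcedir`.

```python
# Custom configuration copied from this report.
CUSTOM_CONF = """\
server 172.19.1.110 iburst
server 172.19.1.111 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
"""


def time_sources(chrony_conf: str) -> list[str]:
    """Return the addresses named by active server/pool directives."""
    sources = []
    for line in chrony_conf.splitlines():
        fields = line.split()
        # Commented-out directives never match, because their first
        # field is "#" or "#server" rather than "server"/"pool".
        if len(fields) > 1 and fields[0] in ("server", "pool"):
            sources.append(fields[1])
    return sources


print(time_sources(CUSTOM_CONF))  # ['172.19.1.110', '172.19.1.111']
```

Running the same function on a node's /etc/chrony.conf after the machine config is deleted would show whether the custom servers are still configured.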

      Steps to Reproduce:

      1. Apply the custom chrony NTP configuration for workers and masters via machine configuration.
      2. MCO rolls out the changes to chrony.conf file to all nodes.
      3. Delete the custom chrony machine config.
      4. chrony.conf should be restored to the default. This did not happen on the customer's cluster (version 4.12.19), where the nodes kept the old custom config.
      
      NOTE: We were unable to reproduce the issue on our test clusters (4.12.25 and 4.14.38). We could not test on cluster version 4.12.19 (the customer's affected version) because it is no longer available to install in ARO.
          

      Actual results:

      $ cat /etc/chrony.conf (IP addresses masked as per Microsoft policy)
      server 172.19.190.xx iburst
      server 172.19.190.xx iburst
      driftfile /var/lib/chrony/drift
      makestep 1.0 3
      rtcsync
      logdir /var/log/chrony
      
      $ chronyc sources -v
        .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
       / .- Source state '*' = current best, '+' = combined, '-' = not combined,
      | /             'x' = may be in error, '~' = too variable, '?' = unusable.
      ||                                                 .- xxxx [ yyyy ] +/- zzzz
      ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
      ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
      ||                                \     |          |  zzzz = estimated error.
      ||                                 |    |           \
      MS Name/IP address         Stratum Poll Reach LastRx Last sample
      ===============================================================================
      ^? ig-ntp01.xxxcom       0  10     0     -     +0ns[   +0ns] +/-    0ns
      ^? ig-ntp02.xxxx.com       0  10     0     -     +0ns[   +0ns] +/-    0ns

      Expected results:

      $ cat /etc/chrony.conf
      # Use public servers from the pool.ntp.org project.
      # Please consider joining the pool (https://www.pool.ntp.org/join.html).
      pool 2.rhel.pool.ntp.org iburst

      # Use NTP servers from DHCP.
      sourcedir /run/chrony-dhcp

      # Record the rate at which the system clock gains/losses time.
      driftfile /var/lib/chrony/drift

      # Allow the system clock to be stepped in the first three updates
      # if its offset is larger than 1 second.
      makestep 1.0 3

      # Enable kernel synchronization of the real-time clock (RTC).
      rtcsync

      # Enable hardware timestamping on all interfaces that support it.
      #hwtimestamp *

      # Increase the minimum number of selectable sources required to adjust
      # the system clock.
      #minsources 2

      # Allow NTP client access from local network.
      #allow 192.168.0.0/16

      # Serve time even if not synchronized to a time source.
      #local stratum 10

      # Require authentication (nts or key option) for all NTP sources.
      #authselectmode require

      # Specify file containing keys for NTP authentication.
      keyfile /etc/chrony.keys

      # Save NTS keys and cookies.
      ntsdumpdir /var/lib/chrony

      # Insert/delete leap seconds by slewing instead of stepping.
      #leapsecmode slew

      # Get TAI-UTC offset and leap seconds from the system tz database.
      leapsectz right/UTC

      # Specify directory for log files.
      logdir /var/log/chrony

      # Select which information is logged.
      #log measurements statistics tracking
      
      
      # chronyc sources -v
        .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
       / .- Source state '*' = current best, '+' = combined, '-' = not combined,
      | /             'x' = may be in error, '~' = too variable, '?' = unusable.
      ||                                                 .- xxxx [ yyyy ] +/- zzzz
      ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
      ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
      ||                                \     |          |  zzzz = estimated error.
      ||                                 |    |           \
      MS Name/IP address         Stratum Poll Reach LastRx Last sample
      ===============================================================================
      #* PHC0                          0   3   377    10   +974ns[+1542ns] +/- 6133ns
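
The difference between the actual and expected `chronyc sources` output comes down to the state column (`?` versus `*`) and the reachability register (0 versus 377 octal). A rough parser such as the one below could flag affected nodes from captured output. It is a sketch that assumes the tabular layout shown in this report, with data rows following the `====` separator; `parse_sources` and `is_synchronised` are hypothetical names.

```python
def parse_sources(output: str) -> list[dict]:
    """Parse the data rows that follow the '====' separator line."""
    rows = []
    seen_separator = False
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("===="):
            seen_separator = True
            continue
        if not seen_separator or not line:
            continue
        fields = line.split()
        if len(fields) < 5 or len(fields[0]) != 2:
            continue  # not a source row in the expected layout
        rows.append({
            "mode": fields[0][0],          # '^' server, '=' peer, '#' local clock
            "state": fields[0][1],         # '*' current best, '?' unusable, ...
            "name": fields[1],
            "reach": int(fields[4], 8),    # reachability register is octal
        })
    return rows


def is_synchronised(rows: list[dict]) -> bool:
    """True when some source is marked as the current best ('*')."""
    return any(row["state"] == "*" for row in rows)


# Rows copied from the actual and expected results in this report.
ACTUAL = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? ig-ntp01.xxxcom       0  10     0     -     +0ns[   +0ns] +/-    0ns
^? ig-ntp02.xxxx.com       0  10     0     -     +0ns[   +0ns] +/-    0ns
"""
EXPECTED = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0                          0   3   377    10   +974ns[+1542ns] +/- 6133ns
"""

print(is_synchronised(parse_sources(ACTUAL)))    # False
print(is_synchronised(parse_sources(EXPECTED)))  # True
```

A reach value of 0 on every source, as in the actual results, means no recent poll of those servers received a valid reply, which is consistent with the NodeClockNotSynchronising alert.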
      

      Additional info:

      Machine config with custom chrony.conf used to reproduce the issue:
      
      # Generated by Butane; do not edit
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machine.openshift.io/cluster-api-machine-role: worker
          machineconfiguration.openshift.io/role: worker
        name: worker-chrony-config
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
              - contents:
                  compression: gzip
                  source: data:;base64,H4sIAAAAAAAC/1zLUQ6CMAwG4PeeoifYqMZ4HhgFGyczfysJtzcx84XnL58rdgXL/ZIkSboNbNMHHnQG+cMMW2Kxqpz3EbnalMsDbTvyT+g1PtVD3yxp4Cshih9bodrW2dBPW/uhbwAAAP//69QNrYAAAAA=
                mode: 420
                overwrite: true
                path: /etc/chrony.conf
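
The `source` field above is a data URL whose payload is the gzip-compressed file content encoded as base64, which makes it hard to review by eye. The sketch below shows one way to decode such a value, plus a round trip through a matching encoder. Both function names are hypothetical, and the encoder's output is not expected to be byte-identical to what Butane generates, because gzip headers and compression settings can differ.

```python
import base64
import gzip


def encode_source(text: str) -> str:
    """Build a 'data:;base64,...' URL holding gzip-compressed text."""
    compressed = gzip.compress(text.encode("utf-8"))
    return "data:;base64," + base64.b64encode(compressed).decode("ascii")


def decode_source(source: str, compression: str = "gzip") -> str:
    """Recover the file content from a base64 data URL."""
    prefix, _, payload = source.partition(",")
    if not prefix.startswith("data:") or not prefix.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    raw = base64.b64decode(payload)
    if compression == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


conf = "server 172.19.1.110 iburst\nserver 172.19.1.111 iburst\n"
assert decode_source(encode_source(conf)) == conf

# The manifest's 'mode: 420' is the decimal form of octal 0644.
assert 0o644 == 420
```

Passing the `source` string from the machine config to `decode_source` should print the chrony.conf that the MCO writes to the node, which is useful when confirming that a rendered config no longer carries the custom servers.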

              team-mco Team MCO
              mhreddy.openshift Sudharsan Reddy M H
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa