Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-1557

Default to floating automaticRestart for new GCP instances

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Normal Normal
    • None
    • 4.12, 4.11, 4.10
    • None
    • Moderate
    • None
    • CLOUD Sprint 228
    • 1
    • False
    • Hide

      None

      Show
      None
    • Hide
      Cause: Instances are not set to respect the GCP infrastructure default option for automated restarts.

      Consequence: Instances can be created without using the infrastructure default for automatic restarts. This can lead to scenarios where instances become terminated in GCP but their associated Machines are still listed in Running state because they did not automatically restart.

      Fix: The code for passing the automatic restart option has been improved to better detect, and pass on, the default option selection from users.

      Result: Instances now use the infrastructure default properly and are automatically restarted when the user requests the default functionality.
      Show
      Cause: Instances are not set to respect the GCP infrastructure default option for automated restarts. Consequence: Instances can be created without using the infrastructure default for automatic restarts. This can lead to scenarios where instances become terminated in GCP but their associated Machines are still listed in Running state because they did not automatically restart. Fix: The code for passing the automatic restart option has been improved to better detect, and pass on, the default option selection from users. Result: Instances now use the infrastructure default properly and are automatically restarted when the user requests the default functionality.
    • Bug Fix

      Seen in an instance created recently by a 4.12.0-ec.2 GCP provider:

        "scheduling": {
          "automaticRestart": false,
          "onHostMaintenance": "MIGRATE",
          "preemptible": false,
          "provisioningModel": "STANDARD"
        },
      

      From GCP's docs, they may stop instances on hardware failures and other causes, and we'd need automaticRestart: true to auto-recover from that. Also from GCP docs, the default for automaticRestart is true. And on the Go provider side, we doc:

      If omitted, the platform chooses a default, which is subject to change over time, currently that default is "Always".

      But the implementing code does not actually float the setting. Seems like a regression here, which is part of 4.10:

      $ git clone https://github.com/openshift/machine-api-provider-gcp.git
      $ cd machine-api-provider-gcp
      $ git log --oneline origin/release-4.10 | grep 'migrate to openshift/api'
      44f0f958 migrate to openshift/api
      

      But that's not where the 4.9 and earlier code is located:

      $ git branch -a | grep origin/release
        remotes/origin/release-4.10
        remotes/origin/release-4.11
        remotes/origin/release-4.12
        remotes/origin/release-4.13
      

      Hunting for 4.9 code:

      $ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.9.48-x86_64 | grep gcp
        gcp-machine-controllers                        https://github.com/openshift/cluster-api-provider-gcp                       c955c03b2d05e3b8eb0d39d5b4927128e6d1c6c6
        gcp-pd-csi-driver                              https://github.com/openshift/gcp-pd-csi-driver                              48d49f7f9ef96a7a42a789e3304ead53f266f475
        gcp-pd-csi-driver-operator                     https://github.com/openshift/gcp-pd-csi-driver-operator                     d8a891de5ae9cf552d7d012ebe61c2abd395386e
      

      So looking there:

      $ git clone https://github.com/openshift/cluster-api-provider-gcp.git
      $ cd cluster-api-provider-gcp
      $ git log --oneline | grep 'migrate to openshift/api'
      ...no hits...
      $ git grep -i automaticRestart origin/release-4.9  | grep -v '"description"\|compute-gen.go'
      origin/release-4.9:vendor/google.golang.org/api/compute/v1/compute-api.json:        "automaticRestart": {
      

      Not actually clear to me how that code is structured. So 4.10 and later GCP machine-API providers are impacted, and I'm unclear on 4.9 and earlier.

            mimccune@redhat.com Michael McCune
            trking W. Trevor King
            Zhaohua Sun Zhaohua Sun
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: