OpenShift Bugs / OCPBUGS-19549

OCP 4.12 fails to create MachineSet on Azure platform

Details

    • Important
    • CLOUD Sprint 244
    • 1
    • No
    • Rejected
    • False

    Description

      Description of problem:

      As part of deploying our product (Submariner), we create a new worker node for the cluster by using a MachineSet.

      Recently, creating a new worker node started to fail on clusters running version 4.12 on the Azure cloud platform.

      Version-Release number of selected component (if applicable):

      OpenShift 4.12

      How reproducible:

      Deploy OpenShift 4.12 and create a new worker node by using a MachineSet manifest.

      Steps to Reproduce:

      1. Deploy OpenShift version 4.12 on the Azure cloud platform
      2. Create a new worker node by using a MachineSet

      Expected results:

      The worker node should be created.

      Actual results:

      Creation of the worker node fails with the following errors from the machine-controller container in the machine-api-controllers pod in the openshift-machine-api namespace:
      
      I0921 10:18:36.589819       1 actuator.go:213] subgw-central-46fe92-xrwql: actuator checking if machine exists
      W0921 10:18:36.731223       1 virtualmachines.go:100] vm subgw-central-46fe92-xrwql not found: %!w(string=compute.VirtualMachinesClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachines/subgw-central-46fe92-xrwql' under resource group 'mbabushk-azure-vvz4j-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix")
      I0921 10:18:36.731266       1 controller.go:380] subgw-central-46fe92-xrwql: reconciling machine triggers idempotent create
      I0921 10:18:36.731277       1 actuator.go:85] Creating machine subgw-central-46fe92-xrwql
      I0921 10:18:36.731825       1 publicips.go:58] creating public ip -subgw-central-46fe92-xrwql
      I0921 10:18:37.286484       1 machine_scope.go:196] subgw-central-46fe92-xrwql: patching machine
      E0921 10:18:37.323111       1 actuator.go:79] Machine error: failed to reconcile machine "subgw-central-46fe92-xrwql": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]
      W0921 10:18:37.323146       1 controller.go:382] subgw-central-46fe92-xrwql: failed to create machine: failed to reconcile machine "subgw-central-46fe92-xrwql": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]
      I0921 10:18:37.323159       1 controller.go:422] Actuator returned invalid configuration error: failed to reconcile machine "subgw-central-46fe92-xrwql": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]
      I0921 10:18:37.323169       1 controller.go:435] subgw-central-46fe92-xrwql: going into phase "Failed"
      I0921 10:18:37.324155       1 recorder.go:103] events "msg"="InvalidConfiguration: failed to reconcile machine \"subgw-central-46fe92-xrwql\": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"InvalidDomainNameLabel\" Message=\"The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$.\" Details=[]" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"subgw-central-46fe92-xrwql","uid":"97218e2e-fc42-48c3-b834-dafc97fd2396","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"112872"} "reason"="FailedCreate" "type"="Warning"
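
      The rejected label can be checked locally against the regular expression quoted in the Azure error. The snippet below is only a sketch for illustration: is_valid_domain_label is a hypothetical helper, not part of the machine-api code, and the two label strings are copied from the log. It shows that the label fails because of its leading hyphen, while the machine name on its own would pass. Why the label starts with a hyphen is not established here.

```python
import re

# Pattern copied verbatim from the Azure "InvalidDomainNameLabel" error message.
DOMAIN_LABEL_RE = re.compile(r"^[a-z][a-z0-9-]{1,61}[a-z0-9]$")


def is_valid_domain_label(label: str) -> bool:
    """Hypothetical helper: True if the label matches the pattern from the error."""
    return DOMAIN_LABEL_RE.fullmatch(label) is not None


# Label the controller tried to create, as logged (note the leading hyphen).
rejected_label = "-subgw-central-46fe92-xrwql"
# The machine name alone, without the leading hyphen.
machine_name = "subgw-central-46fe92-xrwql"

if __name__ == "__main__":
    for label in (rejected_label, machine_name):
        print(f"{label!r}: valid={is_valid_domain_label(label)}")
```

      The pattern requires the first character to be a lowercase letter, so any label beginning with "-" is rejected regardless of what follows.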

      Additional info:

      I could not find a way to attach log files, so I'm providing a link to a Google Drive folder with the logs.

      The folder contains the following:
      - Cluster must-gather
      - Logs from the machine-controller and machineset-controller containers
      - The applied Machine and MachineSet manifests
      
      https://drive.google.com/drive/folders/1Xupus1hQC-CCtsTxh7R47RkOiiUgObd9?usp=sharing
      
      


            People

              joelspeed Joel Speed
              mbabushk@redhat.com Maxim Babushkin
              Zhaohua Sun Zhaohua Sun