-
Bug
-
Resolution: Done-Errata
-
Major
-
4.12
-
Important
-
No
-
CLOUD Sprint 244
-
1
-
Rejected
-
False
-
Description of problem:
As part of the deployment of our product (Submariner), we create a new worker node for the cluster by using a MachineSet. Recently, creation of the new worker node started to fail on OpenShift 4.12 clusters on the Azure cloud platform.
Version-Release number of selected component (if applicable):
OpenShift 4.12
How reproducible:
Deploy OpenShift 4.12 and create a new worker node by using a MachineSet manifest.
Steps to Reproduce:
1. Deploy OpenShift version 4.12 on the Azure cloud platform.
2. Create a new worker node by using a MachineSet.
Actual results:
Creation of the worker node fails with the following errors from the machine-controller container in the machine-api-controllers pod in the openshift-machine-api namespace:
I0921 10:18:36.589819 1 actuator.go:213] subgw-central-46fe92-xrwql: actuator checking if machine exists
W0921 10:18:36.731223 1 virtualmachines.go:100] vm subgw-central-46fe92-xrwql not found: %!w(string=compute.VirtualMachinesClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachines/subgw-central-46fe92-xrwql' under resource group 'mbabushk-azure-vvz4j-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix")
I0921 10:18:36.731266 1 controller.go:380] subgw-central-46fe92-xrwql: reconciling machine triggers idempotent create
I0921 10:18:36.731277 1 actuator.go:85] Creating machine subgw-central-46fe92-xrwql
I0921 10:18:36.731825 1 publicips.go:58] creating public ip -subgw-central-46fe92-xrwql
I0921 10:18:37.286484 1 machine_scope.go:196] subgw-central-46fe92-xrwql: patching machine
E0921 10:18:37.323111 1 actuator.go:79] Machine error: failed to reconcile machine "subgw-central-46fe92-xrwql": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]
W0921 10:18:37.323146 1 controller.go:382] subgw-central-46fe92-xrwql: failed to create machine: failed to reconcile machine "subgw-central-46fe92-xrwql": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]
I0921 10:18:37.323159 1 controller.go:422] Actuator returned invalid configuration error: failed to reconcile machine "subgw-central-46fe92-xrwql": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidDomainNameLabel" Message="The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$." Details=[]
I0921 10:18:37.323169 1 controller.go:435] subgw-central-46fe92-xrwql: going into phase "Failed"
I0921 10:18:37.324155 1 recorder.go:103] events "msg"="InvalidConfiguration: failed to reconcile machine \"subgw-central-46fe92-xrwql\": network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"InvalidDomainNameLabel\" Message=\"The domain name label -subgw-central-46fe92-xrwql is invalid. It must conform to the following regular expression: ^[a-z][a-z0-9-]{1,61}[a-z0-9]$.\" Details=[]" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"subgw-central-46fe92-xrwql","uid":"97218e2e-fc42-48c3-b834-dafc97fd2396","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"112872"} "reason"="FailedCreate" "type"="Warning"
Expected results:
The worker node is created.
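For reference, the label that Azure rejects in the machine-controller log begins with a hyphen, while the regular expression quoted in the error requires a lowercase letter as the first character. The following is a minimal sketch, not code from the machine-api Azure provider, that checks both the rejected label and the bare machine name against that expression. The helper name isValidDomainLabel is hypothetical, and the pattern is copied verbatim from the error message.

```go
package main

import (
	"fmt"
	"regexp"
)

// domainLabelPattern is copied verbatim from the InvalidDomainNameLabel
// error message returned by Azure in the log.
var domainLabelPattern = regexp.MustCompile(`^[a-z][a-z0-9-]{1,61}[a-z0-9]$`)

// isValidDomainLabel is a hypothetical helper (not part of any OpenShift or
// Azure SDK API) that reports whether a label matches the quoted pattern.
func isValidDomainLabel(label string) bool {
	return domainLabelPattern.MatchString(label)
}

func main() {
	labels := []string{
		"-subgw-central-46fe92-xrwql", // label from the failing request
		"subgw-central-46fe92-xrwql",  // machine name without the leading hyphen
	}
	for _, label := range labels {
		fmt.Printf("%q valid=%t\n", label, isValidDomainLabel(label))
	}
}
```

Under this check the label from the log fails only because of its leading hyphen; the machine name on its own satisfies the pattern.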
Additional info:
Since I didn't find a way to attach log files, I'm providing a link to a Google Drive folder with the logs. The folder contains:
- Cluster must-gather
- Machine-controller and machineset-controller container logs
- Applied Machine and MachineSet manifests
https://drive.google.com/drive/folders/1Xupus1hQC-CCtsTxh7R47RkOiiUgObd9?usp=sharing
- blocks
-
ACM-7491 [OCP 4.12] Submariner 0.16.0 - Creation of gateway node in Azure failed
- Closed
- depends on
-
OCPBUGS-7696 [ Azure ]not able to deploy machine with publicIp:true
- Closed
- is depended on by
-
ACM-7491 [OCP 4.12] Submariner 0.16.0 - Creation of gateway node in Azure failed
- Closed
- links to
-
RHBA-2023:6126 OpenShift Container Platform 4.12.z bug fix update