OpenShift Bugs / OCPBUGS-65566

[CustomDNS] Installer doesn't exit when it fails to update the API and API-Int Load Balancer IPs in the infrastructure manifest


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • 4.21
    • Quality / Stability / Reliability
    • Moderate
    • Installer Sprint 279, Installer Sprint 280
    • 2

      Description of problem:

      In the Prow ipi-install-install step [1], the function inject_promtail_service runs and updates the bootstrap Ignition file on the Azure platform.
      
      After this update, the bootstrap Ignition data is gzip-compressed, base64-encoded and referenced as an inline merge source:
      -------------
      {"ignition":{"config":{"merge":[{"compression":"gzip","source":"data:;base64,H4sIAAAAAAAC/9S925eiyPYn
      ...
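The `H4sI` prefix of the source is the base64 form of the gzip magic bytes, so the original bootstrap config appears to be wrapped one level down inside an `ignition.config.merge` reference. A minimal Python sketch for unpacking such a wrapper when inspecting a downloaded bootstrap Ignition file, assuming the merge source is an inline `data:` URL as in the excerpt (remote sources are not handled). The helper names are hypothetical and are not part of the installer or the Prow step.

```python
import base64
import gzip
import json
from urllib.parse import unquote


def decode_data_url(source: str) -> bytes:
    """Return the payload of an inline data URL ('data:;base64,...' or 'data:,...')."""
    if not source.startswith("data:"):
        # Remote sources (http, https, s3, ...) would have to be fetched instead.
        raise ValueError(f"not an inline data URL: {source[:40]}")
    header, _, payload = source[len("data:"):].partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote(payload).encode()  # non-base64 data URLs are percent-encoded


def unpack_merged_configs(ignition: dict) -> list[dict]:
    """Decode every inline config listed under ignition.config.merge."""
    configs = []
    for ref in ignition.get("ignition", {}).get("config", {}).get("merge", []):
        raw = decode_data_url(ref["source"])
        if ref.get("compression") == "gzip":
            raw = gzip.decompress(raw)
        configs.append(json.loads(raw))
    return configs
```

Applied to the downloaded file, the decoded inner config is presumably where `storage.files` (and therefore `/opt/openshift/manifests/cluster-infrastructure-02-config.yml`) actually lives.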
      
      When creating a cluster with userProvisionedDNS enabled, the installer edited the bootstrap Ignition file but failed to add the API and API-Int Load Balancer IPs to the infrastructure manifest. I downloaded the bootstrap Ignition file and checked the content of /opt/openshift/manifests/cluster-infrastructure-02-config.yml; it contains no api or api-int IP information:
      -------------
      apiVersion: config.openshift.io/v1
      kind: Infrastructure
      metadata:
        creationTimestamp: null
        name: cluster
      spec:
        cloudConfig:
          key: config
          name: cloud-provider-config
        platformSpec:
          type: Azure
      status:
        apiServerInternalURI: https://api-int.ci-op-grjnv3rb-92d0c.qe1.devcluster.openshift.com:6443
        apiServerURL: https://api.ci-op-grjnv3rb-92d0c.qe1.devcluster.openshift.com:6443
        controlPlaneTopology: HighlyAvailable
        cpuPartitioning: None
        etcdDiscoveryDomain: ""
        infrastructureName: ci-op-grjnv3rb-92d0c-z49rb
        infrastructureTopology: HighlyAvailable
        platform: Azure
        platformStatus:
          azure:
            cloudLoadBalancerConfig:
              dnsType: ClusterHosted
            cloudName: AzurePublicCloud
            networkResourceGroupName: ci-op-grjnv3rb-92d0c-z49rb-rg
            resourceGroupName: ci-op-grjnv3rb-92d0c-z49rb-rg
            resourceTags:
            - key: expiration_date
              value: 2025-11-13T11:09+00:00
          type: Azure
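The manual check described above can be scripted. A minimal Python sketch, assuming the config passed in is the one that actually carries `storage.files` (in the Prow case, the decoded merge source rather than the outer wrapper) and that the addresses to look for are the ones the installer logged (`135.119.24.174` and `10.0.0.100`). The function names are hypothetical; the sketch only tests whether the IPs appear in the manifest text and does not validate which fields the installer is supposed to write.

```python
import base64
import gzip
import re
from urllib.parse import unquote

# Path taken from the report; everything else here is illustrative.
MANIFEST = "/opt/openshift/manifests/cluster-infrastructure-02-config.yml"


def decode_data_url(source: str) -> bytes:
    """Return the payload of a data URL ('data:;base64,...' or 'data:,...')."""
    header, _, payload = source.removeprefix("data:").partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote(payload).encode()


def manifest_text(config: dict, path: str = MANIFEST) -> str:
    """Find `path` in storage.files and return its decoded contents."""
    for f in config.get("storage", {}).get("files", []):
        if f.get("path") == path:
            contents = f.get("contents", {})
            raw = decode_data_url(contents.get("source", "data:,"))
            if contents.get("compression") == "gzip":
                raw = gzip.decompress(raw)
            return raw.decode()
    raise KeyError(f"{path} not found in storage.files")


def missing_ips(text: str, ips: list[str]) -> list[str]:
    """Return the IPs that do not appear as whole addresses in `text`."""
    return [ip for ip in ips
            if not re.search(rf"(?<![\d.]){re.escape(ip)}(?![\d.])", text)]
```

For the manifest shown above, `missing_ips` would report both logged addresses as missing, even though the log claims they were added.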
      
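      However, the installer continued without exiting with an error; the .openshift-install.log excerpt below even reports "Successfully added API and API-Int Load Balancer IPs to infrastructure manifest". The installation finally failed while waiting for machine provisioning, because the coredns static pod could not come up on the bootstrap host.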
      -------------
      time="2025-11-13T03:33:53Z" level=debug msg="BlobIgnitionContainer.ID=/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/ci-op-grjnv3rb-92d0c-z49rb-rg/providers/Microsoft.Storage/storageAccounts/ciopgrjnv3rb92d0cz49rbsa/blobServices/default/containers/ignition"
      time="2025-11-13T03:33:53Z" level=debug msg="Azure: Editing Ignition files to start in-cluster DNS when UserProvisionedDNS is enabled"
      time="2025-11-13T03:33:53Z" level=debug msg="Azure: Editing Ignition files with API LB IP: 135.119.24.174 and API Int LB IP: 10.0.0.100"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully added API and API-Int Load Balancer IPs to infrastructure manifest"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully added API and API-Int Load Balancer IPs to lbConfig configmap"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully updated bootstrap machine-config-server certs"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully updated master pointer ignition with API LB IP"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully updated worker pointer ignition with API LB IP"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully updated master user data secret"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully updated worker user data secret"
      time="2025-11-13T03:33:53Z" level=debug msg="Successfully updated bootstrap ignition with updated manifests"
      
      [1] https://github.com/openshift/release/blob/master/ci-operator/step-registry/ipi/install/install/ipi-install-install-commands.sh#L785-L800  

      Version-Release number of selected component (if applicable):

          4.21 pre-merge test

      How reproducible:

          Always in Prow, because the environment variable OPENSHIFT_INSTALL_PROMTAIL_ON_BOOTSTRAP is set to true on Azure by default.

      Steps to Reproduce:

          1. Install a cluster with userProvisionedDNS enabled on Azure by using the ipi-install-install step in Prow.
          

      Actual results:

          Installation failed to provision control plane machines

      Expected results:

          For such a base64-encoded and gzip-compressed Ignition file, the installer either handles the file correctly and adds the api/api-int IPs to the infrastructure manifest, or exits with an error when the update fails.
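The expected behavior can be illustrated with a minimal Python sketch. It is not the installer's actual code, and the function and error names are hypothetical; it assumes the wrapped config is an inline `data:;base64,` merge source as in the excerpt in the description. It first looks in the top-level `storage.files`, then descends into inline (optionally gzip-compressed) merge sources and re-encodes the edited config, and raises instead of reporting success when the target file is never found.

```python
import base64
import gzip
import json


class IgnitionEditError(RuntimeError):
    """Hypothetical error type: raised instead of logging success when no file was edited."""


def _set_file(config: dict, path: str, source: str) -> bool:
    """Replace the inline contents of `path`; return True if the file was found."""
    for f in config.get("storage", {}).get("files", []):
        if f.get("path") == path:
            f["contents"] = {"source": source}
            return True
    return False


def update_bootstrap_file(config: dict, path: str, source: str) -> None:
    """Update `path` in a bootstrap Ignition config, descending into inline
    merge sources; raise if the file is never found."""
    if _set_file(config, path, source):
        return
    for ref in config.get("ignition", {}).get("config", {}).get("merge", []):
        src = ref.get("source", "")
        if not src.startswith("data:;base64,"):
            continue  # remote merge sources are out of scope for this sketch
        raw = base64.b64decode(src.split(",", 1)[1])
        gzipped = ref.get("compression") == "gzip"
        inner = json.loads(gzip.decompress(raw) if gzipped else raw)
        if _set_file(inner, path, source):
            data = json.dumps(inner).encode()
            ref["source"] = "data:;base64," + base64.b64encode(
                gzip.compress(data) if gzipped else data).decode()
            ref.pop("verification", None)  # a recorded hash would no longer match
            return
    raise IgnitionEditError(f"{path} not found in bootstrap Ignition; refusing to continue")
```

Failing fast here would surface the problem at install time rather than later, when the coredns static pod fails on the bootstrap host.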

      Additional info:

      1. Failed Prow CI job: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/71237/rehearse-71237-periodic-ci-openshift-verification-tests-main-installer-rehearse-4.21-installer-rehearse-azure/1988804477716533248
      
      2. Installation with userProvisionedDNS enabled succeeds when the inject_promtail_service function is not run.

              Assignee: Sandhya Dasu (sdasu@redhat.com)
              Reporter: Jinyun Ma (jinyunma)