  1. OpenShift Bugs
  2. OCPBUGS-1806

OCP cluster install on baremetal fails when hostname of master nodes does not include the text "master" (take 2)

Details

    • Important
    • 1
    • Metal Platform 225, Metal Platform 226
    • 2
    • Rejected
    • False
    • The `cluster-baremetal-operator` now determines a media access control (MAC) address for a control plane node by using a control plane machine with a defined `master` role.

      Before the {product-title} {product-version} release, the `cluster-baremetal-operator` searched for a `BareMetalHost` object with a defined `master` role. If the object did not define a `master` role, the `metal3` pod would not start.

      (link:https://issues.redhat.com/browse/OCPBUGS-1806[*OCPBUGS-1806*])
    • Done
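The release note text above can be illustrated with a minimal Python sketch. It is not the operator's code (the operator is written in Go). It assumes that the old behavior amounted to name-based matching, as the issue title suggests, and that the new behavior selects Machine objects by the `machine.openshift.io/cluster-api-machine-role` label and follows each `BareMetalHost` `spec.consumerRef` back to its Machine. All object names and MAC addresses are hypothetical.

```python
# Illustrative sketch only: plain dicts stand in for Machine and BareMetalHost objects.
ROLE_LABEL = "machine.openshift.io/cluster-api-machine-role"


def macs_by_hostname(hosts):
    """Assumed old behavior: pick hosts whose name contains 'master'."""
    return [h["spec"]["bootMACAddress"]
            for h in hosts
            if "master" in h["metadata"]["name"]]


def macs_by_machine_role(machines, hosts):
    """Assumed new behavior: pick hosts consumed by Machines labeled as master."""
    masters = {
        m["metadata"]["name"]
        for m in machines
        if m["metadata"].get("labels", {}).get(ROLE_LABEL) == "master"
    }
    macs = []
    for h in hosts:
        consumer = h["spec"].get("consumerRef") or {}
        if consumer.get("name") in masters:
            macs.append(h["spec"]["bootMACAddress"])
    return macs


def make_machine(name, role):
    return {"metadata": {"name": name, "labels": {ROLE_LABEL: role}}}


def make_host(name, mac, machine_name):
    return {"metadata": {"name": name},
            "spec": {"bootMACAddress": mac, "consumerRef": {"name": machine_name}}}


# Hypothetical inventory: host names follow the "-m-N" pattern seen in the logs,
# so none of them contains the text "master".
machines = [make_machine(f"example-master-{i}", "master") for i in range(3)]
machines.append(make_machine("example-worker-0", "worker"))
hosts = [make_host(f"blocp-1-106-m-{i}", f"52:54:00:00:00:0{i}", f"example-master-{i}")
         for i in range(3)]
hosts.append(make_host("blocp-1-106-w-0", "52:54:00:00:00:10", "example-worker-0"))

name_based = macs_by_hostname(hosts)
role_based = macs_by_machine_role(machines, hosts)
print("PROVISIONING_MACS (name-based):", ",".join(name_based))
print("PROVISIONING_MACS (role-based):", ",".join(role_based))
```

With this inventory the name-based selection yields an empty list, which matches the empty `PROVISIONING_MACS` value reported in the description, while the role-based selection returns the three control plane MAC addresses.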

    Description

      Description of problem:

      Disconnected IPI OCP 4.11.5 cluster install on baremetal fails when hostname of master nodes does not include "master"    

      Version-Release number of selected component (if applicable): 4.11.5

      How reproducible: Perform a disconnected IPI install of OCP 4.11.5 on bare metal with master nodes whose hostnames do not contain the text "master".

      Steps to Reproduce:

      Perform a disconnected IPI install of OCP 4.11.5 on bare metal with master nodes whose hostnames do not contain the text "master".

      Actual results: master nodes do not come up.

      Expected results: master nodes should come up even though the text "master" is not in their hostnames.

      Additional info:

      My customer reinstalled a new cluster using the fix, but they still hit the exact same issue: the metal3 pod has an empty PROVISIONING_MACS value. Can we work together with them to understand why the new code fix https://github.com/openshift/cluster-baremetal-operator/commit/76bd6bc461b30a6a450f85a42e492a0933178aee is not working?

      cat metal3-static-ip-set/metal3-static-ip-set/logs/current.log
      2022-09-27T14:19:38.140662564Z + '[' -z 10.17.199.3/27 ']'
      2022-09-27T14:19:38.140662564Z + '[' -z '' ']'
      2022-09-27T14:19:38.140662564Z + '[' -n '' ']'
      2022-09-27T14:19:38.140722345Z ERROR: Could not find suitable interface for "10.17.199.3/27"
      2022-09-27T14:19:38.140726312Z + '[' -n '' ']'
      2022-09-27T14:19:38.140726312Z + echo 'ERROR: Could not find suitable interface for "10.17.199.3/27"'
      2022-09-27T14:19:38.140726312Z + exit 1
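One plausible reading of the trace above, written as a short Python sketch. The real `/set-static-ip` script is not shown in this report, so the control flow here is inferred from the `[ -z ... ]` and `[ -n ... ]` tests only. The function name and the `mac_to_iface` mapping are hypothetical stand-ins for whatever the script uses to inspect local interfaces.

```python
def pick_interface(provisioning_ip, provisioning_interface, provisioning_macs, mac_to_iface):
    """Return the interface to configure, or raise if none can be determined."""
    # Mirrors: [ -z "$PROVISIONING_IP" ]
    if not provisioning_ip:
        raise ValueError("PROVISIONING_IP is not set")

    iface = provisioning_interface
    # Mirrors: [ -z "$PROVISIONING_INTERFACE" ] followed by [ -n "$PROVISIONING_MACS" ]
    if not iface and provisioning_macs:
        for mac in provisioning_macs.split(","):
            mac = mac.strip().lower()
            if mac in mac_to_iface:
                iface = mac_to_iface[mac]
                break

    # Mirrors the final [ -n "$PROVISIONING_INTERFACE" ] test before the error.
    if not iface:
        raise RuntimeError(f'Could not find suitable interface for "{provisioning_ip}"')
    return iface


# Hypothetical local interface table (MAC address -> interface name).
local_ifaces = {"52:54:00:00:00:01": "eno1"}

# The failing case from the log: interface and MAC list are both empty.
try:
    pick_interface("10.17.199.3/27", "", "", local_ifaces)
    failure = None
except RuntimeError as err:
    failure = str(err)
print("ERROR:", failure)

# With a MAC list present, the interface can be resolved.
resolved = pick_interface("10.17.199.3/27", "", "52:54:00:00:00:01", local_ifaces)
print("resolved interface:", resolved)
```

Under this reading, an empty `PROVISIONING_MACS` leaves the script with nothing to match against, so it exits with the error seen in the log.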

       

      cat metal3-b9bf8d595-gv94k.yaml
      ...
      initContainers:
      - command:
        - /set-static-ip
        env:
        - name: PROVISIONING_IP
          value: 10.17.199.3/27
        - name: PROVISIONING_INTERFACE
        - name: PROVISIONING_MACS   <------------------------- missing MACS
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f04793bd109ecba2dfe43be93dc990ac5299272482c150bd5f2eee0f80c983b
        imagePullPolicy: IfNotPresent
        name: metal3-static-ip-set
      ....
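A quick way to spot this condition in a dumped pod spec is to list the environment variables that carry no value. The sketch below is only an illustration: it works on a plain Python dict that mirrors the init container excerpt above rather than parsing the YAML file, and the helper name is hypothetical.

```python
def env_vars_without_value(container):
    """Return names of env entries that have neither a value nor a valueFrom source."""
    return [
        entry["name"]
        for entry in container.get("env", [])
        if not entry.get("value") and "valueFrom" not in entry
    ]


# Dict form of the metal3-static-ip-set init container shown in the excerpt.
static_ip_container = {
    "name": "metal3-static-ip-set",
    "command": ["/set-static-ip"],
    "env": [
        {"name": "PROVISIONING_IP", "value": "10.17.199.3/27"},
        {"name": "PROVISIONING_INTERFACE"},
        {"name": "PROVISIONING_MACS"},
    ],
}

empty_vars = env_vars_without_value(static_ip_container)
print("env vars without a value:", empty_vars)
```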
      • omc logs machine-api-controllers-6b9ffd96cd-grh6l -c nodelink-controller  -n openshift-machine-api
        2022-09-21T16:13:43.600517485Z I0921 16:13:43.600513       1 nodelink_controller.go:408] Finding machine from node "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca"
        2022-09-21T16:13:43.600521381Z I0921 16:13:43.600517       1 nodelink_controller.go:425] Finding machine from node "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca" by ProviderID
        2022-09-21T16:13:43.600525225Z W0921 16:13:43.600521       1 nodelink_controller.go:427] Node "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca" has no providerID
        2022-09-21T16:13:43.600528917Z I0921 16:13:43.600524       1 nodelink_controller.go:448] Finding machine from node "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca" by IP
        2022-09-21T16:13:43.600532711Z I0921 16:13:43.600529       1 nodelink_controller.go:453] Found internal IP for node "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca": "10.17.192.33"
        2022-09-21T16:13:43.600551289Z I0921 16:13:43.600544       1 nodelink_controller.go:477] Matching machine not found for node "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca" with internal IP "10.17.192.33"
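The lookup order visible in these nodelink-controller messages (first by providerID, then by internal IP) can be sketched as follows. This is a simplified illustration based on the log lines only, not the controller's Go code. The flat dict layout and the machine names are hypothetical.

```python
def find_machine_for_node(node, machines):
    """Return the name of the machine matching the node, or None."""
    # Step 1: match by providerID, if the node has one.
    provider_id = node.get("providerID")
    if provider_id:
        for machine in machines:
            if machine.get("providerID") == provider_id:
                return machine["name"]

    # Step 2: fall back to matching the node's internal IPs against machine addresses.
    for ip in node.get("internalIPs", []):
        for machine in machines:
            if ip in machine.get("addresses", []):
                return machine["name"]

    return None


# The node from the log: no providerID, one internal IP.
node = {
    "name": "blocp-1-106-m-0.c106-1.sc.evolhse.hydro.qc.ca",
    "providerID": None,
    "internalIPs": ["10.17.192.33"],
}

# Hypothetical machines that have not yet been linked to a host,
# so they expose neither a providerID nor any addresses.
machines = [{"name": f"example-master-{i}", "providerID": None, "addresses": []}
            for i in range(3)]

match = find_machine_for_node(node, machines)
print("matching machine:", match)
```

In this sketch neither step can succeed while the machines carry no providerID and no addresses, which is consistent with the "Matching machine not found" message.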

      From @dtantsur WIP PR: https://github.com/openshift/cluster-baremetal-operator/pull/299

      The customer is waiting for this fix. The previous code change does not fix the customer's situation.

      Please refer to this Slack thread: https://coreos.slack.com/archives/CFP6ST0A3/p1664215102459219

            People

              rhn-engineering-dtantsur Dmitry Tantsur
              rhn-support-elalance Erik Lalancette
              Jad Haj Yahya Jad Haj Yahya
              Darragh Fitzmaurice Darragh Fitzmaurice