  1. OpenShift Bugs
  2. OCPBUGS-3111

metal3 pod crashloops on OKD in BareMetal IPI or assisted-installer bare metal installations

    • Important
    • None
    • Rejected
    • False
    • NA

      This is a clone of issue OCPBUGS-2992. The following is the description of the original issue:

      Description of problem:

      The metal3-ironic container image in OKD fails in the step of configure-ironic.sh that collects additional Oslo configuration entries passed as environment variables to configure the Ironic instance. It fails in OKD but not in OpenShift because the OpenShift image is based on the builder image and therefore happens to have unrelated variables set that match the regex, whereas the OKD image is based only on a stream8 image that sets no OS_-prefixed variables.
      
      As a result, the metal3 pod crashloops indefinitely, even when it is created for a Provisioning object with provisioningNetwork: Disabled.

      Version-Release number of selected component (if applicable):

      4.11

      How reproducible:

      Always

      Steps to Reproduce:

      1. Deploy OKD to a bare metal cluster using the assisted-service, with the OKD ConfigMap applied to podman play kube, as described in: https://github.com/openshift/assisted-service/tree/master/deploy/podman#okd-configuration
      2. Observe the state of the metal3 pod in the openshift-machine-api namespace.
      

      Actual results:

      The metal3-ironic container repeatedly exits with a nonzero status, with the logs ending here:
      
      ++ export IRONIC_URL_HOST=10.1.1.21
      ++ IRONIC_URL_HOST=10.1.1.21
      ++ export IRONIC_BASE_URL=https://10.1.1.21:6385
      ++ IRONIC_BASE_URL=https://10.1.1.21:6385
      ++ export IRONIC_INSPECTOR_BASE_URL=https://10.1.1.21:5050
      ++ IRONIC_INSPECTOR_BASE_URL=https://10.1.1.21:5050
      ++ '[' '!' -z '' ']'
      ++ '[' -f /etc/ironic/ironic.conf ']'
      ++ cp /etc/ironic/ironic.conf /etc/ironic/ironic.conf_orig
      ++ tee /etc/ironic/ironic.extra
      # Options set from Environment variables
      ++ echo '# Options set from Environment variables'
      ++ env
      ++ grep '^OS_'
      ++ tee -a /etc/ironic/ironic.extra

      Expected results:

      The metal3-ironic container starts and the metal3 pod is reported as ready.

      Additional info:

      This is the PR that introduced pipefail to the downstream ironic-image; the change has not yet been accepted upstream:
      https://github.com/openshift/ironic-image/pull/267/files#diff-ab2b20df06f98d48f232d90f0b7aa464704257224862780635ec45b0ce8a26d4R3
      
      This is the line that's failing:
      https://github.com/openshift/ironic-image/blob/4838a077d849070563b70761957178055d5d4517/scripts/configure-ironic.sh#L57
      
      This is the image base that OpenShift uses for ironic-image (before rewriting in ci-operator):
      https://github.com/openshift/ironic-image/blob/4838a077d849070563b70761957178055d5d4517/Dockerfile.ocp#L9
      
      Here is where the relevant environment variables are set in the builder images for OCP:
      https://github.com/openshift/builder/blob/973602e0e576d7eccef4fc5810ba511405cd3064/hack/lib/build/version.sh#L87
      
      Here is the final FROM line in the OKD image build (just stream8):
      https://github.com/openshift/ironic-image/blob/4838a077d849070563b70761957178055d5d4517/Dockerfile.okd#L9
      
      This results in the following differences between the two images:
      $ podman run --rm -it --entrypoint bash quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:519ac06836d972047f311de5e57914cf842716e22a1d916a771f02499e0f235c -c 'env | grep ^OS_'
      OS_GIT_MINOR=11
      OS_GIT_TREE_STATE=clean
      OS_GIT_COMMIT=97530a7
      OS_GIT_VERSION=4.11.0-202210061001.p0.g97530a7.assembly.stream-97530a7
      OS_GIT_MAJOR=4
      OS_GIT_PATCH=0
      $ podman run --rm -it --entrypoint bash quay.io/openshift/okd-content@sha256:6b8401f8d84c4838cf0e7c598b126fdd920b6391c07c9409b1f2f17be6d6d5cb -c 'env | grep ^OS_'
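
The same difference can be reproduced without pulling either image. The following is a minimal sketch, assuming bash and an env that supports -i are available locally: the two OS_GIT_* values are copied from the OCP output above, and a cleared environment stands in for the OKD image, which sets no OS_-prefixed variables. The variable names ocp_status and okd_status are illustrative only.

```shell
#!/usr/bin/env bash
# Simulate the OCP image: a cleared environment plus two OS_GIT_* variables.
ocp_status=0
env -i PATH="$PATH" OS_GIT_MAJOR=4 OS_GIT_MINOR=11 \
    bash -c 'env | grep "^OS_"' || ocp_status=$?

# Simulate the OKD image: a cleared environment with no OS_-prefixed variables.
okd_status=0
env -i PATH="$PATH" bash -c 'env | grep "^OS_"' || okd_status=$?

echo "OCP-like environment: grep exit status ${ocp_status}"
echo "OKD-like environment: grep exit status ${okd_status}"
```

In the OCP-like case grep prints the matching variables and exits 0; in the OKD-like case it prints nothing and exits 1, which is the status that matters once pipefail is in effect.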
      
      Here is what the OS_ prefixed variables should be used for:
      https://github.com/metal3-io/ironic-image/blob/807a120b4ce5e1675a79ebf3ee0bb817cfb1f010/README.md?plain=1#L36
      https://opendev.org/openstack/oslo.config/src/commit/84478d83f87e9993625044de5cd8b4a18dfcaf5d/oslo_config/sources/_environment.py
      
      It's worth noting that ironic.extra is not consumed anywhere, and is simply being used here to save off the variables that Oslo _might_ be consuming (it won't consume the variables that are present in the OCP builder image, though they do get caught by this regex).
      
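      grep exits non-zero when no environment variable matches the regex, as in the case of the OKD ironic-image builds. Without pipefail that status is hidden by the trailing tee; with pipefail set it becomes the exit status of the whole pipeline, and the script aborts at this line.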
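
The behaviour of the failing pipeline, and one possible way to tolerate an empty match, can be sketched as follows. This is an illustration under stated assumptions, not the fix adopted for the image: the pipeline is reconstructed from the trace above, a temporary file stands in for /etc/ironic/ironic.extra, and the "tolerant" variant and all variable names are hypothetical.

```shell
#!/usr/bin/env bash
# A temporary file stands in for /etc/ironic/ironic.extra.
extra_file="$(mktemp)"

# The pipeline as it appears in the trace, run with pipefail enabled.
strict='set -o pipefail; env | grep "^OS_" | tee -a "$1"'

# Hypothetical tolerant variant: grep finding nothing is not treated as an error.
tolerant='set -o pipefail; env | { grep "^OS_" || true; } | tee -a "$1"'

# Run both in a cleared environment with no OS_-prefixed variables (OKD-like).
strict_status=0
env -i PATH="$PATH" bash -c "$strict" _ "$extra_file" || strict_status=$?

tolerant_status=0
env -i PATH="$PATH" bash -c "$tolerant" _ "$extra_file" || tolerant_status=$?

echo "strict pipeline exit status:   ${strict_status}"
echo "tolerant pipeline exit status: ${tolerant_status}"
rm -f "$extra_file"
```

The strict pipeline reports grep's exit status of 1, while the tolerant one reports 0. Note that `|| true` also hides genuine grep errors (exit status 2), so a stricter variant might accept only status 1.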

       

              rhn-engineering-dtantsur Dmitry Tantsur
              openshift-crt-jira-prow OpenShift Prow Bot
              Pedro Jose Amoedo Martinez Pedro Jose Amoedo Martinez
              James Harmison