
[4.12] Baremetal Provisioning fails on HP Gen9 systems due to eTag handling


    • Important
    • None
    • 3
    • Rejected
    • False
    • None
    • Before this update, the Ironic provisioning service did not support Baseboard Management Controllers (BMCs) that use weak eTags combined with strict eTag validation. By design, if the BMC provided a weak eTag, Ironic sent two eTags: the original eTag and the original eTag converted to the strong format, for compatibility with BMCs that do not support weak eTags. BMCs that perform strict eTag validation reject such requests because of the second eTag. As a result, bare-metal provisioning failed on some older server hardware with the error `HTTP 412 Precondition Failed`. In {product-title} 4.12 and later, Ironic no longer sends two eTags when a weak eTag is provided. Instead, if a Redfish request that depends on an eTag fails with an eTag validation error, Ironic retries the request with known workarounds. This minimizes the risk of bare-metal provisioning failures on machines with strict eTag validation. (link:https://issues.redhat.com/browse/OCPBUGS-3479[*OCPBUGS#3479*])
    • Bug Fix
    • Done
    • Red Hat OpenShift Container Platform
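The release note above turns on how the `If-Match` header is built. The Python sketch below contrasts the pre-4.12 value (a weak eTag plus its strong conversion) with sending a single eTag, and models a BMC with strict validation that accepts only one eTag exactly matching its current value. All function names and the strict-validation rule are hypothetical simplifications, not the sushy, Ironic, or iLO implementation.

```python
"""Sketch: why a two-value If-Match header trips strict eTag validation.

All names are hypothetical; this models the behaviour described in the
release note, not the actual sushy, Ironic, or iLO code.
"""


def to_strong(etag: str) -> str:
    """Convert a weak eTag such as W/"abc" to its strong form "abc"."""
    return etag[2:] if etag.startswith("W/") else etag


def if_match_two_etags(etag: str) -> str:
    """Pre-4.12 behaviour: weak eTag plus its strong conversion."""
    if etag.startswith("W/"):
        return f"{etag}, {to_strong(etag)}"
    return etag


def if_match_single(etag: str) -> str:
    """4.12+ behaviour: send only the eTag the BMC provided."""
    return etag


def strict_bmc_status(if_match: str, current_etag: str) -> int:
    """Hypothetical strict BMC: accept exactly one eTag equal to the current one."""
    # Naive comma split; adequate for the eTag values used in this sketch.
    values = [v.strip() for v in if_match.split(",")]
    if len(values) == 1 and values[0] == current_etag:
        return 200
    return 412  # Precondition Failed


if __name__ == "__main__":
    etag = 'W/"5b7d1f3a"'
    print(if_match_two_etags(etag))                           # W/"5b7d1f3a", "5b7d1f3a"
    print(strict_bmc_status(if_match_two_etags(etag), etag))  # 412
    print(strict_bmc_status(if_match_single(etag), etag))     # 200
```

Real BMCs differ in how strictly they compare eTags; the point of the model is only that a header carrying two values fails a check that expects exactly one.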

      Description of problem:

      In OpenShift 4.10 (backported to 4.9, I think in 4.9.13?), we introduced changes to sushy that aimed to maximise compatibility with newer hardware platforms via Redfish through better eTag management. However, we accidentally broke compatibility with older hardware platforms, namely HP Gen9 systems managed via iLO4, and I believe some other HPE hardware is affected as well. This regression has left customers unable to manage these systems or provision new clusters on them.
      
      We've validated that a test patch (https://review.opendev.org/c/openstack/sushy/+/856123) works around this problem by falling back to no eTag validation, which should allow customers to continue to utilise their systems.
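A fallback of this kind can be sketched as a retry loop over progressively looser `If-Match` values that retries only on `412 Precondition Failed`. This is a minimal sketch under assumptions: `send_patch` is a hypothetical callable standing in for the HTTP client, and the attempt order (original eTag, then its strong form, then no `If-Match` header at all) is illustrative; it is not taken from the patch under review.

```python
from typing import Callable, List, Optional

# Hypothetical transport: takes an If-Match value (None = omit the header)
# and returns the HTTP status code the BMC answered with.
SendPatch = Callable[[Optional[str]], int]


def to_strong(etag: str) -> str:
    """Strip the weak-validator prefix: W/"abc" -> "abc"."""
    return etag[2:] if etag.startswith("W/") else etag


def candidate_if_match(etag: Optional[str]) -> List[Optional[str]]:
    """Attempts in order; None means 'send no If-Match header at all'."""
    if not etag:
        return [None]
    attempts: List[Optional[str]] = [etag]
    if to_strong(etag) != etag:
        attempts.append(to_strong(etag))
    attempts.append(None)  # last resort: no eTag validation
    return attempts


def patch_with_etag_fallback(send_patch: SendPatch, etag: Optional[str]) -> int:
    """Retry only on 412; any other status is returned immediately."""
    status = 412
    for if_match in candidate_if_match(etag):
        status = send_patch(if_match)
        if status != 412:
            return status
    return status


if __name__ == "__main__":
    seen = []

    def fake_bmc(if_match: Optional[str]) -> int:
        # Hypothetical BMC that rejects every If-Match value it receives.
        seen.append(if_match)
        return 412 if if_match is not None else 200

    print(patch_with_etag_fallback(fake_bmc, 'W/"abc"'))  # 200
    print(seen)  # ['W/"abc"', '"abc"', None]
```

Dropping `If-Match` entirely gives up the protection against concurrent modification that the eTag provides, which is why the sketch tries it last.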

      Version-Release number of selected component (if applicable):

      4.9.13+, we believe.

      How reproducible:

      Consistently

      Steps to Reproduce:

      1. Attempt an openshift-install with 4.9.13+ via Redfish.
      2. Validate that nodes cannot be provisioned due to 412 (eTag) errors.
      3. Alternatively, bring an HP Gen9 system under management on a pre-installed cluster, or upgrade a previously working environment with such hardware to 4.9.13+.
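To reproduce outside of Ironic, one option is to replay the failing `PATCH` against the BMC by hand. The sketch below only builds such a request with Python's standard `urllib.request` and does not send it. The host name, the body (`Boot.BootSourceOverrideMode` set to `UEFI`), and the header set are assumptions for illustration, not a verbatim copy of what Ironic sends; sending it to a real BMC would also need credentials and TLS handling.

```python
import json
import urllib.request
from typing import Optional


def build_boot_mode_patch(bmc_host: str, etag: Optional[str]) -> urllib.request.Request:
    """Build (but do not send) a Redfish PATCH that sets the UEFI boot mode.

    bmc_host and the request body are illustrative assumptions.
    """
    url = f"https://{bmc_host}/redfish/v1/Systems/1/"
    body = json.dumps({"Boot": {"BootSourceOverrideMode": "UEFI"}}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if etag is not None:
        headers["If-Match"] = etag  # the header that strict BMCs validate
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


if __name__ == "__main__":
    req = build_boot_mode_patch("bmc.example.com", 'W/"5b7d1f3a"')
    print(req.get_method(), req.full_url)
    # urllib normalises header names with str.capitalize(), hence "If-match".
    print(req.get_header("If-match"))
```

Building the request with and without an eTag makes it easy to compare how a given BMC responds to each variant.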
      

      Actual results:

      2022-07-10 09:25:03.872 1 WARNING sushy.exceptions [req-363d0a13-99d4-44c2-8e7c-928b325f9b75 ironic-user - - - -] Error response from PATCH https://<ip-address>/redfish/v1/Systems/1/ with status code 412 has no JSON body: simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
      2022-07-10 09:25:03.873 1 DEBUG sushy.exceptions [req-363d0a13-99d4-44c2-8e7c-928b325f9b75 ironic-user - - - -] HTTP response for PATCH https://<ip-address>/redfish/v1/Systems/1/: status code: 412, error: unknown error, extended: none __init__ /usr/lib/python3.6/site-packages/sushy/exceptions.py:122
      2022-07-10 09:25:03.873 1 ERROR ironic.drivers.modules.redfish.management [req-363d0a13-99d4-44c2-8e7c-928b325f9b75 ironic-user - - - -] Setting boot mode to uefi failed for node 5abf3a2c-c662-48c3-a509-613e1d47606b. Error: HTTP PATCH https://<ip-address>/redfish/v1/Systems/1/ returned code 412. unknown error Extended information: none: sushy.exceptions.HTTPError: HTTP PATCH https://<ip-address>/redfish/v1/Systems/1/ returned code 412. unknown error Extended information: none
      2022-07-10 09:25:03.874 1 INFO ironic.drivers.modules.redfish.management [req-363d0a13-99d4-44c2-8e7c-928b325f9b75 ironic-user - - - -] Attempt to set boot mode on node 5abf3a2c-c662-48c3-a509-613e1d47606b failed to set boot mode as the node does not appear to support overriding the boot mode. Possibly partial Redfish implementation?
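When triaging logs like these, it can help to pull out the failing method, URL, and status code. A minimal sketch, assuming the `HTTP <METHOD> <url> returned code <status>` wording seen in the ERROR line above stays stable; the regular expression is our own and not part of sushy or Ironic.

```python
import re
from typing import List, Tuple

# Matches the sushy HTTPError wording seen in the ERROR log line.
HTTP_ERROR_RE = re.compile(
    r"HTTP (?P<method>[A-Z]+) (?P<url>\S+) returned code (?P<status>\d{3})"
)


def find_http_errors(log_text: str) -> List[Tuple[str, str, int]]:
    """Return unique (method, url, status) tuples in first-seen order.

    The ERROR line repeats the same message twice, so duplicates are dropped.
    """
    hits = (
        (m.group("method"), m.group("url"), int(m.group("status")))
        for m in HTTP_ERROR_RE.finditer(log_text)
    )
    return list(dict.fromkeys(hits))


if __name__ == "__main__":
    line = (
        "Error: HTTP PATCH https://<ip-address>/redfish/v1/Systems/1/ returned code 412. "
        "unknown error Extended information: none: sushy.exceptions.HTTPError: "
        "HTTP PATCH https://<ip-address>/redfish/v1/Systems/1/ returned code 412."
    )
    print(find_http_errors(line))
    # [('PATCH', 'https://<ip-address>/redfish/v1/Systems/1/', 412)]
```

A 412 on a `PATCH` to a `Systems` resource is the signature of this bug; other status codes point elsewhere.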

      Expected results:

      Node can be managed just fine :-)

      Additional info:

      Potential fix: https://review.opendev.org/c/openstack/sushy/+/856123
      
      Similar/linked issues: https://bugzilla.redhat.com/show_bug.cgi?id=2084059, https://bugzilla.redhat.com/show_bug.cgi?id=2103710, https://issues.redhat.com/browse/OCPBUGS-602, and https://issues.redhat.com/browse/METAL-343

              janders@redhat.com Jacob Anders
              rhn-sso-roxenham Rhys Oxenham (Inactive)
              Jad Haj Yahya Jad Haj Yahya
              Votes: 0
              Watchers: 5

                Created:
                Updated:
                Resolved: