OpenShift Bugs / OCPBUGS-50860

Configuration API mismatch between HO and CPO

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • None
    • 4.17.z, 4.18.z
    • HyperShift
    • Critical
    • None
    • Rejected
    • False

      Description of problem:

      There's a mismatch between the .configuration API vendored for the HC controller and the one vendored in older versions of the CPO controller.
      The HO computes the hash with "" for that field, whereas the CPO does not see the field at all when it computes the MCS hash. That causes the mismatch.
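
      To illustrate the mismatch described above, here is a minimal Python sketch. It assumes the hash is taken over a serialized form of the configuration. The field names (`allowedRegistries`, `newField`), the JSON serialization, and the truncated SHA-256 digest are all assumptions for illustration, not the actual HyperShift implementation. The only point is that a serializer that knows a field and emits it as "" produces different bytes, and therefore a different hash, than one that does not know the field.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a config dict via a canonical JSON serialization (illustrative only)."""
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


# Hypothetical view of the same user input as seen by each component.
# The newer vendored API knows an extra field and serializes it as "".
ho_view = {"image": {"allowedRegistries": ["quay.io"], "newField": ""}}
# The older vendored API has no such field, so it never appears.
cpo_view = {"image": {"allowedRegistries": ["quay.io"]}}

ho_hash = config_hash(ho_view)
cpo_hash = config_hash(cpo_view)

print("HO hash: ", ho_hash)
print("CPO hash:", cpo_hash)
print("match:   ", ho_hash == cpo_hash)
```

      In this sketch the two hashes differ even though the user supplied the same configuration to both components.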
      
      Slack thread https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1739540592972569?thread_ts=1739527507.065809&cid=C04EUL1DRHC 
      
          

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Create a HostedCluster with ImageConfig set on 4.17.15
          2. Check the NodePool rollout
          3. Ignition fails
          

      Actual results:

          

      Expected results:

          

      Additional info:

          


            Ahmed Abdalla Abdelrehim added a comment -

            A backport is not needed for this fix. At some point after the 4.18 release, the HO and CPO started creating a new hash, let's call it hash v2, which is backward-incompatible. So in a setup with a new HO and an old CPO, creation of the HC fails.
            So the fix was to change both the HO and CPO to create a backward-compatible v1 hash.

            Since this new behavior existed only for a short period after the 4.18 release cut, it was a bug only in 4.19, so fixing it on main was sufficient.
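
            One way to picture a backward-compatible hash is sketched below in Python. This is an assumption about the general technique, not the actual fix: it normalizes the configuration by dropping empty values before hashing, so a component that knows an extra, unset field produces the same digest as one that has never heard of it. The helper names (`drop_empty`, `compat_hash`) and the field names are hypothetical.

```python
import hashlib
import json


def drop_empty(value):
    """Recursively remove dict entries whose value is "", None, or an empty dict."""
    if isinstance(value, dict):
        cleaned = {k: drop_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in ("", None, {})}
    if isinstance(value, list):
        return [drop_empty(v) for v in value]
    return value


def compat_hash(config: dict) -> str:
    """Hash the normalized config so unset fields do not affect the result."""
    payload = json.dumps(drop_empty(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


# Hypothetical views: the newer API emits an unset field as "", the older one omits it.
new_view = {"image": {"allowedRegistries": ["quay.io"], "newField": ""}}
old_view = {"image": {"allowedRegistries": ["quay.io"]}}

print("hashes agree:", compat_hash(new_view) == compat_hash(old_view))
```

            The trade-off of such a normalization is that an explicitly empty value and an absent value become indistinguishable, which is acceptable only when both mean "unset".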

            XiuJuan Wang added a comment -

            Verified with the latest HO, CPO, and OCP 4.19 (4.19.0-0.nightly-2025-02-26-050226).

            With spec.configuration.image configured, no operators are degraded in the hosted clusters, and the settings in image.config of the hosted cluster are correct. Moving the bug to Verified.

            Setting spec.configuration.image.imageStreamImportMode is ignored (imageStreamImportMode is a feature-gated function in standalone OCP).

            He Liu added a comment -

            The latest HO with this bug fix has been deployed into the Int/Stage environment, and ROSA HCP QE rhn-support-yingzhan has verified the related OCM bug on 4.17/4.18.

            But we need more time for further testing, since it is related to the CPO (payload) too.

            Andrea Decorte added a comment -

            +1, not an OpenShift release blocker

            OpenShift Jira Bot added a comment -

            Hi rh-ee-aabdelre,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            Yufen Chang added a comment -

            Thanks for the analysis. So that means this will only happen in STG, due to the new HO version there. Could you help roll out a fix to STG as well? Most QE testing happens on STG. Thanks.

            Ahmed Abdalla Abdelrehim added a comment - - edited

            I don't believe this is a release blocker for 4.18. This bug manifests only when all of the following hold:

            1. HyperShift Operator
              1. Has this version or newer in its dependencies: github.com/openshift/api v0.0.0-20240814144229-3f284e60eb8c
            2. Control Plane Operator
              1. Has any version older than this dependency: github.com/openshift/api v0.0.0-20240814144229-3f284e60eb8c
            3. The HostedCluster has spec.configuration.image set.

            On the release-4.18 branch, the HyperShift Operator has

            • github.com/openshift/api v0.0.0-20240524162738-d899f8877d22, which is older than github.com/openshift/api v0.0.0-20240814144229-3f284e60eb8c

            This is why this bug will not manifest with 4.18 HyperShift Operators. It manifests only when the HO is from main and the CPO is from 4.18 or older.
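
            The three conditions above can be sketched as a small Python check. It relies on the fact that a Go pseudo-version of the form v0.0.0-yyyymmddhhmmss-commit embeds the commit's UTC timestamp, and it assumes that timestamp order is a good enough proxy for "older" and "newer" here. The function names are hypothetical, and the CPO and main-branch versions used in the usage lines are placeholders for illustration; only the two pseudo-versions quoted in this comment come from the analysis.

```python
import re
from datetime import datetime

# Matches only the simple pseudo-version form quoted in this comment.
PSEUDO_VERSION = re.compile(r"^v0\.0\.0-(\d{14})-([0-9a-f]{12})$")

# First github.com/openshift/api version with the changed configuration API.
THRESHOLD = "v0.0.0-20240814144229-3f284e60eb8c"


def pseudo_version_time(version: str) -> datetime:
    """Extract the commit timestamp embedded in a Go pseudo-version."""
    match = PSEUDO_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a v0.0.0 pseudo-version: {version!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def bug_can_manifest(ho_api: str, cpo_api: str, image_config_set: bool) -> bool:
    """Return True only when all three conditions from the analysis hold."""
    threshold = pseudo_version_time(THRESHOLD)
    ho_is_new = pseudo_version_time(ho_api) >= threshold
    cpo_is_old = pseudo_version_time(cpo_api) < threshold
    return bool(ho_is_new and cpo_is_old and image_config_set)


RELEASE_4_18_HO = "v0.0.0-20240524162738-d899f8877d22"

# release-4.18 HO: its dependency is older than the threshold, so no bug.
print(bug_can_manifest(RELEASE_4_18_HO, RELEASE_4_18_HO, True))
# Placeholder for an HO from main (at the threshold) with an older CPO.
print(bug_can_manifest(THRESHOLD, RELEASE_4_18_HO, True))
```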


            Mike Fiedler added a comment -

            rh-ee-aabdelre asegurap1@redhat.com Please triage this Release Blocker Rejected/Accepted for 4.18.0. Thanks.

            Yufen Chang added a comment -

            This will cause failures in ROSA when using registry config (see https://issues.redhat.com/browse/OCM-13896) and will block the ROSA release. Suggest fixing this issue. Thanks.

              rh-ee-aabdelre Ahmed Abdalla Abdelrehim
              rh-ee-adecorte Andrea Decorte
              XiuJuan Wang XiuJuan Wang