  1. OpenShift Bugs
  2. OCPBUGS-6768

TALM 4.11 pre-cache fails on 4.10 cluster


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • 4.12.z
    • 4.11.z
    • TALM Operator
    • None
    • Critical
    • None
    • CNF RAN Sprint 231
    • 1
    • Proposed
    • False

      Description of problem:

      I am running TALM 4.11.3 on a 4.11 hub cluster. I have two clusters, deployed using GitOps ZTP, running 4.10.45. I created upgrade policies for both OCP and operators to move to 4.11. The target OCP image in the policies is 4.11.21.
      I created a CGU for the upgrade with enable set to false:
      
      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        name: update-cgu
        namespace: default
      spec:
        clusters:
        - cnfdf19
        - cnfdf29
        enable: false
        managedPolicies:
        - du-upgrade-platform-upgrade-prep
        - du-upgrade-platform-upgrade
        - common-config-policy
        - common-subscriptions-policy
        preCaching: false
        remediationStrategy:
          maxConcurrency: 1
          timeout: 360

      When I enabled preCaching, the pod on the spoke clusters failed:
          containerStatuses:
          - containerID: cri-o://7e1d7a1912440bb105d8365f8ec548bfddca072c7d372e80c8deacaac0e8d3e9
            image: registry.redhat.io/openshift4/topology-aware-lifecycle-manager-precache-rhel8@sha256:40249617608848518f9cd2db99f73a6f72642b28b273b1d8f34616ff1f16983b
            imageID: registry.redhat.io/openshift4/topology-aware-lifecycle-manager-precache-rhel8@sha256:40249617608848518f9cd2db99f73a6f72642b28b273b1d8f34616ff1f16983b
            lastState: {}
            name: pre-cache-container
            ready: false
            restartCount: 0
            started: false
            state:
              terminated:
                containerID: cri-o://7e1d7a1912440bb105d8365f8ec548bfddca072c7d372e80c8deacaac0e8d3e9
                exitCode: 139
                finishedAt: "2023-01-24T22:01:31Z"
                reason: Error
                startedAt: "2023-01-24T22:01:31Z"
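
The exitCode of 139 in the status above can be decoded with the common convention that an exit code above 128 means the process was killed by signal (code - 128). The sketch below is only an illustration of that convention; the function name describe_exit_code is hypothetical and not part of TALM. It suggests the container process received SIGSEGV, which fits the identical startedAt and finishedAt timestamps, but it does not by itself identify the root cause.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {code}"


# exitCode reported for pre-cache-container in the pod status
print(describe_exit_code(139))  # terminated by SIGSEGV (signal 11)
```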
      

      Version-Release number of selected component (if applicable):

      4.11.3
      Precache container version: registry.redhat.io/openshift4/topology-aware-lifecycle-manager-precache-rhel8@sha256:40249617608848518f9cd2db99f73a6f72642b28b273b1d8f34616ff1f16983b

      How reproducible:

      100%

      Steps to Reproduce:

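      1. On a 4.11 hub cluster running TALM 4.11.3, deploy spoke clusters running 4.10.45 using GitOps ZTP.
      2. Create the upgrade policies and the CGU shown in the description, with enable: false.
      3. Enable preCaching in the CGU and check the pre-cache pod on the spoke clusters.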
      

      Actual results:

      The pre-cache pod moves to the Error state.
      The TALM status moves to UnrecoverableError for the clusters (cnfdf19 also went to UnrecoverableError shortly after this output was captured):
          status:
            cnfdf19: Active
            cnfdf29: UnrecoverableError

       

      Expected results:

      Precaching succeeds

      Additional info:

      I used a ConfigMap on the hub cluster, in the same namespace as the CGU, to test newer 4.11 and 4.12 precache images. Both failed. When I used the ConfigMap to run the latest 4.10 precache container image, the precaching pods ran as expected.
      
      ConfigMap:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-group-upgrade-overrides
        namespace: default
      data:
        ## 4.10 image pushed to my quay
        precache.image: quay.io/imiller/testrepo1:0.10
      
      The valid image:
      https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2294306
      registry-proxy.engineering.redhat.com/rh-osbs/openshift-topology-aware-lifecycle-operator-precache-rhel8@sha256:a8cb52e5c15c8a530e175ab09c75853c921509ecf3eed707dc7d307ebdaf73cd
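
When several precache images are tested one after another, as described above, the override ConfigMap can be generated rather than edited by hand. The following is a minimal sketch, assuming the ConfigMap name, namespace, and precache.image key shown in this report; the helper name precache_override is hypothetical. It prints the manifest as JSON, which Kubernetes accepts in place of YAML.

```python
import json


def precache_override(image: str, namespace: str = "default") -> dict:
    """Build the override ConfigMap manifest for a given precache image.

    The ConfigMap name and data key are copied from this report;
    the namespace must match the namespace of the CGU.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-group-upgrade-overrides",
            "namespace": namespace,
        },
        "data": {"precache.image": image},
    }


# The 4.10 test image referenced in this report
manifest = precache_override("quay.io/imiller/testrepo1:0.10")
print(json.dumps(manifest, indent=2))
```

The printed manifest can be saved to a file and applied on the hub cluster in the same way as the YAML version.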

       

              nparekh@redhat.com Nishant Parekh (Inactive)
              rhn-support-imiller Ian Miller
              Yang Liu Yang Liu