[OCPBUGS-6769] TALM 4.11 pre-cache fails on 4.10 cluster - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: 4.13.0
Affects Version/s: 4.11.z
Component/s: TALM Operator
Labels:
None

Severity:
Critical
Regression:
None
Sprint:
CNF RAN Sprint 231
sprint_count:
1
Release Blocker:
Proposed
Blocked:
False
Blocked Reason:

None
Internal Whiteboard:
Target Version:

4.13.0


Description of problem:

I am running TALM 4.11.3 on a 4.11 hub cluster. Two spoke clusters, deployed using GitOps ZTP, are running 4.10.45. I created upgrade policies for both OCP and the operators to move them to 4.11. The target OCP image in the policies is 4.11.21.
I created a CGU for the upgrade with enable set to false.

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: update-cgu
  namespace: default
spec:
  clusters:
  - cnfdf19
  - cnfdf29
  enable: false
  managedPolicies:
  - du-upgrade-platform-upgrade-prep
  - du-upgrade-platform-upgrade
  - common-config-policy
  - common-subscriptions-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 1
    timeout: 360

When I enabled preCaching, the pre-cache pod on the spoke clusters failed:
    containerStatuses:
    - containerID: cri-o://7e1d7a1912440bb105d8365f8ec548bfddca072c7d372e80c8deacaac0e8d3e9
      image: registry.redhat.io/openshift4/topology-aware-lifecycle-manager-precache-rhel8@sha256:40249617608848518f9cd2db99f73a6f72642b28b273b1d8f34616ff1f16983b
      imageID: registry.redhat.io/openshift4/topology-aware-lifecycle-manager-precache-rhel8@sha256:40249617608848518f9cd2db99f73a6f72642b28b273b1d8f34616ff1f16983b
      lastState: {}
      name: pre-cache-container
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: cri-o://7e1d7a1912440bb105d8365f8ec548bfddca072c7d372e80c8deacaac0e8d3e9
          exitCode: 139
          finishedAt: "2023-01-24T22:01:31Z"
          reason: Error
          startedAt: "2023-01-24T22:01:31Z"
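
A note on reading the exit code above: by the common shell and container-runtime convention, an exit code greater than 128 means the process was terminated by signal number (code - 128). Under that convention, 139 corresponds to signal 11 (SIGSEGV on Linux). This is only an interpretation of the status output; the report itself does not state a root cause. A minimal sketch, where `describe_exit_code` is a hypothetical helper and not part of TALM:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention.

    Assumption: the runtime follows the usual convention that codes above 128
    indicate termination by a signal. This helper is illustrative only.
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


# The exitCode reported for pre-cache-container in this issue.
print(describe_exit_code(139))
```

On a Linux host this prints `killed by SIGSEGV (signal 11)`, which would be consistent with the container crashing immediately at startup (startedAt and finishedAt are identical).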

Version-Release number of selected component (if applicable):

4.11.3
Precache container version: registry.redhat.io/openshift4/topology-aware-lifecycle-manager-precache-rhel8@sha256:40249617608848518f9cd2db99f73a6f72642b28b273b1d8f34616ff1f16983b

How reproducible:

100%

Steps to Reproduce:

1. See the description above.

Actual results:

The pre-cache pod moves to the Error state.
The TALM status moves to UnrecoverableError for the clusters (cnfdf19 also went to UnrecoverableError shortly after this output was captured):
    status:
      cnfdf19: Active
      cnfdf29: UnrecoverableError

Expected results:

Precaching succeeds.

Additional info:

I used a ConfigMap on the hub cluster, in the same namespace as the CGU, to test with newer 4.11 and 4.12 precache images. Both failed. When I used the ConfigMap to run the latest 4.10 precache container image, the precaching pods ran as expected.

ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-group-upgrade-overrides
  namespace: default
data:
  ## 4.10 image pushed to my quay
  precache.image: quay.io/imiller/testrepo1:0.10

The valid image:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2294306
registry-proxy.engineering.redhat.com/rh-osbs/openshift-topology-aware-lifecycle-operator-precache-rhel8@sha256:a8cb52e5c15c8a530e175ab09c75853c921509ecf3eed707dc7d307ebdaf73cd

Clones: OCPBUGS-6768 TALM 4.11 pre-cache fails on 4.10 cluster (Closed)

Assignee: Nishant Parekh (Inactive)

Reporter: Ian Miller

QA Contact: Yang Liu

Votes: 0

Watchers: 3

Created: 2023/01/30 2:11 AM

Updated: 2023/05/18 2:33 AM

Resolved: 2023/05/18 2:33 AM
