Bug
Resolution: Done
Critical
ACM 2.9.0, ACM 2.9.1, MCE 2.4.2
Description of problem:
I'm trying to upgrade managed clusters from the ACM GUI in a completely air-gapped environment. Everything is already in place to upgrade them using OSUS, and the upgrades show as available from the hub cluster. If we upgrade the managed clusters directly, without launching the upgrade from ACM, it works. However, when we try to change the channel or launch the upgrade from the ACM GUI, nothing happens. Looking at the ClusterCurator resource, we find this error message: Job_failed -> DesiredCuration: upgrade Version (4.12.40) Failed - hostedclusters.hypershift.openshift.io "clustername" not found.
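For context on where the error seems to come from: the panic trace in the Actual results below shows the upgrade-cluster init container dying inside the curator's generic error helper right after a hostedclusters.hypershift.openshift.io lookup fails. The Go snippet below is only a minimal sketch of that suspected path, assuming the upgrade job probes the HyperShift HostedCluster API to classify the target cluster and routes every error through a panicking helper; it is not the actual cluster-curator-controller source, and all names other than the resources visible in the trace are illustrative.

// Sketch only: reconstructs the suspected failure path, not the real
// stolostron/cluster-curator-controller code.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// checkError mirrors what pkg/jobs/utils/helpers.go appears to do in the
// panic trace: any non-nil error, even a plain NotFound, aborts the job.
func checkError(err error) {
	if err != nil {
		log.Println(err)
		panic(err) // surfaces as reason: Job_failed on the ClusterCurator
	}
}

func main() {
	clusterName := "managed-cluster" // hypothetical: cluster name == namespace

	cfg, err := rest.InClusterConfig()
	checkError(err)
	client, err := dynamic.NewForConfig(cfg)
	checkError(err)

	// Assumption: the upgrade-cluster init container looks up a HyperShift
	// HostedCluster for the target. A cluster imported with an
	// auto-import-secret has no HostedCluster, so this GET returns NotFound...
	hostedClusters := schema.GroupVersionResource{
		Group:    "hypershift.openshift.io",
		Version:  "v1beta1",
		Resource: "hostedclusters",
	}
	_, err = client.Resource(hostedClusters).
		Namespace(clusterName).
		Get(context.TODO(), clusterName, metav1.GetOptions{})

	// ...and instead of falling back to the standalone (ClusterVersion-based)
	// upgrade path, the NotFound is funnelled through the panicking helper.
	checkError(err)
}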
Version-Release number of selected component (if applicable):
ACM: 2.9.0
MCE: 2.4.2
How reproducible:
Steps to Reproduce:
- Import a cluster using an auto-import-secret (the secret used is shown under Additional info)
- Try to change the channel or upgrade the managed cluster from ACM (a sketch of the equivalent API request follows these steps)
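The console action appears to amount to creating (or updating) a ClusterCurator named after the managed cluster, in its namespace, with desiredCuration: upgrade; the resulting object is dumped under Actual results. The following is a rough sketch of triggering the same curation from the API instead of the GUI, assuming cluster-admin access on the hub and the v1beta1 ClusterCurator schema shown in the dump; the cluster name, channel, and timeout are placeholders.

// Sketch: create the same kind of ClusterCurator the ACM console creates
// when an upgrade is requested for a managed cluster. Values are
// placeholders modelled on the resource dump in Actual results.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	clusterName := "managed-cluster" // hypothetical: namespace == managed cluster name

	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	curator := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "cluster.open-cluster-management.io/v1beta1",
		"kind":       "ClusterCurator",
		"metadata": map[string]interface{}{
			"name":      clusterName,
			"namespace": clusterName,
		},
		"spec": map[string]interface{}{
			"desiredCuration": "upgrade",
			"upgrade": map[string]interface{}{
				"channel":        "eus-4.12",
				"monitorTimeout": int64(120),
			},
		},
	}}

	gvr := schema.GroupVersionResource{
		Group:    "cluster.open-cluster-management.io",
		Version:  "v1beta1",
		Resource: "clustercurators",
	}
	// Create fails if a ClusterCurator of that name already exists; in that
	// case the existing object would be updated instead.
	_, err = client.Resource(gvr).Namespace(clusterName).
		Create(context.TODO(), curator, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}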
Actual results:
$ oc get clustercurator -n <clustername>
No resources found in <clustername> namespace.

$ oc get clustercurator -A
No resources found

$ oc get pods -n <clustername>
NAME                      READY   STATUS       RESTARTS   AGE
curator-job-zzp4l-dvgxr   0/1     Init:Error   0          97s

$ oc logs curator-job-zzp4l-dvgxr -n <clustername>
Defaulted container "done" out of: done, upgrade-cluster (init), monitor-upgrade (init)
Error from server (BadRequest): container "done" in pod "curator-job-zzp4l-dvgxr" is waiting to start: PodInitializing

$ oc get clustercurator <clustername> -n <clustername> -o yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  creationTimestamp: "2023-12-19T07:26:16Z"
  generation: 6
  name: <clustername>
  namespace: <clustername>
  resourceVersion: "986711"
  uid: d4d3a3ad-6c01-4418-8c8c-3b3017e4353c
spec:
  desiredCuration: upgrade
  destroy:
    jobMonitorTimeout: 5
  install:
    jobMonitorTimeout: 5
  scale:
    jobMonitorTimeout: 5
  upgrade:
    channel: eus-4.12
    monitorTimeout: 120
status:
  conditions:
  - lastTransitionTime: "2023-12-19T07:26:49Z"
    message: 'curator-job-zzp4l DesiredCuration: upgrade Version (;eus-4.12;) Failed - hostedclusters.hypershift.openshift.io "<clustername>" not found'
    reason: Job_failed
    status: "True"
    type: clustercurator-job
  - lastTransitionTime: "2023-12-19T07:26:49Z"
    message: Executing init container upgrade-cluster
    reason: Job_has_finished
    status: "False"
    type: upgrade-cluster

$ oc get pod curator-job-zzp4l-dvgxr -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.130.0.19/23"],"mac_address":"0a:58:0a:82:00:13","gateway_ips":["10.130.0.1"],"ip_address":"10.130.0.19/23","gateway_ip":"10.130.0.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.130.0.19"
          ],
          "mac": "0a:58:0a:82:00:13",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.130.0.19"
          ],
          "mac": "0a:58:0a:82:00:13",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2023-12-19T07:26:16Z"
  generateName: curator-job-zzp4l-
  labels:
    controller-uid: a4a9a71c-4884-4235-9ca4-430c184afa34
    job-name: curator-job-zzp4l
  name: curator-job-zzp4l-dvgxr
  namespace: <clustername>
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: curator-job-zzp4l
    uid: a4a9a71c-4884-4235-9ca4-430c184afa34
  resourceVersion: "986784"
  uid: 019429e9-884e-4953-aa60-46cc28c4b7a7
spec:
  containers:
  - command:
    - ./curator
    - done
    - <clustername>
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imagePullPolicy: IfNotPresent
    name: done
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000710000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsjkl
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: cluster-installer-dockercfg-fhp27
  initContainers:
  - command:
    - ./curator
    - upgrade-cluster
    - <clustername>
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imagePullPolicy: Always
    name: upgrade-cluster
    resources:
      limits:
        cpu: 2m
        memory: 45Mi
      requests:
        cpu: 1m
        memory: 30Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000710000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsjkl
      readOnly: true
  - command:
    - ./curator
    - monitor-upgrade
    - <clustername>
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imagePullPolicy: Always
    name: monitor-upgrade
    resources:
      limits:
        cpu: 2m
        memory: 45Mi
      requests:
        cpu: 1m
        memory: 30Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000710000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsjkl
      readOnly: true
  nodeName: <clustername>-wkr03
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000710000
    seLinuxOptions:
      level: s0:c27,c4
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: cluster-installer
  serviceAccountName: cluster-installer
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: kube-api-access-nsjkl
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    message: 'containers with incomplete status: [upgrade-cluster monitor-upgrade]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imageID: ""
    lastState: {}
    name: done
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: 10.110.51.109
  initContainerStatuses:
  - containerID: cri-o://69c645182cd74d5e8b133ee364c779b83c864c6129bab902c8411eda7cc0e8ac
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imageID: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:2d89baaed1c8ce6121d26e9d199161158db5fbe8ae27d2c782950a1b57902fe3
    lastState: {}
    name: upgrade-cluster
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://69c645182cd74d5e8b133ee364c779b83c864c6129bab902c8411eda7cc0e8ac
        exitCode: 2
        finishedAt: "2023-12-19T07:26:50Z"
        reason: Error
        startedAt: "2023-12-19T07:26:45Z"
  - image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imageID: ""
    lastState: {}
    name: monitor-upgrade
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  phase: Failed
  podIP: 10.130.0.19
  podIPs:
  - ip: 10.130.0.19
  qosClass: Burstable
  startTime: "2023-12-19T07:26:16Z"

$ oc logs curator-job-q289c-57dzd -c upgrade-cluster
I1219 15:11:59.254388 1 curator.go:86] Mode: upgrade-cluster Cluster
I1219 15:11:59.757491 1 curator.go:111] Found clusterCurator resource "<clustername>" ✓
E1219 15:12:00.655803 1 helpers.go:99] hostedclusters.hypershift.openshift.io "<clustername>" not found
panic: hostedclusters.hypershift.openshift.io "<clustername>" not found [recovered]
	panic: hostedclusters.hypershift.openshift.io "<clustername>" not found

goroutine 1 [running]:
main.curatorRun.func1()
	/remote-source/app/cmd/curator/curator.go:150 +0x259
panic({0x18646c0, 0xc0000ced20})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/stolostron/cluster-curator-controller/pkg/jobs/utils.CheckError({0x1ca5820, 0xc0000ced20})
	/remote-source/app/pkg/jobs/utils/helpers.go:100 +0xd4
main.curatorRun(0x0?, {0x1cc2fe0, 0xc000192900}, {0x7fff3bd65e1d, 0xc}, {0xc000640070, 0xc})
	/remote-source/app/cmd/curator/curator.go:396 +0x1b65
main.main()
	/remote-source/app/cmd/curator/curator.go:66 +0x1c5

$ oc logs curator-job-q289c-57dzd -c monitor-upgrade
Error from server (BadRequest): container "monitor-upgrade" in pod "curator-job-q289c-57dzd" is waiting to start: PodInitializing
Expected results:
The managed cluster is upgraded.
Additional info:
Secret used to import the cluster:
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: {{ ocp_cluster_name }}
stringData:
  autoImportRetry: "5"
  token: {{ user_token.stdout }}
  server: https://api.{{ ocp_cluster_name }}.{{ basedomain }}:6443
type: Opaque
is cloned by: ACM-9439 Unable to upgrade managed cluster from ACM 2.10 (Closed)
links to: RHSA-2024:126795 Red Hat Advanced Cluster Management 2.9.3 security and bug fix container updates