Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: ACM 2.9.3
Affects Version/s: ACM 2.9.0, ACM 2.9.1, MCE 2.4.2
Component/s: Cluster Lifecycle
Labels:
- errata

Blocked: False
Blocked Reason: None
Ready: False
Regression: No

Description of problem:

I'm trying to upgrade managed clusters from the ACM GUI in a completely air-gapped environment. Everything is already in place to upgrade them using OSUS, and upgrades are available from the Hub cluster. If we upgrade a managed cluster directly, without launching the upgrade from ACM, it works. However, when we try to change the channel or launch the upgrade from the ACM GUI, nothing happens. If we look at the ClusterCurator resource, we find this error message: Job_failed -> DesiredCuration: upgrade Version (4.12.40;) Failed - hostedclusters.hypershift.openshift.io "clustername" not found.

Version-Release number of selected component (if applicable):

ACM 2.9.0

MCE: 2.4.2

How reproducible:

Steps to Reproduce:

1. Import a cluster using auto-import-secret.
2. Try to change the channel or upgrade the managed cluster from ACM.

Actual results:

$ oc get clustercurator -n <clustername>
No resources found in <clustername> namespace.

$ oc get clustercurator -A
No resources found

$ oc get pods -n <clustername>
NAME                      READY   STATUS       RESTARTS   AGE
curator-job-zzp4l-dvgxr   0/1     Init:Error   0          97s

$ oc logs curator-job-zzp4l-dvgxr  -n <clustername>
Defaulted container "done" out of: done, upgrade-cluster (init), monitor-upgrade (init)
Error from server (BadRequest): container "done" in pod "curator-job-zzp4l-dvgxr" is waiting to start: PodInitializing

$ oc get clustercurator <clustername> -n <clustername> -o yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  creationTimestamp: "2023-12-19T07:26:16Z"
  generation: 6
  name: <clustername>
  namespace: <clustername>
  resourceVersion: "986711"
  uid: d4d3a3ad-6c01-4418-8c8c-3b3017e4353c
spec:
  desiredCuration: upgrade
  destroy:
    jobMonitorTimeout: 5
  install:
    jobMonitorTimeout: 5
  scale:
    jobMonitorTimeout: 5
  upgrade:
    channel: eus-4.12
    monitorTimeout: 120
status:
  conditions:
  - lastTransitionTime: "2023-12-19T07:26:49Z"
    message: 'curator-job-zzp4l DesiredCuration: upgrade Version (;eus-4.12;) Failed
      - hostedclusters.hypershift.openshift.io "<clustername>" not found'
    reason: Job_failed
    status: "True"
    type: clustercurator-job
  - lastTransitionTime: "2023-12-19T07:26:49Z"
    message: Executing init container upgrade-cluster
    reason: Job_has_finished
    status: "False"
    type: upgrade-cluster
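
The failing condition above can also be checked from a script. The following is only a sketch: the function name is_hypershift_lookup_failure is hypothetical (it is not part of oc or ACM), and the pattern it matches is simply the error text from the condition message shown in this report.

```shell
# Hypothetical helper (not part of oc or ACM): succeeds when a ClusterCurator
# condition message contains the HyperShift lookup error seen in this report.
is_hypershift_lookup_failure() {
  case "$1" in
    *'hostedclusters.hypershift.openshift.io'*'not found'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use against a live hub (assumes oc is logged in and CLUSTER is set):
#   msg=$(oc get clustercurator "$CLUSTER" -n "$CLUSTER" \
#     -o jsonpath='{.status.conditions[?(@.type=="clustercurator-job")].message}')
#   is_hypershift_lookup_failure "$msg" && echo "curator job hit the HyperShift lookup error"
```

The oc call is left as a comment so the helper can be sourced without cluster access; the jsonpath filter selects the condition whose type is clustercurator-job, as in the YAML above.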




$ oc get pod curator-job-zzp4l-dvgxr -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.130.0.19/23"],"mac_address":"0a:58:0a:82:00:13","gateway_ips":["10.130.0.1"],"ip_address":"10.130.0.19/23","gateway_ip":"10.130.0.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.130.0.19"
          ],
          "mac": "0a:58:0a:82:00:13",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.130.0.19"
          ],
          "mac": "0a:58:0a:82:00:13",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2023-12-19T07:26:16Z"
  generateName: curator-job-zzp4l-
  labels:
    controller-uid: a4a9a71c-4884-4235-9ca4-430c184afa34
    job-name: curator-job-zzp4l
  name: curator-job-zzp4l-dvgxr
  namespace: <clustername>
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: curator-job-zzp4l
    uid: a4a9a71c-4884-4235-9ca4-430c184afa34
  resourceVersion: "986784"
  uid: 019429e9-884e-4953-aa60-46cc28c4b7a7
spec:
  containers:
  - command:
    - ./curator
    - done
    - <clustername>
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imagePullPolicy: IfNotPresent
    name: done
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000710000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsjkl
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: cluster-installer-dockercfg-fhp27
  initContainers:
  - command:
    - ./curator
    - upgrade-cluster
    - <clustername>
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imagePullPolicy: Always
    name: upgrade-cluster
    resources:
      limits:
        cpu: 2m
        memory: 45Mi
      requests:
        cpu: 1m
        memory: 30Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000710000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsjkl
      readOnly: true
  - command:
    - ./curator
    - monitor-upgrade
    - <clustername>
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imagePullPolicy: Always
    name: monitor-upgrade
    resources:
      limits:
        cpu: 2m
        memory: 45Mi
      requests:
        cpu: 1m
        memory: 30Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000710000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-nsjkl
      readOnly: true
  nodeName: <clustername>-wkr03
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000710000
    seLinuxOptions:
      level: s0:c27,c4
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: cluster-installer
  serviceAccountName: cluster-installer
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: kube-api-access-nsjkl
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    message: 'containers with incomplete status: [upgrade-cluster monitor-upgrade]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-12-19T07:26:16Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imageID: ""
    lastState: {}
    name: done
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: 10.110.51.109
  initContainerStatuses:
  - containerID: cri-o://69c645182cd74d5e8b133ee364c779b83c864c6129bab902c8411eda7cc0e8ac
    image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imageID: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:2d89baaed1c8ce6121d26e9d199161158db5fbe8ae27d2c782950a1b57902fe3
    lastState: {}
    name: upgrade-cluster
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://69c645182cd74d5e8b133ee364c779b83c864c6129bab902c8411eda7cc0e8ac
        exitCode: 2
        finishedAt: "2023-12-19T07:26:50Z"
        reason: Error
        startedAt: "2023-12-19T07:26:45Z"
  - image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
    imageID: ""
    lastState: {}
    name: monitor-upgrade
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: PodInitializing
  phase: Failed
  podIP: 10.130.0.19
  podIPs:
  - ip: 10.130.0.19
  qosClass: Burstable
  startTime: "2023-12-19T07:26:16Z" 




$ oc logs curator-job-q289c-57dzd -c upgrade-cluster
I1219 15:11:59.254388       1 curator.go:86] Mode: upgrade-cluster Cluster
I1219 15:11:59.757491       1 curator.go:111] Found clusterCurator resource "<clustername>" ✓
E1219 15:12:00.655803       1 helpers.go:99] hostedclusters.hypershift.openshift.io "<clustername>" not found
panic: hostedclusters.hypershift.openshift.io "<clustername>" not found [recovered]
        panic: hostedclusters.hypershift.openshift.io "<clustername>" not found

goroutine 1 [running]:
main.curatorRun.func1()
        /remote-source/app/cmd/curator/curator.go:150 +0x259
panic({0x18646c0, 0xc0000ced20})
        /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/stolostron/cluster-curator-controller/pkg/jobs/utils.CheckError({0x1ca5820, 0xc0000ced20})
        /remote-source/app/pkg/jobs/utils/helpers.go:100 +0xd4
main.curatorRun(0x0?, {0x1cc2fe0, 0xc000192900}, {0x7fff3bd65e1d, 0xc}, {0xc000640070, 0xc})
        /remote-source/app/cmd/curator/curator.go:396 +0x1b65
main.main()
        /remote-source/app/cmd/curator/curator.go:66 +0x1c5



$ oc logs curator-job-q289c-57dzd -c monitor-upgrade
Error from server (BadRequest): container "monitor-upgrade" in pod "curator-job-q289c-57dzd" is waiting to start: PodInitializing

Expected results:

The managed cluster is upgraded.

Additional info:

Secret used to import the cluster:

apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: {{ ocp_cluster_name }}
stringData:
  autoImportRetry: "5"
  token: {{ user_token.stdout }}
  server: https://api.{{ ocp_cluster_name }}.{{ basedomain }}:6443
type: Opaque

is cloned by

ACM-9439 Unable to upgrade managed cluster from ACM 2.10

Closed

links to

RHSA-2024:126795 Red Hat Advanced Cluster Management 2.9.3 security and bug fix container updates
