Red Hat Advanced Cluster Management / ACM-9145

Unable to upgrade managed cluster from ACM


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • ACM 2.9.3
    • ACM 2.9.0, ACM 2.9.1, MCE 2.4.2
    • Cluster Lifecycle

      Description of problem:

      I'm trying to upgrade managed clusters from the ACM GUI in a completely air-gapped environment. Everything needed to upgrade them through OSUS is already in place, and upgrades are available from the hub cluster. If we upgrade a managed cluster directly, without launching the upgrade from ACM, it works. However, when we try to change the channel or launch the upgrade from the ACM GUI, nothing happens. The clustercurator resource then shows this error message: Job_failed -> DesiredCuration: upgrade Version (4.12.40; Failed - hostedclusters.hypershift.openshift.io "clustername" not found.

      Version-Release number of selected component (if applicable):

      ACM 2.9.0

      MCE: 2.4.2

      How reproducible:

      Steps to Reproduce:

      1. Import a cluster using auto-import-secret
      2. Try to change channel or upgrade managed cluster from ACM
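
Step 2 can probably also be driven from the CLI instead of the GUI. The sketch below assumes the console ends up setting the same ClusterCurator fields that appear in the resource dump in this report (`desiredCuration`, `upgrade.channel`, `upgrade.monitorTimeout`), and that a ClusterCurator named after the cluster already exists in the cluster's namespace on the hub. `build_upgrade_patch` and `trigger_upgrade` are hypothetical helper names, not part of ACM.

```shell
#!/bin/sh
# Build the JSON merge patch that requests an upgrade via the ClusterCurator.
# Field names and the 120 timeout are copied from the dump in this report.
build_upgrade_patch() {
  channel="$1"
  printf '{"spec":{"desiredCuration":"upgrade","upgrade":{"channel":"%s","monitorTimeout":120}}}' "$channel"
}

# Hypothetical helper: apply the patch on the hub for one managed cluster.
# Assumes the ClusterCurator shares its name and namespace with the cluster.
trigger_upgrade() {
  cluster="$1"
  channel="$2"
  oc patch clustercurator "$cluster" -n "$cluster" --type merge \
    -p "$(build_upgrade_patch "$channel")"
}

# Usage (run while logged in to the hub cluster):
#   trigger_upgrade <clustername> eus-4.12
```

If the failure reproduces this way too, the console can be ruled out as the cause.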

      Actual results:

      $ oc get clustercurator -n <clustername>
      No resources found in <clustername> namespace.
      
      $ oc get clustercurator -A
      No resources found
      
      $ oc get pods -n <clustername>
      NAME                      READY   STATUS       RESTARTS   AGE
      curator-job-zzp4l-dvgxr   0/1     Init:Error   0          97s
      
      $ oc logs curator-job-zzp4l-dvgxr  -n <clustername>
      Defaulted container "done" out of: done, upgrade-cluster (init), monitor-upgrade (init)
      Error from server (BadRequest): container "done" in pod "curator-job-zzp4l-dvgxr" is waiting to start: PodInitializing
      
      $ oc get clustercurator <clustername> -n <clustername> -o yaml
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: ClusterCurator
      metadata:
        creationTimestamp: "2023-12-19T07:26:16Z"
        generation: 6
        name: <clustername>
        namespace: <clustername>
        resourceVersion: "986711"
        uid: d4d3a3ad-6c01-4418-8c8c-3b3017e4353c
      spec:
        desiredCuration: upgrade
        destroy:
          jobMonitorTimeout: 5
        install:
          jobMonitorTimeout: 5
        scale:
          jobMonitorTimeout: 5
        upgrade:
          channel: eus-4.12
          monitorTimeout: 120
      status:
        conditions:
        - lastTransitionTime: "2023-12-19T07:26:49Z"
          message: 'curator-job-zzp4l DesiredCuration: upgrade Version (;eus-4.12;) Failed
            - hostedclusters.hypershift.openshift.io "<clustername>" not found'
          reason: Job_failed
          status: "True"
          type: clustercurator-job
        - lastTransitionTime: "2023-12-19T07:26:49Z"
          message: Executing init container upgrade-cluster
          reason: Job_has_finished
          status: "False"
          type: upgrade-cluster
      
      
      
      
      $ oc get pod curator-job-zzp4l-dvgxr -o yaml
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.130.0.19/23"],"mac_address":"0a:58:0a:82:00:13","gateway_ips":["10.130.0.1"],"ip_address":"10.130.0.19/23","gateway_ip":"10.130.0.1"}}'
          k8s.v1.cni.cncf.io/network-status: |-
            [{
                "name": "ovn-kubernetes",
                "interface": "eth0",
                "ips": [
                    "10.130.0.19"
                ],
                "mac": "0a:58:0a:82:00:13",
                "default": true,
                "dns": {}
            }]
          k8s.v1.cni.cncf.io/networks-status: |-
            [{
                "name": "ovn-kubernetes",
                "interface": "eth0",
                "ips": [
                    "10.130.0.19"
                ],
                "mac": "0a:58:0a:82:00:13",
                "default": true,
                "dns": {}
            }]
          openshift.io/scc: restricted-v2
          seccomp.security.alpha.kubernetes.io/pod: runtime/default
        creationTimestamp: "2023-12-19T07:26:16Z"
        generateName: curator-job-zzp4l-
        labels:
          controller-uid: a4a9a71c-4884-4235-9ca4-430c184afa34
          job-name: curator-job-zzp4l
        name: curator-job-zzp4l-dvgxr
        namespace: <clustername>
        ownerReferences:
        - apiVersion: batch/v1
          blockOwnerDeletion: true
          controller: true
          kind: Job
          name: curator-job-zzp4l
          uid: a4a9a71c-4884-4235-9ca4-430c184afa34
        resourceVersion: "986784"
        uid: 019429e9-884e-4953-aa60-46cc28c4b7a7
      spec:
        containers:
        - command:
          - ./curator
          - done
          - <clustername>
          image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
          imagePullPolicy: IfNotPresent
          name: done
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            runAsNonRoot: true
            runAsUser: 1000710000
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            name: kube-api-access-nsjkl
            readOnly: true
        dnsPolicy: ClusterFirst
        enableServiceLinks: true
        imagePullSecrets:
        - name: cluster-installer-dockercfg-fhp27
        initContainers:
        - command:
          - ./curator
          - upgrade-cluster
          - <clustername>
          image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
          imagePullPolicy: Always
          name: upgrade-cluster
          resources:
            limits:
              cpu: 2m
              memory: 45Mi
            requests:
              cpu: 1m
              memory: 30Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            runAsNonRoot: true
            runAsUser: 1000710000
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            name: kube-api-access-nsjkl
            readOnly: true
        - command:
          - ./curator
          - monitor-upgrade
          - <clustername>
          image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
          imagePullPolicy: Always
          name: monitor-upgrade
          resources:
            limits:
              cpu: 2m
              memory: 45Mi
            requests:
              cpu: 1m
              memory: 30Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            runAsNonRoot: true
            runAsUser: 1000710000
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            name: kube-api-access-nsjkl
            readOnly: true
        nodeName: <clustername>-wkr03
        preemptionPolicy: PreemptLowerPriority
        priority: 0
        restartPolicy: Never
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 1000710000
          seLinuxOptions:
            level: s0:c27,c4
          seccompProfile:
            type: RuntimeDefault
        serviceAccount: cluster-installer
        serviceAccountName: cluster-installer
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 300
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 300
        - effect: NoSchedule
          key: node.kubernetes.io/memory-pressure
          operator: Exists
        volumes:
        - name: kube-api-access-nsjkl
          projected:
            defaultMode: 420
            sources:
            - serviceAccountToken:
                expirationSeconds: 3607
                path: token
            - configMap:
                items:
                - key: ca.crt
                  path: ca.crt
                name: kube-root-ca.crt
            - downwardAPI:
                items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
            - configMap:
                items:
                - key: service-ca.crt
                  path: service-ca.crt
                name: openshift-service-ca.crt
      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2023-12-19T07:26:16Z"
          message: 'containers with incomplete status: [upgrade-cluster monitor-upgrade]'
          reason: ContainersNotInitialized
          status: "False"
          type: Initialized
        - lastProbeTime: null
          lastTransitionTime: "2023-12-19T07:26:16Z"
          reason: PodFailed
          status: "False"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: "2023-12-19T07:26:16Z"
          reason: PodFailed
          status: "False"
          type: ContainersReady
        - lastProbeTime: null
          lastTransitionTime: "2023-12-19T07:26:16Z"
          status: "True"
          type: PodScheduled
        containerStatuses:
        - image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
          imageID: ""
          lastState: {}
          name: done
          ready: false
          restartCount: 0
          started: false
          state:
            waiting:
              reason: PodInitializing
        hostIP: 10.110.51.109
        initContainerStatuses:
        - containerID: cri-o://69c645182cd74d5e8b133ee364c779b83c864c6129bab902c8411eda7cc0e8ac
          image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
          imageID: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:2d89baaed1c8ce6121d26e9d199161158db5fbe8ae27d2c782950a1b57902fe3
          lastState: {}
          name: upgrade-cluster
          ready: false
          restartCount: 0
          state:
            terminated:
              containerID: cri-o://69c645182cd74d5e8b133ee364c779b83c864c6129bab902c8411eda7cc0e8ac
              exitCode: 2
              finishedAt: "2023-12-19T07:26:50Z"
              reason: Error
              startedAt: "2023-12-19T07:26:45Z"
        - image: registry.redhat.io/multicluster-engine/cluster-curator-controller-rhel8@sha256:388eb1c6285d4cf4df00a2946c15123d9f0548d6d587edac14681d8bb66a6fe3
          imageID: ""
          lastState: {}
          name: monitor-upgrade
          ready: false
          restartCount: 0
          state:
            waiting:
              reason: PodInitializing
        phase: Failed
        podIP: 10.130.0.19
        podIPs:
        - ip: 10.130.0.19
        qosClass: Burstable
        startTime: "2023-12-19T07:26:16Z" 
      
      
      
      
      $ oc logs curator-job-q289c-57dzd -c upgrade-cluster
      I1219 15:11:59.254388       1 curator.go:86] Mode: upgrade-cluster Cluster
      I1219 15:11:59.757491       1 curator.go:111] Found clusterCurator resource "<clustername>" ✓
      E1219 15:12:00.655803       1 helpers.go:99] hostedclusters.hypershift.openshift.io "<clustername>" not found
      panic: hostedclusters.hypershift.openshift.io "<clustername>" not found [recovered]
              panic: hostedclusters.hypershift.openshift.io "<clustername>" not found
      
      goroutine 1 [running]:
      main.curatorRun.func1()
              /remote-source/app/cmd/curator/curator.go:150 +0x259
      panic({0x18646c0, 0xc0000ced20})
              /usr/lib/golang/src/runtime/panic.go:884 +0x213
      github.com/stolostron/cluster-curator-controller/pkg/jobs/utils.CheckError({0x1ca5820, 0xc0000ced20})
              /remote-source/app/pkg/jobs/utils/helpers.go:100 +0xd4
      main.curatorRun(0x0?, {0x1cc2fe0, 0xc000192900}, {0x7fff3bd65e1d, 0xc}, {0xc000640070, 0xc})
              /remote-source/app/cmd/curator/curator.go:396 +0x1b65
      main.main()
              /remote-source/app/cmd/curator/curator.go:66 +0x1c5
      
      
      
      $ oc logs curator-job-q289c-57dzd -c monitor-upgrade
      Error from server (BadRequest): container "monitor-upgrade" in pod "curator-job-q289c-57dzd" is waiting to start: PodInitializing
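
The plain `oc logs` call above defaulted to the `done` container, which never started, so the useful output only appeared once `-c upgrade-cluster` was passed. A minimal sketch of how the failing init container could be found and its logs fetched in one go is shown here. The function names are hypothetical; the `jsonpath` expression reads the same `initContainerStatuses` fields that appear in the pod dump.

```shell
#!/bin/sh
# Print "name<TAB>exitCode" for every init container of a pod.
# exitCode is empty for containers that have not terminated yet.
init_container_exit_codes() {
  pod="$1"
  ns="$2"
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state.terminated.exitCode}{"\n"}{end}'
}

# Read "name<TAB>exitCode" lines on stdin and keep only the names of
# containers that terminated with a non-zero exit code.
failed_init_containers() {
  awk -F '\t' '$2 != "" && $2 != 0 { print $1 }'
}

# Fetch the logs of each failed init container.
failed_init_logs() {
  pod="$1"
  ns="$2"
  init_container_exit_codes "$pod" "$ns" | failed_init_containers |
    while read -r name; do
      oc logs "$pod" -n "$ns" -c "$name"
    done
}

# Usage:
#   failed_init_logs curator-job-zzp4l-dvgxr <clustername>
```

For the pod in this report, the filter should select only `upgrade-cluster` (exit code 2), since `monitor-upgrade` is still waiting.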

      Expected results:

      Managed Cluster upgraded

      Additional info:

      Secret to import cluster

      apiVersion: v1
      kind: Secret
      metadata:
        name: auto-import-secret
        namespace: {{ ocp_cluster_name }}
      stringData:
        autoImportRetry: "5"
        token: {{ user_token.stdout }}
        server: https://api.{{ ocp_cluster_name }}.{{ basedomain }}:6443
      type: Opaque 
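
For anyone reproducing this without the Ansible template, a rough CLI equivalent of the secret above might look like the following. The helper names are hypothetical, and the sketch assumes the API endpoint follows the same `api.<cluster>.<basedomain>:6443` pattern as the template. Note that passing the token as a command-line argument can expose it in the process list.

```shell
#!/bin/sh
# Build the API server URL the same way the template does.
api_server_url() {
  printf 'https://api.%s.%s:6443' "$1" "$2"
}

# Hypothetical helper: create the auto-import-secret in the managed
# cluster's namespace on the hub. "oc create secret generic" produces
# a secret of type Opaque, matching the template.
create_auto_import_secret() {
  cluster="$1"
  basedomain="$2"
  token="$3"
  oc create secret generic auto-import-secret -n "$cluster" \
    --from-literal=autoImportRetry=5 \
    --from-literal=token="$token" \
    --from-literal=server="$(api_server_url "$cluster" "$basedomain")"
}

# Usage (run while logged in to the hub cluster):
#   create_auto_import_secret <clustername> <basedomain> <token>
```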

            fxiang@redhat.com Feng Xiang
            rh-ee-agarciac Adonis Garcia Castro
            Atif Shafi Atif Shafi