Red Hat Advanced Cluster Management / ACM-11161

klusterlet-agent pod in CrashLoopBackOff after upgrading from 4.12.33 to 4.12.53


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • None
    • ACM 2.10.1
    • Server Foundation
    • 1
    • False
    • None
    • False
    • No
    • 2
    • SF Train-14 2024-02
    • Important

      Description of problem:

      After upgrading a vSphere cluster provisioned by ACM 2.10.1 from OCP 4.12.33 to 4.12.53, one klusterlet-agent pod in the open-cluster-management-agent namespace is in CrashLoopBackOff. The pod log shows that the agent binary rejects the --workload-source-driver flag:

      $ oc logs -n open-cluster-management-agent klusterlet-agent-677ccd9b6c-gwnm8 
      Error: unknown flag: --workload-source-driver
      Usage:
        registration-operator agent [flags]

      Flags:
            --agent-id string                                      ID of the agent
            --appliedmanifestwork-eviction-grace-period duration   Grace period for appliedmanifestwork eviction (default 24h0m0s)
            --bootstrap-kubeconfig string                          The path of the kubeconfig file for agent bootstrap.
            --bootstrap-kubeconfig-secret string                   The name of secret in component namespace storing kubeconfig for agent bootstrap. (default "bootstrap-hub-kubeconfig")
            --client-cert-expiration-seconds int32                 The requested duration in seconds of validity of the issued client certificate. If this is not set, the value of --cluster-signing-duration command-line flag of the kube-controller-manager will be used.
            --cluster-annotations stringToString                   the annotations with the reserve
                                                                        prefix "agent.open-cluster-management.io" set on ManagedCluster when creating only, other actors can update it afterwards. (default [])
            --cluster-healthcheck-period duration                  The period to check managed cluster kube-apiserver health (default 1m0s)
            --cluster-name string                                  Name of the spoke cluster.
            --config string                                        Location of the master configuration file to run from.
            --disable-leader-election                              Disable leader election.
            --feature-gates mapStringBool                          A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
                                                                   AddonManagement=true|false (ALPHA - default=true)
                                                                   AllAlpha=true|false (ALPHA - default=false)
                                                                   AllBeta=true|false (BETA - default=false)
                                                                   ClusterClaim=true|false (BETA - default=true)
                                                                   ExecutorValidatingCaches=true|false (ALPHA - default=false)
                                                                   RawFeedbackJsonString=true|false (ALPHA - default=false)
                                                                   V1beta1CSRAPICompatibility=true|false (ALPHA - default=false)
        -h, --help                                                 help for agent
            --hub-kubeconfig string                                Location of kubeconfig file to connect to hub cluster.
            --hub-kubeconfig-dir string                            The mount path of hub-kubeconfig-secret in the container. (default "/spoke/hub-kubeconfig")
            --hub-kubeconfig-secret string                         The name of secret in component namespace storing kubeconfig for hub. (default "hub-kubeconfig-secret")
            --kube-api-burst int                                   Burst to use while talking with apiserver on spoke cluster. (default 100)
            --kube-api-qps float32                                 QPS to use while talking with apiserver on spoke cluster. (default 50)
            --kubeconfig string                                    Location of the master configuration file to run from.
            --leader-election-lease-duration duration              The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 2m17s)
            --leader-election-renew-deadline duration              The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. (default 1m47s)
            --leader-election-retry-period duration                The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 26s)
            --listen string                                        The ip:port to serve on.
            --max-custom-cluster-claims int                        The max number of custom cluster claims to expose. (default 20)
            --namespace string                                     Namespace where the controller is running. Auto-detected if run in cluster.
            --spoke-cluster-name string                            Name of the spoke cluster.
            --spoke-external-server-urls stringArray               A list of reachable spoke cluster api server URLs for hub cluster.
            --spoke-kubeconfig string                              Location of kubeconfig file to connect to spoke cluster. If this is not set, will use '--kubeconfig' to build client to connect to the managed cluster.
            --status-sync-interval duration                        Interval to sync resource status to hub. (default 10s)
            --terminate-on-files stringArray                       A list of files. If one of them changes, the process will terminate.

      Global Flags:
            --log-flush-frequency duration   Maximum number of seconds between log flushes (default 5s)
        -v, --v Level                        number for the log level verbosity
            --vmodule moduleSpec             comma-separated list of pattern=N settings for file-filtered logging (only works for the default text log format)

      unknown flag: --workload-source-driver
       

      Version-Release number of selected component (if applicable):

      Hub: ACM 2.10.1

      Spoke: OCP 4.12.33, upgraded to OCP 4.12.53

      How reproducible:

      1/1

      Steps to Reproduce:

      1. Upgrade from OCP 4.12.33 to OCP 4.12.53

      Actual results:

      One klusterlet-agent pod on the cluster is in CrashLoopBackOff:

      $ oc get pods -n open-cluster-management-agent
      NAME                                READY   STATUS             RESTARTS        AGE
      klusterlet-77c658d747-xjwnt         1/1     Running            0               40m
      klusterlet-agent-677ccd9b6c-gwnm8   0/1     CrashLoopBackOff   12 (103s ago)   38m
      klusterlet-agent-8474f5b79c-4zhbr   1/1     Running            0               40m
      klusterlet-agent-8474f5b79c-9bkns   1/1     Running            0               40m
      klusterlet-agent-8474f5b79c-jkw2j   1/1     Running            1 (37m ago)     40m 
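
In the listing above, the crashing pod has a different pod-template hash (677ccd9b6c) from the three running pods (8474f5b79c), so it belongs to a different ReplicaSet of the klusterlet-agent Deployment. A minimal sketch of one way to compare what each pod is actually running follows. The helper names `unhealthy_pods` and `show_image_and_args` are hypothetical and not part of this report, and the sketch only gathers the image and arguments for comparison; it does not by itself establish a cause.

```shell
# Read `oc get pods` table output on stdin and print the names of pods
# whose STATUS column is neither Running nor Completed.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Print container name, image, and args for one pod, tab-separated,
# so a crashing pod can be compared with a healthy one.
# $1 = pod name, $2 = namespace
show_image_and_args() {
  oc get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\t"}{.args}{"\n"}{end}'
}

# Usage (requires access to the spoke cluster):
#   ns=open-cluster-management-agent
#   oc get pods -n "$ns" | unhealthy_pods | while read -r pod; do
#     show_image_and_args "$pod" "$ns"
#   done
#   show_image_and_args klusterlet-agent-8474f5b79c-4zhbr "$ns"
```

If the two pods show different images or argument lists, that difference is the place to look for where --workload-source-driver is being passed to a binary that does not accept it.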

      Expected results:

      All pods in open-cluster-management-agent namespace are running

            zxue@redhat.com ZHAO XUE
            treywest96 Trey West
            Hui Chen Hui Chen