Bug
Resolution: Done
Blocker
MCE 2.6.1
Critical
Description of problem:
After creating a new Hive cluster on MCE 2.6.1, the cluster stays stuck in the Importing status.
build: 2.6.1-DOWNANDBACK-2024-07-19-15-41-16
We see the following error in the import controller log on the MCE hub:
2024-07-23T12:46:05.438218058Z INFO manifestwork-controller Reconciling the manifest works of the managed cluster {"Request.Name": "clc-az-1721664639215"}
2024-07-23T12:46:05.501413934Z ERROR Reconciler error {"controller": "manifestwork-controller", "namespace": "", "name": "clc-az-1721664639215", "reconcileID": "962bed92-9a7f-41d0-8b4a-053a79d00609", "error": "manifestworks.work.open-cluster-management.io \"clc-az-1721664639215-klusterlet-crds\" already exists", "errorCauses": [{"error": "manifestworks.work.open-cluster-management.io \"clc-az-1721664639215-klusterlet-crds\" already exists"}]}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227
2024-07-23T12:46:05.501523021Z INFO manifestwork-controller Reconciling the manifest works of the managed cluster {"Request.Name": "clc-az-1721664639215"}
2024-07-23T12:46:05.508745918Z INFO importconfig-controller Reconciling managed cluster {"Request.Name": "clc-az-1721664639215"}
2024-07-23T12:46:05.522439217Z INFO manifestwork-controller Reconciling the manifest works of the managed cluster {"Request.Name": "clc-az-1721664639215"}
I0723 12:46:05.551293 1 event.go:364] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"multicluster-engine", Name:"managedcluster-import-controller-v2", UID:"c325d438-2f26-4a92-973f-51743d98fb94", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/bootstrap-hub-kubeconfig -n open-cluster-management-agent because it changed
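For context, the manifestwork that the controller reports as already existing can be inspected directly on the hub. A minimal check, assuming the managed cluster namespace matches the cluster name from the log above:
# On the MCE hub: list manifestworks in the managed cluster's namespace
oc get manifestwork -n clc-az-1721664639215
# Dump the klusterlet CRDs manifestwork the controller keeps failing to (re)create
oc get manifestwork clc-az-1721664639215-klusterlet-crds -n clc-az-1721664639215 -o yaml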
On the managed cluster, I can see the klusterlet pod failing with ImagePullBackOff:
oc get pods -n open-cluster-management-agent
NAME                          READY   STATUS             RESTARTS   AGE
klusterlet-54b6cc6bcd-x67vp   0/1     ImagePullBackOff   0          19h
Pod description:
Name: klusterlet-54b6cc6bcd-x67vp
Namespace: open-cluster-management-agent
Priority: 0
Service Account: klusterlet
Node: clc-az-1721664639215-zkm5f-worker-eastus3-qmph8/10.0.128.4
Start Time: Mon, 22 Jul 2024 10:09:58 -0700
Labels: app=klusterlet
pod-template-hash=54b6cc6bcd
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.133.2.13/23"],"mac_address":"0a:58:0a:85:02:0d","gateway_ips":["10.133.2.1"],"routes":[{"dest":"10.132.0.0...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.133.2.13"
],
"mac": "0a:58:0a:85:02:0d",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Pending
SeccompProfile: RuntimeDefault
IP: 10.133.2.13
IPs:
IP: 10.133.2.13
Controlled By: ReplicaSet/klusterlet-54b6cc6bcd
Containers:
klusterlet:
Container ID:
Image: registry.redhat.io/multicluster-engine/registration-operator-rhel9@sha256:ace853fde03f1d417522cd47385f6fb78c82bd0a7aa2a7a3fb305c997896dedb
Image ID:
Port: <none>
Host Port: <none>
Args:
/registration-operator
klusterlet
--disable-leader-election
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Limits:
memory: 2Gi
Requests:
cpu: 50m
memory: 64Mi
Liveness: http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:8443/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: klusterlet-54b6cc6bcd-x67vp (v1:metadata.name)
Mounts:
/tmp from tmpdir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-45b7q (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmpdir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-45b7q:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/infra:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Failed 175m (x203 over 19h) kubelet Failed to pull image "registry.redhat.io/multicluster-engine/registration-operator-rhel9@sha256:ace853fde03f1d417522cd47385f6fb78c82bd0a7aa2a7a3fb305c997896dedb": reading manifest sha256:ace853fde03f1d417522cd47385f6fb78c82bd0a7aa2a7a3fb305c997896dedb in registry.redhat.io/multicluster-engine/registration-operator-rhel9: manifest unknown
Normal BackOff 31s (x5240 over 19h) kubelet Back-off pulling image "registry.redhat.io/multicluster-engine/registration-operator-rhel9@sha256:ace853fde03f1d417522cd47385f6fb78c82bd0a7aa2a7a3fb305c997896dedb"
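The "manifest unknown" error indicates the referenced digest cannot be resolved in the registry. One way to verify, assuming skopeo is available and you are logged in to registry.redhat.io (e.g. via skopeo login or --authfile):
# Check whether the digest exists in the registry (fails with "manifest unknown" if it does not)
skopeo inspect docker://registry.redhat.io/multicluster-engine/registration-operator-rhel9@sha256:ace853fde03f1d417522cd47385f6fb78c82bd0a7aa2a7a3fb305c997896dedb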
Version-Release number of selected component (if applicable):
MCE 2.6.1
How reproducible:
Steps to Reproduce:
- Create a Hive cluster on MCE 2.6.1
- Observe the cluster gets stuck in Importing status (a check is sketched below)
- ...
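As referenced in the steps above, one hedged way to confirm the stuck import from the hub (the cluster name is the one from this report and will differ per environment):
# Check the managed cluster's availability and joined state
oc get managedcluster clc-az-1721664639215
# Inspect the import-related conditions on the ManagedCluster resource
oc describe managedcluster clc-az-1721664639215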