Bug
Resolution: Cannot Reproduce
Normal
4.10
Moderate
Description of problem:
While applying the du/ran profile to 100 SNOs, 2 SNOs failed to become Compliant because the redhat-operators pod was stuck in ContainerCreating.
SNO00039 and SNO00074 both failed. SNO00039 was recovered, after its journal logs were gathered, by deleting the redhat-operators pod in the openshift-marketplace namespace (see the example command below).
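A sketch of that corrective action, selecting the pod by the olm.catalogSource label shown in the pod description further below instead of a specific pod name (the kubeconfig path is a placeholder; OLM recreates the catalog pod once the stuck one is deleted):
- oc --kubeconfig=<SNO kubeconfig> delete pod -n openshift-marketplace -l olm.catalogSource=redhat-operators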
Version-Release number of selected component (if applicable):
Hub OCP - 4.9.23
ACM - 2.5.0-DOWNSTREAM-2022-02-21-19-58-55
SNO OCP - 4.10.0-rc.5
How reproducible:
2 / 100 SNOs had this failure
Steps to Reproduce:
1. Deploy 100 SNOs and apply the du/ran profile via the ZTP policies.
2. Wait for the ztp-* policies to be applied on each SNO.
3. Check the redhat-operators pod in the openshift-marketplace namespace and the compliance state of common-subscriptions-policy on each SNO.
Actual results:
On 2 of the 100 SNOs, the redhat-operators pod in the openshift-marketplace namespace is stuck in ContainerCreating and common-subscriptions-policy remains NonCompliant.
Expected results:
The redhat-operators pod is Running and common-subscriptions-policy becomes Compliant, since the disconnected operators are available to be installed.
Additional info:
Deleting the redhat-operators pod resolves the issue, as the replacement pod comes up Running. Running a must-gather against the affected clusters also works and creates/runs/terminates a pod, so it appears that only this specific pod is stuck in ContainerCreating; a must-gather example is sketched below.
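For reference, a must-gather invocation of the kind used against the affected clusters (the destination directory here is only an example):
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig adm must-gather --dest-dir=/tmp/must-gather-sno00074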
From SNO00074:
policies:
- oc get policy -n sno00074
NAME                                               REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy                    inform               Compliant          12h
ztp-common.common-subscriptions-policy             inform               NonCompliant       12h
ztp-group.group-du-sno-config-log-policy           inform               NonCompliant       12h
ztp-group.group-du-sno-config-policy               inform               NonCompliant       12h
ztp-group.group-du-sno-config-storage-policy       inform               NonCompliant       12h
ztp-install.sno00074-common-config-policy          enforce              Compliant          12h
ztp-install.sno00074-common-subscriptions-policy   enforce              NonCompliant       12h
Subscriptions:
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig get subs -A
NAMESPACE                              NAME                                  PACKAGE                      SOURCE             CHANNEL
openshift-local-storage                local-storage-operator                local-storage-operator       redhat-operators   4.9
openshift-logging                      cluster-logging                       cluster-logging              redhat-operators   stable
openshift-performance-addon-operator   performance-addon-operator            performance-addon-operator   redhat-operators   4.9
openshift-ptp                          ptp-operator-subscription             ptp-operator                 redhat-operators   4.9
openshift-sriov-network-operator       sriov-network-operator-subscription   sriov-network-operator       redhat-operators   4.9
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig describe sub -n openshift-sriov-network-operator sriov-network-operator-subscription
...
Status:
  Catalog Health:
    Catalog Source Ref:
      API Version:       operators.coreos.com/v1alpha1
      Kind:              CatalogSource
      Name:              redhat-operators
      Namespace:         openshift-marketplace
      Resource Version:  63258
      UID:               29357969-3c93-4128-a3e8-da969dacb0bd
    Healthy:             true
    Last Updated:        2022-03-03T02:33:18Z
  Conditions:
    Last Transition Time:  2022-03-03T02:33:18Z
    Message:               all available catalogsources are healthy
    Reason:                AllCatalogSourcesHealthy
    Status:                False
    Type:                  CatalogSourcesUnhealthy
    Message:               error using catalog redhat-operators (in namespace openshift-marketplace): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [fd02::39bf]:50051: connect: connection refused"
    Reason:                ErrorPreventedResolution
    Status:                True
    Type:                  ResolutionFailed
  Last Updated:            2022-03-03T02:33:20Z
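The ResolutionFailed message above shows OLM failing to reach the catalog's gRPC endpoint on port 50051, which matches the registry pod never starting. One way to confirm that nothing is backing the catalog service (not captured in the original data; assumes the usual OLM behavior of naming the Service after the CatalogSource) would be:
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig get endpoints -n openshift-marketplace redhat-operators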
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig get po -n openshift-marketplace
NAME                                    READY   STATUS              RESTARTS   AGE
marketplace-operator-5d47b5d9f9-lfzcg   1/1     Running             0          14h
redhat-operators-kc8dm                  0/1     ContainerCreating   0          12h
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig describe po -n openshift-marketplace redhat-operators-kc8dm
Name:         redhat-operators-kc8dm
Namespace:    openshift-marketplace
Priority:     0
Node:         sno00074/fc00:1000::431
Start Time:   Thu, 03 Mar 2022 02:27:38 +0000
Labels:       olm.catalogSource=redhat-operators
              olm.pod-spec-hash=5cb95f87d6
Annotations:  k8s.ovn.org/pod-networks:
                {"default":{"ip_addresses":["fd01:0:0:1::70/64"],"mac_address":"0a:58:b0:dc:98:c0","gateway_ips":["fd01:0:0:1::1"],"ip_address":"fd01:0:0:...
              openshift.io/scc: privileged
              ran.openshift.io/ztp-deploy-wave: 1
Status:       Pending
IP:
IPs:          <none>
Containers:
  registry-server:
    Container ID:
    Image:          f04-h17-b07-5039ms.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index:v4.9
    Image ID:
    Port:           50051/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     50Mi
    Liveness:     exec [grpc_health_probe -addr=:50051] delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:    exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kl5ff (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-kl5ff:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  9m48s (x352 over 12h)  kubelet  Unable to attach or mount volumes: unmounted volumes=[kube-api-access-kl5ff], unattached volumes=[kube-api-access-kl5ff]: timed out waiting for the condition
  Warning  FailedMount  4m45s (x367 over 12h)  kubelet  MountVolume.SetUp failed for volume "kube-api-access-kl5ff" : failed to fetch token: serviceaccounts "redhat-operators" not found
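The FailedMount events indicate the kubelet cannot fetch a projected service account token because the redhat-operators service account is not found. A follow-up check, not part of the captured data, would be to list the service accounts in the namespace and confirm whether it exists:
- oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig get sa -n openshift-marketplace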