OCPBUGS-9146

DU profile fails to roll out and apply operators because redhat-operators pod is stuck in ContainerCreating


    • Moderate
    • None
    • Unspecified

      Description of problem:
      While applying the du/ran profile to 100 SNOs, 2 SNOs failed to become Compliant because the redhat-operators pod was stuck in ContainerCreating.

      Both SNO00039 and SNO00074 failed. After journal logs were gathered, SNO00039 was recovered by deleting the redhat-operators pod in the openshift-marketplace namespace.

      Version-Release number of selected component (if applicable):
      Hub OCP - 4.9.23
      ACM - 2.5.0-DOWNSTREAM-2022-02-21-19-58-55
      SNO OCP - 4.10.0-rc.5

      How reproducible:
      2 / 100 SNOs had this failure

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:
      The redhat-operators pod is Running and common-subscriptions-policy becomes Compliant, because the disconnected operators are available to be installed.

      Additional info:

      Deleting the redhat-operators pod resolves the issue: the replacement pod comes up Running. Running a must-gather against the affected clusters also works (it creates, runs, and terminates a pod), so only this specific pod appears to be stuck in ContainerCreating.
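
The workaround described above could be scripted roughly as follows. This is a minimal sketch, not a verified fix: the helper name `stuck_pods` is hypothetical, and it assumes the default column layout of `oc get po` for a single namespace (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: read `oc get po` output on stdin and print the
# names of pods whose STATUS column is ContainerCreating.
# Assumes STATUS is the third column (single-namespace output, no -A).
stuck_pods() {
  awk 'NR > 1 && $3 == "ContainerCreating" { print $1 }'
}

# Example use against one SNO (commented out; needs cluster access):
#   oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig \
#     get po -n openshift-marketplace | stuck_pods
# Each name printed can then be deleted so that a replacement pod is created:
#   oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig \
#     delete pod -n openshift-marketplace <pod-name>
```

Gather journal logs before deleting the pod, since deletion clears the failure state.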

      From SNO00074:

      policies:

      # oc get policy -n sno00074
        NAME                                               REMEDIATION ACTION   COMPLIANCE STATE   AGE
        ztp-common.common-config-policy                    inform               Compliant          12h
        ztp-common.common-subscriptions-policy             inform               NonCompliant       12h
        ztp-group.group-du-sno-config-log-policy           inform               NonCompliant       12h
        ztp-group.group-du-sno-config-policy               inform               NonCompliant       12h
        ztp-group.group-du-sno-config-storage-policy       inform               NonCompliant       12h
        ztp-install.sno00074-common-config-policy          enforce              Compliant          12h
        ztp-install.sno00074-common-subscriptions-policy   enforce              NonCompliant       12h
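
When checking many SNO namespaces, the policy listing can be filtered down to the failing entries. The sketch below is illustrative only: `noncompliant_policies` is a hypothetical helper, and it assumes the compliance state is the second-to-last whitespace-separated field, as in the output above.

```shell
# Hypothetical helper: read `oc get policy -n <namespace>` output on stdin
# and print the names of policies reported as NonCompliant.
# Assumes the compliance state is the second-to-last field on each row,
# which holds for the column layout shown in this report.
noncompliant_policies() {
  awk 'NR > 1 && $(NF-1) == "NonCompliant" { print $1 }'
}

# Example use on the hub (commented out; needs cluster access):
#   oc get policy -n sno00074 | noncompliant_policies
```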

      Subscriptions:

      # oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig get subs -A
        NAMESPACE                              NAME                                  PACKAGE                      SOURCE             CHANNEL
        openshift-local-storage                local-storage-operator                local-storage-operator       redhat-operators   4.9
        openshift-logging                      cluster-logging                       cluster-logging              redhat-operators   stable
        openshift-performance-addon-operator   performance-addon-operator            performance-addon-operator   redhat-operators   4.9
        openshift-ptp                          ptp-operator-subscription             ptp-operator                 redhat-operators   4.9
        openshift-sriov-network-operator       sriov-network-operator-subscription   sriov-network-operator       redhat-operators   4.9
      # oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig describe sub -n openshift-sriov-network-operator sriov-network-operator-subscription
        ...
        Status:
          Catalog Health:
            Catalog Source Ref:
              API Version:       operators.coreos.com/v1alpha1
              Kind:              CatalogSource
              Name:              redhat-operators
              Namespace:         openshift-marketplace
              Resource Version:  63258
              UID:               29357969-3c93-4128-a3e8-da969dacb0bd
            Healthy:             true
            Last Updated:        2022-03-03T02:33:18Z
          Conditions:
            Last Transition Time:  2022-03-03T02:33:18Z
            Message:               all available catalogsources are healthy
            Reason:                AllCatalogSourcesHealthy
            Status:                False
            Type:                  CatalogSourcesUnhealthy
            Message:               error using catalog redhat-operators (in namespace openshift-marketplace): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [fd02::39bf]:50051: connect: connection refused"
            Reason:                ErrorPreventedResolution
            Status:                True
            Type:                  ResolutionFailed
          Last Updated:            2022-03-03T02:33:20Z
      # oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig get po -n openshift-marketplace
        NAME                                    READY   STATUS              RESTARTS   AGE
        marketplace-operator-5d47b5d9f9-lfzcg   1/1     Running             0          14h
        redhat-operators-kc8dm                  0/1     ContainerCreating   0          12h
      # oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig describe po -n openshift-marketplace redhat-operators-kc8dm
        Name:         redhat-operators-kc8dm
        Namespace:    openshift-marketplace
        Priority:     0
        Node:         sno00074/fc00:1000::431
        Start Time:   Thu, 03 Mar 2022 02:27:38 +0000
        Labels:       olm.catalogSource=redhat-operators
                      olm.pod-spec-hash=5cb95f87d6
        Annotations:  k8s.ovn.org/pod-networks:
                        {"default":{"ip_addresses":["fd01:0:0:1::70/64"],"mac_address":"0a:58:b0:dc:98:c0","gateway_ips":["fd01:0:0:1::1"],"ip_address":"fd01:0:0:...
                      openshift.io/scc: privileged
                      ran.openshift.io/ztp-deploy-wave: 1
        Status:       Pending
        IP:
        IPs:          <none>
        Containers:
          registry-server:
            Container ID:
            Image:          f04-h17-b07-5039ms.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index:v4.9
            Image ID:
            Port:           50051/TCP
            Host Port:      0/TCP
            State:          Waiting
              Reason:       ContainerCreating
            Ready:          False
            Restart Count:  0
            Requests:
              cpu:        10m
              memory:     50Mi
            Liveness:     exec [grpc_health_probe -addr=:50051] delay=10s timeout=5s period=10s #success=1 #failure=3
            Readiness:    exec [grpc_health_probe -addr=:50051] delay=5s timeout=5s period=10s #success=1 #failure=3
            Environment:  <none>
            Mounts:
              /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kl5ff (ro)
        Conditions:
          Type              Status
          Initialized       True
          Ready             False
          ContainersReady   False
          PodScheduled      True
        Volumes:
          kube-api-access-kl5ff:
            Type:                    Projected (a volume that contains injected data from multiple sources)
            TokenExpirationSeconds:  3607
            ConfigMapName:           kube-root-ca.crt
            ConfigMapOptional:       <nil>
            DownwardAPI:             true
            ConfigMapName:           openshift-service-ca.crt
            ConfigMapOptional:       <nil>
        QoS Class:                   Burstable
        Node-Selectors:              kubernetes.io/os=linux
        Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
        Events:
          Type     Reason       Age                    From     Message
          ----     ------       ----                   ----     -------
          Warning  FailedMount  9m48s (x352 over 12h)  kubelet  Unable to attach or mount volumes: unmounted volumes=[kube-api-access-kl5ff], unattached volumes=[kube-api-access-kl5ff]: timed out waiting for the condition
          Warning  FailedMount  4m45s (x367 over 12h)  kubelet  MountVolume.SetUp failed for volume "kube-api-access-kl5ff" : failed to fetch token: serviceaccounts "redhat-operators" not found
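
The last event points at a service account that the kubelet cannot find when it requests the projected token. As a rough triage aid, the sketch below pulls the missing service account name out of `oc describe po` output; `missing_sa` is a hypothetical helper, and the pattern is matched against the exact kubelet message wording shown above, which may differ in other releases.

```shell
# Hypothetical helper: read `oc describe po` output on stdin and print the
# name of any service account the kubelet reported as not found while
# fetching the projected token. Duplicate names are collapsed.
missing_sa() {
  sed -n 's/.*failed to fetch token: serviceaccounts "\([^"]*\)" not found.*/\1/p' | sort -u
}

# Example use (commented out; needs cluster access):
#   oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig \
#     describe po -n openshift-marketplace redhat-operators-kc8dm | missing_sa
# To check whether the service account exists at the time of inspection:
#   oc --kubeconfig=/root/hv-sno/manifests/sno00074/kubeconfig \
#     get sa redhat-operators -n openshift-marketplace
```

If the service account exists by the time it is inspected, that would be consistent with the pod having been created before the service account, but this report does not confirm the ordering.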

              fromani@redhat.com Francesco Romani
              akrzos@redhat.com Alex Krzos
              Walid Abouhamad