OPCT - OpenShift Provider Compatibility Tool / OPCT-39

[CLI] CLI must detect when plugins workload/pod is in failed state


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: CLI

      After a reasonable amount of time, the CLI must check whether the plugin pods are healthy. It is not clear whether Sonobuoy eventually times out, but when a plugin image is unreachable for any reason, the CLI must show a clearer message.
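A minimal sketch of such a health check, written in Go on the assumption that the check lives in the CLI itself. The `containerState` type is hypothetical: it stands in for the waiting reason and message that the Kubernetes API exposes per container (`ContainerStatus.State.Waiting.Reason` / `.Message`). The set of waiting reasons treated as "failed" is also an assumption chosen for illustration.

```go
package main

import "fmt"

// containerState is a hypothetical, simplified view of one container status.
// In the Kubernetes API these values come from
// ContainerStatus.State.Waiting.Reason and ContainerStatus.State.Waiting.Message.
type containerState struct {
	Name          string
	WaitingReason string // empty when the container is not in the Waiting state
	WaitingMsg    string
}

// failedWaitingReasons lists waiting reasons that usually do not recover
// without user intervention. This set is an assumption for the sketch.
var failedWaitingReasons = map[string]bool{
	"ErrImagePull":               true,
	"ImagePullBackOff":           true,
	"InvalidImageName":           true,
	"CrashLoopBackOff":           true,
	"CreateContainerConfigError": true,
}

// podProblem returns a human-readable description of the first container
// stuck in a failed waiting state, or "" when no such container is found.
func podProblem(containers []containerState) string {
	for _, c := range containers {
		if failedWaitingReasons[c.WaitingReason] {
			return fmt.Sprintf("container %q is %s: %s", c.Name, c.WaitingReason, c.WaitingMsg)
		}
	}
	return ""
}
```

The container names used with this function (for example `plugin`) are illustrative only; the real check would iterate over whatever containers the plugin pod defines.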

       

      In the example below, the plugin image was not available (this should not happen in production), but the CLI keeps reporting every job as running while the image pull is retried:

      $ ./openshift-provider-cert-linux-amd64-process0 run -w
      INFO[2023-02-02T18:22:05-03:00] Ensuring proper node label for dedicated mode exists 
      INFO[2023-02-02T18:22:06-03:00] Ensuring the tool will run in the privileged environment... 
      INFO[2023-02-02T18:22:06-03:00] Created opct-scc-privileged ClusterRole      
      INFO[2023-02-02T18:22:06-03:00] Created opct-scc-privileged ClusterRoleBinding 
      INFO[2023-02-02T18:22:06-03:00] Running OpenShift Provider Certification Tool... 
      INFO[2023-02-02T18:22:07-03:00] object already exists                         name=openshift-provider-certification namespace= resource=namespaces
      INFO[2023-02-02T18:22:07-03:00] create request issued                         name=sonobuoy-config-cm namespace=openshift-provider-certification resource=configmaps
      INFO[2023-02-02T18:22:07-03:00] create request issued                         name=sonobuoy-plugins-cm namespace=openshift-provider-certification resource=configmaps
      INFO[2023-02-02T18:22:07-03:00] create request issued                         name=sonobuoy namespace=openshift-provider-certification resource=pods
      INFO[2023-02-02T18:22:08-03:00] create request issued                         name=sonobuoy-aggregator namespace=openshift-provider-certification resource=services
      INFO[2023-02-02T18:22:08-03:00] Jobs scheduled! Waiting for resources be created... 
      Thu, 02 Feb 2023 18:22:21 -03> Global Status: running
      JOB_NAME                           | STATUS     | RESULTS    | PROGRESS                  | MESSAGE                                           
      05-openshift-cluster-upgrade       | running    |            |                           |                                                   
      10-openshift-kube-conformance      | running    |            |                           |                                                   
      20-openshift-conformance-validated | running    |            |                           |                                                   
      99-openshift-artifacts-collector   | running    |            |                           |                                                   
      Thu, 02 Feb 2023 18:22:32 -03> Global Status: running
      JOB_NAME                           | STATUS     | RESULTS    | PROGRESS                  | MESSAGE                                           
      05-openshift-cluster-upgrade       | running    |            |                           |                                                   
      10-openshift-kube-conformance      | running    |            |                           |                                                   
      20-openshift-conformance-validated | running    |            |                           |                                                   
      99-openshift-artifacts-collector   | running    |            |                           |                                       
      (...)

      Pods:

      $ oc get pods -n openshift-provider-certification -o wide
      NAME                                                               READY   STATUS             RESTARTS   AGE   IP           NODE                         NOMINATED NODE   READINESS GATES
      sonobuoy                                                           1/1     Running            0          42s   10.131.2.5   ip-10-0-57-61.ec2.internal   <none>           <none>
      sonobuoy-05-openshift-cluster-upgrade-job-60d3274e82e04d28         1/3     ImagePullBackOff   0          39s   10.131.2.8   ip-10-0-57-61.ec2.internal   <none>           <none>
      sonobuoy-10-openshift-kube-conformance-job-b13d7ca45caf485b        1/3     ImagePullBackOff   0          39s   10.131.2.6   ip-10-0-57-61.ec2.internal   <none>           <none>
      sonobuoy-20-openshift-conformance-validated-job-85c75be8d1c94818   1/3     ImagePullBackOff   0          39s   10.131.2.9   ip-10-0-57-61.ec2.internal   <none>           <none>
      sonobuoy-99-openshift-artifacts-collector-job-fad442ab3e124582     1/3     ImagePullBackOff   0          39s   10.131.2.7   ip-10-0-57-61.ec2.internal   <none>           <none>
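The failure is visible in the STATUS column of `oc get pods`. A CLI would more likely read pod status through the Kubernetes API, but as a small illustration of the detection itself, the Go sketch below scans this text output and reports pods whose STATUS is in an assumed set of failure values. The `failingPods` function and the `badPodStatuses` set are hypothetical.

```go
package main

import (
	"bufio"
	"strings"
)

// badPodStatuses holds STATUS values treated as failures (assumed set).
var badPodStatuses = map[string]bool{
	"ErrImagePull":     true,
	"ImagePullBackOff": true,
	"InvalidImageName": true,
	"CrashLoopBackOff": true,
	"Error":            true,
}

// failingPods scans the text output of `oc get pods` (default or -o wide)
// and returns a map of pod NAME to STATUS for rows with a failure status.
func failingPods(out string) map[string]string {
	result := map[string]string{}
	statusCol := -1
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if statusCol < 0 {
			// Header row: locate the STATUS column. Multi-word headers such as
			// "NOMINATED NODE" come after STATUS, so the index stays valid.
			for i, f := range fields {
				if f == "STATUS" {
					statusCol = i
					break
				}
			}
			continue
		}
		if statusCol < len(fields) && badPodStatuses[fields[statusCol]] {
			result[fields[0]] = fields[statusCol]
		}
	}
	return result
}
```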
       

      Pod:

      $ oc describe pod -n openshift-provider-certification  sonobuoy-05-openshift-cluster-upgrade-job-60d3274e82e04d28
      
      Events:
        Type     Reason          Age                From               Message
        ----     ------          ----               ----               -------
        Normal   Scheduled       60s                default-scheduler  Successfully assigned openshift-provider-certification/sonobuoy-05-openshift-cluster-upgrade-job-60d3274e82e04d28 to ip-10-0-57-61.ec2.internal
        Normal   AddedInterface  60s                multus             Add eth0 [10.131.2.8/23] from ovn-kubernetes
        Normal   Created         59s                kubelet            Created container sonobuoy-worker
        Normal   Started         59s                kubelet            Started container sonobuoy-worker
        Normal   Pulled          59s                kubelet            Container image "quay.io/ocp-cert/sonobuoy:v0.56.10" already present on machine
        Warning  Failed          43s (x2 over 59s)  kubelet            Error: ErrImagePull
        Normal   Pulling         43s (x2 over 59s)  kubelet            Pulling image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105"
        Warning  Failed          43s (x2 over 59s)  kubelet            Failed to pull image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105": rpc error: code = Unknown desc = reading manifest dev20230127205105 in quay.io/ocp-cert/openshift-tests-provider-cert: manifest unknown: manifest unknown
        Normal   BackOff         32s (x5 over 59s)  kubelet            Back-off pulling image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105"
        Warning  Failed          32s (x5 over 59s)  kubelet            Error: ImagePullBackOff
        Normal   BackOff         32s (x3 over 59s)  kubelet            Back-off pulling image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105"
        Warning  Failed          32s (x3 over 59s)  kubelet            Error: ImagePullBackOff
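The most useful line in these events is the `Failed to pull image ...` warning, because it carries the registry error (`manifest unknown`), whereas `Error: ImagePullBackOff` does not. The Go sketch below picks that message out of a pod's events so the CLI could show it to the user. The `event` type and `pullFailure` function are hypothetical; with a real client the fields would come from the event's type, reason, and message.

```go
package main

import "strings"

// event is a hypothetical, simplified pod event (type, reason, message).
type event struct {
	Type    string
	Reason  string
	Message string
}

// pullFailure returns the first Warning/Failed event whose message starts with
// "Failed to pull image", since it includes the underlying registry error.
// It returns "" when no such event exists.
func pullFailure(events []event) string {
	for _, e := range events {
		if e.Type == "Warning" && e.Reason == "Failed" && strings.HasPrefix(e.Message, "Failed to pull image") {
			return e.Message
		}
	}
	return ""
}
```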
       

      One idea is to display the pod status whenever the plugin payload has no message (the MESSAGE field is empty, as in the status table above).
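The fallback proposed above fits in a few lines. In this Go sketch, `displayMessage` is a hypothetical helper: the plugin's own message always wins, and the pod status (for example `ImagePullBackOff`) is shown only when that message is empty.

```go
package main

// displayMessage picks the text for the MESSAGE column: the plugin's message
// when it is set, otherwise the pod status prefixed so users can tell where
// it came from. Both empty yields an empty column, as today.
func displayMessage(pluginMsg, podStatus string) string {
	if pluginMsg != "" {
		return pluginMsg
	}
	if podStatus == "" {
		return ""
	}
	return "pod status: " + podStatus
}
```

With the run above, every row would then read `pod status: ImagePullBackOff` instead of an empty MESSAGE column.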

            Assignee: Unassigned
            Reporter: Marco Braga (rhn-support-mrbraga)
            Votes: 0
            Watchers: 1
