Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: CLI
Labels:
- OPCT
- kind/bug
- opct-v0.6

Blocked: False
Ready: False
Epic Link: OPCT v0.6 (proposal)

The CLI must check whether the plugin pods are healthy after a grace period. It is not clear whether Sonobuoy will eventually time out, but when the plugin image cannot be pulled for any reason, the CLI must show a clearer message.

In the example below, the plugin image was not available (this should not happen in production), but the CLI keeps reporting every job as running with an empty message while the image pull is retried:

$ ./openshift-provider-cert-linux-amd64-process0 run -w
INFO[2023-02-02T18:22:05-03:00] Ensuring proper node label for dedicated mode exists 
INFO[2023-02-02T18:22:06-03:00] Ensuring the tool will run in the privileged environment... 
INFO[2023-02-02T18:22:06-03:00] Created opct-scc-privileged ClusterRole      
INFO[2023-02-02T18:22:06-03:00] Created opct-scc-privileged ClusterRoleBinding 
INFO[2023-02-02T18:22:06-03:00] Running OpenShift Provider Certification Tool... 
INFO[2023-02-02T18:22:07-03:00] object already exists                         name=openshift-provider-certification namespace= resource=namespaces
INFO[2023-02-02T18:22:07-03:00] create request issued                         name=sonobuoy-config-cm namespace=openshift-provider-certification resource=configmaps
INFO[2023-02-02T18:22:07-03:00] create request issued                         name=sonobuoy-plugins-cm namespace=openshift-provider-certification resource=configmaps
INFO[2023-02-02T18:22:07-03:00] create request issued                         name=sonobuoy namespace=openshift-provider-certification resource=pods
INFO[2023-02-02T18:22:08-03:00] create request issued                         name=sonobuoy-aggregator namespace=openshift-provider-certification resource=services
INFO[2023-02-02T18:22:08-03:00] Jobs scheduled! Waiting for resources be created... 
Thu, 02 Feb 2023 18:22:21 -03> Global Status: running
JOB_NAME                           | STATUS     | RESULTS    | PROGRESS                  | MESSAGE                                           
05-openshift-cluster-upgrade       | running    |            |                           |                                                   
10-openshift-kube-conformance      | running    |            |                           |                                                   
20-openshift-conformance-validated | running    |            |                           |                                                   
99-openshift-artifacts-collector   | running    |            |                           |                                                   
Thu, 02 Feb 2023 18:22:32 -03> Global Status: running
JOB_NAME                           | STATUS     | RESULTS    | PROGRESS                  | MESSAGE                                           
05-openshift-cluster-upgrade       | running    |            |                           |                                                   
10-openshift-kube-conformance      | running    |            |                           |                                                   
20-openshift-conformance-validated | running    |            |                           |                                                   
99-openshift-artifacts-collector   | running    |            |                           |                                       
(...)

Pods:

$ oc get pods -n openshift-provider-certification -o wide
NAME                                                               READY   STATUS             RESTARTS   AGE   IP           NODE                         NOMINATED NODE   READINESS GATES
sonobuoy                                                           1/1     Running            0          42s   10.131.2.5   ip-10-0-57-61.ec2.internal   <none>           <none>
sonobuoy-05-openshift-cluster-upgrade-job-60d3274e82e04d28         1/3     ImagePullBackOff   0          39s   10.131.2.8   ip-10-0-57-61.ec2.internal   <none>           <none>
sonobuoy-10-openshift-kube-conformance-job-b13d7ca45caf485b        1/3     ImagePullBackOff   0          39s   10.131.2.6   ip-10-0-57-61.ec2.internal   <none>           <none>
sonobuoy-20-openshift-conformance-validated-job-85c75be8d1c94818   1/3     ImagePullBackOff   0          39s   10.131.2.9   ip-10-0-57-61.ec2.internal   <none>           <none>
sonobuoy-99-openshift-artifacts-collector-job-fad442ab3e124582     1/3     ImagePullBackOff   0          39s   10.131.2.7   ip-10-0-57-61.ec2.internal   <none>           <none>

Pod:

$ oc describe pod -n openshift-provider-certification  sonobuoy-05-openshift-cluster-upgrade-job-60d3274e82e04d28

Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       60s                default-scheduler  Successfully assigned openshift-provider-certification/sonobuoy-05-openshift-cluster-upgrade-job-60d3274e82e04d28 to ip-10-0-57-61.ec2.internal
  Normal   AddedInterface  60s                multus             Add eth0 [10.131.2.8/23] from ovn-kubernetes
  Normal   Created         59s                kubelet            Created container sonobuoy-worker
  Normal   Started         59s                kubelet            Started container sonobuoy-worker
  Normal   Pulled          59s                kubelet            Container image "quay.io/ocp-cert/sonobuoy:v0.56.10" already present on machine
  Warning  Failed          43s (x2 over 59s)  kubelet            Error: ErrImagePull
  Normal   Pulling         43s (x2 over 59s)  kubelet            Pulling image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105"
  Warning  Failed          43s (x2 over 59s)  kubelet            Failed to pull image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105": rpc error: code = Unknown desc = reading manifest dev20230127205105 in quay.io/ocp-cert/openshift-tests-provider-cert: manifest unknown: manifest unknown
  Normal   BackOff         32s (x5 over 59s)  kubelet            Back-off pulling image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105"
  Warning  Failed          32s (x5 over 59s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff         32s (x3 over 59s)  kubelet            Back-off pulling image "quay.io/ocp-cert/openshift-tests-provider-cert:dev20230127205105"
  Warning  Failed          32s (x3 over 59s)  kubelet            Error: ImagePullBackOff

One idea is to display the pod status while the message field in the plugin payload is empty.
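
A minimal sketch of that idea follows, in Go. It is not the OPCT or Sonobuoy implementation: `ContainerState`, `PluginStatus`, `podHint`, and `displayMessage` are hypothetical names, and the container name `plugin` is an assumption. In the real CLI the waiting reason would presumably come from the pod's container statuses as reported by the Kubernetes API; here it is a plain struct so the example runs without a cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainerState is a hypothetical, simplified view of one container in a
// plugin pod. WaitingReason is empty when the container is not waiting.
type ContainerState struct {
	Name          string
	WaitingReason string // e.g. "ErrImagePull", "ImagePullBackOff"
}

// PluginStatus is a hypothetical row of the status table printed by the CLI.
type PluginStatus struct {
	JobName    string
	Status     string
	Message    string // message from the plugin payload; may be empty
	Containers []ContainerState
}

// Waiting reasons that mean the image cannot be pulled, as seen in the
// pod events above.
var pullFailureReasons = map[string]bool{
	"ErrImagePull":     true,
	"ImagePullBackOff": true,
}

// podHint returns a short description of containers stuck on an image pull,
// or an empty string when none are.
func podHint(containers []ContainerState) string {
	var hints []string
	for _, c := range containers {
		if pullFailureReasons[c.WaitingReason] {
			hints = append(hints, fmt.Sprintf("container %s: %s", c.Name, c.WaitingReason))
		}
	}
	return strings.Join(hints, "; ")
}

// displayMessage prefers the plugin's own message and falls back to the
// pod-level hint only while that message is empty.
func displayMessage(p PluginStatus) string {
	if p.Message != "" {
		return p.Message
	}
	return podHint(p.Containers)
}

func main() {
	p := PluginStatus{
		JobName: "05-openshift-cluster-upgrade",
		Status:  "running",
		Containers: []ContainerState{
			{Name: "sonobuoy-worker"},                           // healthy
			{Name: "plugin", WaitingReason: "ImagePullBackOff"}, // assumed name
		},
	}
	fmt.Printf("%-34s | %-10s | %s\n", p.JobName, p.Status, displayMessage(p))
}
```

With this fallback, the MESSAGE column would carry the pull failure instead of staying blank, and the same check could be used to abort the run once the condition has persisted past a grace period.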
