  1. OPCT - OpenShift Provider Compatibility Tool
  2. OPCT-18

[plugins][artifacts-collector] Must-gather collector is starting prematurely (patch v0.2)


Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • opct-v0.2.1
    • Plugins
    • Important

    Description

      The artifacts collector starts prematurely[1] because the OpenShift conformance plugin (20-openshift-conformance-validated) takes a long time to run[2]. Consequently, must-gather does not collect data covering the entire certification execution.

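      This bug is caused by the 'blocker engine', which monitors the blocking plugin and allows the next plugin to run once the blocker completes. The engine has a timeout, and the last plugin, artifacts-collector, hits that timeout because the plugin it waits on runs for a long time. In the logs below, the waiter polls every 30s and times out after reaching [99/100] (roughly 50 minutes), while the conformance suite ran for 1h25m15s.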
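The bounded wait described above can be sketched in Python. This is only an illustration: the names `wait_for_plugin`, `fake_status` and `fake_sleep` are hypothetical and are not taken from the OPCT scripts. The defaults mirror what the log in [1] suggests (30s interval, 100 attempts), and the simulated blocker needs the 1h25m15s reported in [2].

```python
import time
from typing import Callable


def wait_for_plugin(
    get_status: Callable[[], str],
    max_attempts: int = 100,       # assumed from the [99/100] counter in the log
    interval_seconds: float = 30,  # assumed from "Waiting 30s" in the log
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the blocker reports 'complete' before the budget runs out."""
    for _attempt in range(max_attempts):
        if get_status() == "complete":
            return True
        sleep(interval_seconds)
    # Budget exhausted: the caller decides whether to abort or to proceed.
    return False


# Simulate a blocker that needs 1h25m15s (5115s), using a fake clock
# so the example runs instantly.
clock = {"elapsed": 0.0}


def fake_sleep(seconds: float) -> None:
    clock["elapsed"] += seconds


def fake_status() -> str:
    return "complete" if clock["elapsed"] >= 5115 else "running"


finished_in_time = wait_for_plugin(fake_status, sleep=fake_sleep)
print(finished_in_time, clock["elapsed"])
```

With these assumed values the loop gives up after 3000 simulated seconds, well before the blocker would have completed.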

      There are two possible solutions, one short-term [A] and one long-term [B]:

      • A: Increase the timeout, since OCP Validated is expected to take a long time.
      • B: Validate and migrate to the native Sonobuoy priority feature, which we requested upstream and which was implemented in newer releases; OPCT has not migrated to it yet.
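For option A, a rough sizing of the retry limit can be worked out from the numbers in this report. The sketch below assumes the 30s polling interval seen in the log and uses the 1h25m15s suite duration from [2] as the expected run time; `attempts_needed` is a hypothetical helper, and a real setting would need extra headroom for slower clusters.

```python
import math


def attempts_needed(expected_seconds: int, interval_seconds: int = 30) -> int:
    """Smallest number of polling attempts whose total wait covers the run."""
    return math.ceil(expected_seconds / interval_seconds)


observed_run = 1 * 3600 + 25 * 60 + 15  # 1h25m15s from the suite summary in [2]
current_budget = 100 * 30               # 100 attempts x 30s, as seen in [1]
min_attempts = attempts_needed(observed_run)

print(f"run={observed_run}s budget={current_budget}s min_attempts={min_attempts}")
```

Under these assumptions the current budget (3000s) is shorter than the observed run (5115s), and at least 171 attempts at 30s each would be needed just to cover this one execution.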

       

      [1] 99-openshift-artifacts-collector plugin executor started at 20221222-223727

      #./wait-plugin.sh:9>  20221222-223657> [waiter] Waiting 30s for Plugin[20-openshift-conformance-validated]...[99/100]
      #./wait-plugin.sh:9>  20221222-223727> [waiter] Plugin[20-openshift-conformance-validated] with status[running]...
      #./wait-plugin.sh:9>  20221222-223727> [waiter] {"plugins":[{"plugin":"10-openshift-kube-conformance","node":"global","status":"complete","result-status":"","result-counts":null,"progress":{"name":"10-openshift-kube-conformance","node":"global","timestamp":"2022-12-22T21:46:18.317405816Z","msg":"status=report-progress-finished","total":359,"completed":0,"failures":["[sig-api-machinery] CustomResourceDefinition Watch [Privileged:ClusterAdmin] CustomResourceDefinition Watch watch on custom resource definition objects [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"]}},{"plugin":"20-openshift-conformance-validated","node":"global","status":"running","result-status":"","result-counts":null,"progress":{"name":"20-openshift-conformance-validated","node":"global","timestamp":"2022-12-22T22:13:00.550685989Z","msg":"status=running","total":3454,"completed":0,"failures":["[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]","[sig-api-machinery] CustomResourceDefinition Watch [Privileged:ClusterAdmin] CustomResourceDefinition Watch watch on custom resource definition objects [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]","[sig-node] Pods Extended Pod Container lifecycle evicted pods should be terminal [Suite:openshift/conformance/parallel] [Suite:k8s]","[sig-scheduling] SchedulerPredicates [Serial] validates pod overhead is considered along with resource limits of pods that are allowed to run verify pod overhead is accounted for [Suite:openshift/conformance/serial] 
[Suite:k8s]"]}},{"plugin":"99-openshift-artifacts-collector","node":"global","status":"running","result-status":"","result-counts":null,"progress":{"name":"99-openshift-artifacts-collector","node":"global","timestamp":"2022-12-22T22:04:22.576573544Z","msg":"status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[99/100]","total":0,"completed":0}}],"status":"running","tar-info":{"name":"","created":"0001-01-01T00:00:00Z","sha256":"","size":0}}
      #./wait-plugin.sh:9>  20221222-223727> [waiter] Timeout waiting condition 'complete' for plugin[20-openshift-conformance-validated].
      #./global_fn.sh:12>  [signal handler] ERROR on line 87 ./runner.sh
      #./runner.sh:17>  20221222-223727> [runner] starting executor...
      #./executor.sh:14>  [executor] Starting...
      #./executor.sh:16>  [executor] Checking if credentials are present...
      #./executor.sh:23>  [executor] Executor started. Choosing execution type based on environment sets.
      /tmp/sonobuoy/results /plugin
      [must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:58c1f71f2004767acbabfdf6ab3fc5689a63c713de564c01197fbc3795610ef6
      When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
      ClusterID: b31ef2da-7382-4e1c-91eb-65501c24a54c
      ClusterVersion: Stable at "4.12.0-rc.4"
      ClusterOperators:
              All healthy and stable
      
      [must-gather      ] OUT namespace/openshift-must-gather-c4z68 created
      [must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-n9mf4 created
      (...)  
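The status payload printed by the waiter in the log above is JSON, so the state of the blocking plugin can be read from it directly. The following Python sketch uses a payload trimmed by hand to the `plugin` and `status` fields; `plugin_status` is a hypothetical helper and not part of OPCT or Sonobuoy.

```python
import json
from typing import Optional

# Trimmed by hand from the waiter log above; only the fields used here are kept.
SAMPLE_PAYLOAD = """
{"plugins": [
  {"plugin": "10-openshift-kube-conformance", "status": "complete"},
  {"plugin": "20-openshift-conformance-validated", "status": "running"},
  {"plugin": "99-openshift-artifacts-collector", "status": "running"}
], "status": "running"}
"""


def plugin_status(payload: str, name: str) -> Optional[str]:
    """Return the status of the named plugin, or None if it is not listed."""
    doc = json.loads(payload)
    for entry in doc.get("plugins", []):
        if entry.get("plugin") == name:
            return entry.get("status")
    return None


blocker = plugin_status(SAMPLE_PAYLOAD, "20-openshift-conformance-validated")
print(blocker)
```

In this payload the blocker is still `running` at the moment the waiter times out, which is the condition that lets the collector start early.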

      [2] 20-openshift-conformance-validated plugin executor finished at 20221222-231248 

      (...)
      Suite run returned error: 19 fail, 1305 pass, 2114 skip (1h25m15s)
      error: 19 fail, 1305 pass, 2114 skip (1h25m15s)
      + os_log_info 'openshift-tests finished[0]'
      ++ caller
      ++ awk '{print$2":"$1}'
      + echo '#./executor.sh:62> ' #./executor.sh:62>  openshift-tests finished[0]
      'openshift-tests finished[0]'
      + set +x
      #./executor.sh:117>  Plugin executor finished. Result[0]
      #./runner.sh:17>  20221222-231243> [runner] Plugin finished. Result[0]
      #./runner.sh:17>  20221222-231243> [runner] Saving results triggered. Slowing down...
      /tmp/sonobuoy/results /plugin
      #./runner.sh:17>  20221222-231248> [runner] Looking for junit result files...
      #./runner.sh:17>  20221222-231248> [runner] Adjusting permissions for results files.
      #./runner.sh:17>  20221222-231248> [runner] Sending plugin done to unlock report-progress
      #./runner.sh:17>  20221222-231248> [runner] Sending sonobuoy worker the result file path
      /plugin
      #./runner.sh:17>  20221222-231248> [runner] Results saved at /tmp/sonobuoy/results/done=[/tmp/sonobuoy/results/junit_e2e__20221222-214632.xml] 
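The two timestamps quoted in [1] and [2] show how much of the conformance run happened after the collector had already started. A small Python sketch of that arithmetic, assuming the log timestamps follow the `YYYYMMDD-HHMMSS` pattern they appear to use (`parse_log_ts` is a hypothetical helper):

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y%m%d-%H%M%S"  # assumed from stamps such as 20221222-223727


def parse_log_ts(stamp: str) -> datetime:
    """Parse a timestamp in the format used by the runner and waiter logs."""
    return datetime.strptime(stamp, LOG_TS_FORMAT)


collector_started = parse_log_ts("20221222-223727")     # from [1]
conformance_finished = parse_log_ts("20221222-231248")  # from [2]

uncovered = conformance_finished - collector_started
print(uncovered)
```

The difference is 35 minutes and 21 seconds, which is the portion of the conformance execution that started after must-gather had already begun collecting.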

      [3] report-progress does not finish correctly after the timeout. (This may be handled in a separate issue; we need to decide whether to keep the blocker engine.)

      $ oc get pods -n openshift-provider-certification
      NAME                                                               READY   STATUS      RESTARTS   AGE
      sonobuoy                                                           1/1     Running     0          81m
      sonobuoy-10-openshift-kube-conformance-job-47af576993104fc8        0/3     Completed   0          81m
      sonobuoy-20-openshift-conformance-validated-job-0b739af4dbf24fb1   3/3     Running     0          81m
      sonobuoy-99-openshift-artifacts-collector-job-22020c2d3faf485b     1/3     NotReady    0          81m
      Containers:
        report-progress:
          Container ID:  cri-o://fc7127ff9032da0477cc5c0a2e243ba6a5e438c93955e079c5d7762c38ca6bc9
          Image:         quay.io/ocp-cert/openshift-tests-provider-cert:dev20221221190826
          Image ID:      quay.io/ocp-cert/openshift-tests-provider-cert@sha256:67133bcbd49285ebcf72bd2b16746abe663599b13a36a90fdb3b5bb69e0fc791
          Port:          <none>
          Host Port:     <none>
          Command:
            ./report-progress.sh
          State:          Running
            Started:      Fri, 23 Dec 2022 11:34:59 -0300
          Ready:          True
          Restart Count:  0
          Environment:
            ENV_NODE_NAME:            (v1:spec.nodeName)
            ENV_POD_NAME:            sonobuoy-99-openshift-artifacts-collector-job-22020c2d3faf485b (v1:metadata.name)
            ENV_POD_NAMESPACE:       openshift-provider-certification (v1:metadata.namespace)
            PLUGIN_ID:               99
            RESULTS_DIR:             /tmp/sonobuoy/results
            SONOBUOY:                true
            SONOBUOY_CONFIG_DIR:     /tmp/sonobuoy/config
            SONOBUOY_PROGRESS_PORT:  8099
            SONOBUOY_RESULTS_DIR:    /tmp/sonobuoy/results
          Mounts:
            /tmp/shared from shared (rw)
            /tmp/sonobuoy/results from results (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8hfgl (ro)
      $ oc logs sonobuoy-99-openshift-artifacts-collector-job-22020c2d3faf485b -n openshift-provider-certification -c report-progress --tail=10
              "failures":[],
              "msg":"status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[98/100]"
          }
      20221223-151912> [report] Sending report payload [dep-checker]: {
              "completed":0,
              "total":0,
              "failures":[],
              "msg":"status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[99/100]"
          }
      Timeout waiting condition 'complete' for plugin[20-openshift-conformance-validated].
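The `msg` field in the report payloads above carries the name of the blocking plugin and the retry counter. A hedged Python sketch for extracting them follows; the pattern is inferred only from the messages shown in this issue, `parse_waiting_msg` is a hypothetical helper, and the meaning of the parenthesised counters (for example `0/-3454/0`) is not documented here, so they are kept as a raw string.

```python
import re
from typing import Optional

# Pattern inferred from messages such as:
#   status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[99/100]
WAITING_MSG = re.compile(
    r"^status=waiting-for=(?P<blocker>[^=]+)"
    r"=\((?P<counters>[^)]*)\)"
    r"=\[(?P<attempt>\d+)/(?P<limit>\d+)\]$"
)


def parse_waiting_msg(msg: str) -> Optional[dict]:
    """Return blocker name, raw counters and retry position, or None if no match."""
    match = WAITING_MSG.match(msg)
    if match is None:
        return None
    return {
        "blocker": match.group("blocker"),
        "counters": match.group("counters"),  # semantics unknown; left unparsed
        "attempt": int(match.group("attempt")),
        "limit": int(match.group("limit")),
    }


parsed = parse_waiting_msg(
    "status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[99/100]"
)
print(parsed)
```

Tracking `attempt` against `limit` in this way would make it possible to flag a run whose waiter is about to exhaust its retries.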
        


            People

              rhn-support-rvanderp Richard Vanderpool
              rhn-support-mrbraga Marco Braga