  1. OpenShift Bugs
  2. OCPBUGS-37630

Disruption monitor failing when running conformance against hypershift cluster

    • Release Note Not Required
    • In Progress

      Description of problem:

      When running a conformance suite against a hypershift cluster (for example, CNI conformance), the MonitorTests step fails because files expected from the disruption monitor are missing.
          

      Version-Release number of selected component (if applicable):

      4.15.13
          

      How reproducible:

      Consistent
          

      Steps to Reproduce:

          1. Create a hypershift cluster
          2. Attempt to run an ose-tests suite. For example, the CNI conformance suite documented here: https://access.redhat.com/documentation/en-us/red_hat_software_certification/2024/html/red_hat_software_certification_workflow_guide/con_cni-certification_openshift-sw-cert-workflow-working-with-cloud-native-network-function#running-the-cni-tests_openshift-sw-cert-workflow-working-with-container-network-interface
          3. Note errors in logs
          

      Actual results:

      found errors fetching in-cluster data: [failed to list files in disruption event folder on node ip-10-0-130-177.us-west-2.compute.internal: the server could not find the requested resource failed to list files in disruption event folder on node ip-10-0-152-10.us-west-2.compute.internal: the server could not find the requested resource]
      Failed to write events from in-cluster monitors, err: open /tmp/artifacts/junit/AdditionalEvents__in_cluster_disruption.json: no such file or directory
          

      Expected results:

      No errors 
          

      Additional info:

      The first error can be avoided by creating the directory it's looking for on all nodes:
      for node in $(oc get nodes -oname); do oc debug -n default $node -- chroot /host mkdir -p /var/log/disruption-data/monitor-events; done
      However, I'm not sure whether the missing directory means the disruption monitor is not working properly on hypershift, or whether this check should be skipped on hypershift entirely.
      
      The second error is related to the ARTIFACT_DIR env var not being set locally, and can be avoided by creating a directory, setting that directory as the ARTIFACT_DIR, and then creating an empty "junit" dir inside of it.
      It looks like ARTIFACT_DIR defaults to a temporary directory if it's not set in the env, but the "junit" directory doesn't exist inside of it, so file creation in that nonexistent directory fails.
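
      A minimal sketch of the second workaround, assuming a POSIX shell with mktemp available. The use of mktemp is only an illustration; any writable directory should do, and this has not been confirmed as the intended fix:

```shell
# Create a scratch directory to hold test artifacts.
# (Assumption: the location is arbitrary; mktemp is used here for convenience.)
ARTIFACT_DIR="$(mktemp -d)"
export ARTIFACT_DIR

# Pre-create the "junit" subdirectory so that writing
# AdditionalEvents__in_cluster_disruption.json does not fail.
mkdir -p "${ARTIFACT_DIR}/junit"

echo "ARTIFACT_DIR=${ARTIFACT_DIR}"
```

      Run the conformance suite from the same shell afterwards so that it inherits the exported ARTIFACT_DIR.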
          

              stbenjam Stephen Benjamin
              cblecker.openshift Christoph Blecker