Red Hat OpenStack Services on OpenShift / OSPRH-9419

[test-operator] Test pod that finished its tests stays stuck in infinite sleep until interrupted


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • rhos-18.0.1
    • test-operator
    • test-operator-container-1.0.0-46
    • Important

      In a few Neutron CRC job runs, I noticed that the test pod gets stuck until the timeout of about 4 hours interrupts the Zuul job; see the example job [1].

      I reproduced this on an autohold of the same CRC job using the Tempest CR in [2]; the original pod got stuck.

      Using a debug pod, I ran the same test manually inside the pod (run_tempest.sh) and noticed that the script deliberately puts itself into an infinite sleep until it receives a keyboard interrupt, as seen in the output [3]. This happens because TEMPEST_DEBUG_MODE [4] is set to true in the shell environment. Could it also be set to true in the non-debug pod by mistake? (The default is false [5], and the shell script also defaults it to false, as seen in [4].)
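      A minimal sketch of the guard logic quoted in [4], to make the behavior easy to check in isolation. The function name stays_alive_after_tests is hypothetical and is not part of run_tempest.sh; only the default expansion and the comparison mirror the quoted lines 97 and 403.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the debug-mode guard quoted in [4].
# stays_alive_after_tests is a hypothetical name, not part of run_tempest.sh.

# Returns 0 when the script would reach "sleep infinity" after the tests.
# $1: value of TEMPEST_DEBUG_MODE as seen by the script (may be empty).
stays_alive_after_tests() {
    local mode="${1:-false}"   # same default as the quoted line 97
    [ "${mode}" == true ]      # same comparison as the quoted line 403
}

# Show the outcome for an empty, a false, and a true value.
for value in "" false true; do
    if stays_alive_after_tests "${value}"; then
        echo "TEMPEST_DEBUG_MODE='${value}': sleep infinity, pod keeps running"
    else
        echo "TEMPEST_DEBUG_MODE='${value}': no sleep"
    fi
done
```

      Only the literal value true triggers the sleep, so a stuck non-debug pod would suggest that the variable reaches the container with that exact value.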

      This significantly slows down CI work and test runs.

      lpiwowar kopecmartin Can you please check it out?

      Thank you

      [1] 

      https://sf.hosted.upshift.rdu2.redhat.com/logs/38/38/a71a78c7b38c3cd7f04177dbce6588e4fe365f66/check-gitlab-cee/component-network-edpm-rhel9-rhoso18.0-crc-mblue/6c41f3f/

      [2]

      FYI: the SSH key was created before running the test pod.

      ---
      apiVersion: test.openstack.org/v1beta1
      kind: Tempest
      metadata:
        name: tempest-tests-mdrl-1
        namespace: openstack
      spec:
        containerImage: quay.io/podified-antelope-centos9/openstack-tempest-all:current-podified
        SSHKeySecretName: my-ssh-key
        debug: true
        networkAttachments:
        - ctlplane
        tempestRun:
          includeList: |
            whitebox_neutron.*test_metadata_rate_limiting
          concurrency: 1
        tempestconfRun:
          overrides: |
            compute-feature-enabled.vnc_console true
            compute-feature-enabled.cold_migration true
            compute-feature-enabled.block_migration_for_live_migration true
            network-feature-enabled.port_security true
            neutron_plugin_options.advanced_image_ssh_user cloud-user
            neutron_plugin_options.available_type_drivers geneve
            neutron_plugin_options.create_shared_resources true
            neutron_plugin_options.is_igmp_snooping_enabled true
            neutron_plugin_options.ipv6_metadata false
            neutron_plugin_options.advanced_image_ref 11111111-1111-1111-1111-111111111111
            neutron_plugin_options.advanced_image_flavor_ref 22222222-2222-2222-2222-222222222222
            whitebox_neutron_plugin_options.openstack_type podified
            whitebox_neutron_plugin_options.run_traffic_flow_tests True
            whitebox_neutron_plugin_options.kubeconfig_path '/home/zuul/.crc/machines/crc/kubeconfig'
            whitebox_neutron_plugin_options.proxy_host_address 10.0.199.81
            validation.allowed_network_downtime 15
            validation.run_validation true
            identity.v3_endpoint_type public
            identity.v2_admin_endpoint_type public
      

      [3]

      ==============================                                                                                                                                                                
      Failed 1 tests - output below:                                                                                                                                                                
      ==============================                                                                                                                                                                
                                                                                                                                                                                                    
      setUpClass (whitebox_neutron_tempest_plugin.tests.scenario.test_metadata_rate_limiting.TestMetadataRateLimiting)                                                                              
      ----------------------------------------------------------------------------------------------------------------                                                                              
                                                                                                                                                                                                    
      Captured traceback:                                                                                                                                                                           
      ~~~~~~~~~~~~~~~~~~~                                                                                                                                                                           
          Traceback (most recent call last):
      
      ...
      
      ======
      Totals                                                                                                                                                                                        
      ======                                                                                                                                                                                        
      Ran: 1 tests in 0.0000 sec.                                                                                                                                                                   
       - Passed: 0                                                                                                                                                                                  
       - Skipped: 0                                                                                                                                                                                 
       - Expected Fail: 0
       - Unexpected Success: 0
       - Failed: 1
      Sum of execute time for each test: 0.0000 sec.
      ==============
      Worker Balance
      ==============
       - Worker 0 (1 tests) => 0:00:00
      No tests were successful during the run
      + RETURN_VALUE=1
      + popd
      ~ ~
      + popd
      ~
      + generate_test_results
      + pushd /var/lib/tempest/openshift
      ~/openshift ~
      + echo 'Excluded tests'
      Excluded tests
      + '[' '!' -z ']'
      + echo 'Included tests'
      Included tests
      + '[' '!' -z /etc/test_operator/include.txt ']' 
      + cat /etc/test_operator/include.txt
      whitebox_neutron.*test_metadata_rate_limiting
      + TEMPEST_LOGS_DIR=/var/lib/tempest/external_files/tempest-tests-mdrl-1/
      + mkdir -p /var/lib/tempest/external_files/tempest-tests-mdrl-1/
      + echo 'Generate subunit'
      Generate subunit
      + stestr last --subunit
      + echo 'Generate subunit xml file'
      Generate subunit xml file
      + subunit2junitxml /var/lib/tempest/external_files/tempest-tests-mdrl-1/testrepository.subunit
      + true
      + echo 'Generate html result'
      Generate html result
      + subunit2html /var/lib/tempest/external_files/tempest-tests-mdrl-1/testrepository.subunit /var/lib/tempest/external_files/tempest-tests-mdrl-1/stestr_results.html
      setUpClass (whitebox_neutron_tempest_plugin.tests.scenario.test_metadata_rate_limiting.TestMetadataRateLimiting)
      + echo Copying logs file
      Copying logs file
      + cp -rf /var/lib/tempest/openshift/etc /var/lib/tempest/openshift/logs /var/lib/tempest/openshift/tempest_lock /var/lib/tempest/openshift/tempest.log /var/lib/tempest/external_files/tempest-tests-mdrl-1/
      + popd
      ~
      + '[' true == true ']'
      + sleep infinity
      ^C

      [4]

      /var/lib/tempest/run_tempest.sh-96-TEMPEST_ARGS=""
      /var/lib/tempest/run_tempest.sh:97:TEMPEST_DEBUG_MODE="${TEMPEST_DEBUG_MODE:-false}"
      /var/lib/tempest/run_tempest.sh-98-
      --
      /var/lib/tempest/run_tempest.sh-104-# Catch errors when in debug mode
      /var/lib/tempest/run_tempest.sh:105:if [ ${TEMPEST_DEBUG_MODE} == true ]; then
      /var/lib/tempest/run_tempest.sh-106-    trap catch_error_if_debug ERR
      --
      /var/lib/tempest/run_tempest.sh-402-# Keep pod in running state when in debug mode
      /var/lib/tempest/run_tempest.sh:403:if [ ${TEMPEST_DEBUG_MODE} == true ]; then
      /var/lib/tempest/run_tempest.sh-404-    sleep infinity
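
      One way to check whether the non-debug pod really has the variable set is to inspect its environment. The helper below is a sketch; debug_mode_from_env is a hypothetical name, and it only parses env-style output passed on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: debug_mode_from_env is a hypothetical helper.
# Reads KEY=VALUE lines (as printed by `env`) on stdin and prints the value
# of TEMPEST_DEBUG_MODE. Prints "unset" when the variable is missing;
# an empty value is reported as "unset" as well.
debug_mode_from_env() {
    local value
    value="$(sed -n 's/^TEMPEST_DEBUG_MODE=//p' | head -n 1)"
    echo "${value:-unset}"
}
```

      Assuming access to the cluster, the helper could be fed from the running pod, for example with oc exec -n openstack <test-pod-name> -- env | debug_mode_from_env, where <test-pod-name> is a placeholder for the actual pod name.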

      [5]

      https://github.com/openstack-k8s-operators/test-operator/blob/main/api/v1beta1/tempest_types.go#L241

      https://github.com/openstack-k8s-operators/test-operator/blob/c8323d45438523d3df645f18f2d47dbdc5fb6fa2/controllers/tempest_controller.go#L640 

            kopecmartin Martin Kopec
            rhn-support-mblue Maor Blaustein
            rhos-tempest