  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-9294

Deploying neutron-sriov agent with ServiceOverride is failing


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Moderate

      Steps to reproduce:

      1. Deploy an OpenStackControlPlane with the default settings from the `openstack-galera-network-isolation` example CR.
      2. Deploy an OpenStackDataPlaneNodeSet and OpenStackDataPlaneDeployment (I used `make edpm_deploy` from install_yamls for that).
      3. Once everything has deployed successfully, create a new OpenStackDataPlaneDeployment with `servicesOverride` defined to additionally install neutron-sriov-agent, like:

       

      apiVersion: dataplane.openstack.org/v1beta1
      kind: OpenStackDataPlaneDeployment
      metadata:
        name: openstack-edpm-add-sriov-agent
      spec:
        nodeSets:
          - openstack-edpm-ipam
        servicesOverride:
          - neutron-sriov 

      4. Check the status of the `neutron-sriov-openstack-edpm-add-sriov-agent-openstack-edpm` pod; it will be in the Error state.

      5. Check the logs of that pod; the error will be something like:

      TASK [osp.edpm.edpm_container_manage : Create containers managed by Podman for /var/lib/edpm-config/container-startup-config/neutron_sriov_agent] ***
      Thursday 08 August 2024  08:18:50 +0000 (0:00:00.102)       0:00:26.445 ******* 
      [WARNING]: ERROR: Container neutron_sriov_agent exited with code 125 when runed 
      stderr: time="2024-08-08T08:18:51Z" level=info msg="podman filtering at log 
      level info" time="2024-08-08T08:18:51Z" level=info msg="Using sqlite as database backend" time="2024-08-08T08:18:51Z" level=info msg="Not using native     
      diff for overlay, this may cause degraded performance for building images:      
      kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" time="2024-08-08T08:18:51Z"   
      level=info msg="Setting parallel job count to 7" time="2024-08-08T08:18:51Z" 
      level=info msg="Sysctl net.ipv4.ping_group_range=0 0 ignored in                  
      containers.conf, since Network Namespace set to host" Error: statfs 
      /var/lib/openstack/cacerts/neutron-sriov/tls-ca-bundle.pem: no such file or directory
      fatal: [edpm-compute-0]: FAILED! => {"changed": false, "msg": "Failed containers: neutron_sriov_agent"} 
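
      Steps 4 and 5 can be carried out with the `oc` CLI roughly as sketched below. This is only an illustration under assumptions: the `openstack` namespace is a guess for this environment, `pod_status` and `pod_logs` are hypothetical helper names, and pods are matched by name prefix because the actual pod name may carry a generated suffix.

```shell
# Assumption: the data plane deployment pods run in the "openstack" namespace.
NAMESPACE="${NAMESPACE:-openstack}"

# Print the `oc get pods` rows (name, ready, status, ...) for pods whose
# name starts with the given prefix.
pod_status() {
    oc get pods -n "$NAMESPACE" --no-headers | grep "^$1"
}

# Print the last 50 log lines of every pod whose name starts with the prefix.
pod_logs() {
    for pod in $(pod_status "$1" | awk '{print $1}'); do
        echo "== $pod =="
        oc logs -n "$NAMESPACE" --tail=50 "$pod"
    done
}
```

      For this report, `pod_status neutron-sriov-openstack-edpm-add-sriov-agent` would be expected to show the pod in the Error state, and `pod_logs` with the same prefix would print the tail of the Ansible output.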
      

      I was able to work around it by connecting to the EDPM node over SSH and running `sudo mkdir /var/lib/openstack/cacerts/neutron-sriov; sudo cp /var/lib/openstack/cacerts/neutron-metadata/tls-ca-bundle.pem /var/lib/openstack/cacerts/neutron-sriov/`, after which the ansible runner container finished the job without any errors.
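
      The manual workaround could be scripted roughly as follows. This is a minimal sketch, not a fix for the underlying bug: `ensure_ca_bundle` and `CACERTS_DIR` are hypothetical names that are not part of any EDPM tooling, and it assumes that the `neutron-metadata` bundle is an acceptable substitute, as it was in this report.

```shell
#!/bin/sh
# Sketch of the manual workaround; run as root on the EDPM node.
# CACERTS_DIR and ensure_ca_bundle are illustrative names only.
CACERTS_DIR="${CACERTS_DIR:-/var/lib/openstack/cacerts}"

# Copy tls-ca-bundle.pem from the source service directory ($1) into the
# target service directory ($2) unless the target already has a bundle.
ensure_ca_bundle() {
    src="$CACERTS_DIR/$1/tls-ca-bundle.pem"
    dst_dir="$CACERTS_DIR/$2"

    if [ -e "$dst_dir/tls-ca-bundle.pem" ]; then
        echo "bundle already present in $dst_dir"
        return 0
    fi
    if [ ! -f "$src" ]; then
        echo "source bundle $src not found" >&2
        return 1
    fi
    mkdir -p "$dst_dir" && cp "$src" "$dst_dir/"
}
```

      After sourcing the script on the node, `ensure_ca_bundle neutron-metadata neutron-sriov` reproduces the two commands above and is safe to run repeatedly.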

       

      I didn't check this in a deployment where the neutron-sriov agent is enabled from the beginning. The issue may occur only when the service is deployed through `servicesOverride`; this also needs to be checked.

       

              Unassigned Unassigned
              skaplons@redhat.com Slawomir Kaplonski
              rhos-dfg-networking-squad-neutron