-
Bug
-
Resolution: Done-Errata
-
Normal
-
rhos-18.0.0
-
1
-
False
-
-
False
-
Committed
-
openstack-nova-27.1.1-18.0.20231114154650.a869ab1.el9ost
-
Committed
-
Committed
-
None
-
Release Note Not Required
-
-
-
Moderate
+++ This bug was initially created as a clone of Bug #2232232 +++
Description of problem: Nova intermittently fails to mount an NFS volume on a NetApp backend when FIPS is enabled. This puddle has been tested multiple times, and the failure happened once today. We want to confirm whether the NetApp backend needs to be updated or whether there is a potential race in the deployment. Error produced:
2023-08-15 03:14:03.612 2 DEBUG nova.virt.libvirt.volume.mount [req-2cb867f8-80e6-419d-9986-26309b832a92 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] _HostMountState.mount(fstype=nfs, export=192.168.1.50:/vol_cinder, vol_name=volume-dbd2fc42-50f6-46f4-bed8-75034fcb0a33, /var/lib/nova/mnt/724dab229d80c6a1a1e49a71c8356eed, options=[]) generation 0 mount /usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py:287
2023-08-15 03:14:03.613 2 DEBUG nova.virt.libvirt.volume.mount [req-2cb867f8-80e6-419d-9986-26309b832a92 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Mounting /var/lib/nova/mnt/724dab229d80c6a1a1e49a71c8356eed generation 0 mount /usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py:299
2023-08-15 03:14:03.615 2 DEBUG oslo_concurrency.lockutils [req-21a9119f-5093-4314-9041-6b4eac9791ce 7e39a6fa25e24604a9b0b0a454bbbf91 f9aa00d658424550990f75f23f187541 - default default] Releasing lock "refresh_cache-70e4550d-c03c-4f90-bdb2-fb6e082cfc90" lock /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:282
2023-08-15 03:14:03.621 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:355
2023-08-15 03:14:03.622 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:367
2023-08-15 03:14:03.623 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:355
2023-08-15 03:14:03.623 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:367
2023-08-15 03:14:03.688 15 DEBUG neutron.wsgi [-] (15) accepted ('172.17.1.29', 54120) server /usr/lib/python3.9/site-packages/eventlet/wsgi.py:992
2023-08-15 03:14:03.697 21 DEBUG neutron_lib.db.api [req-86bca735-e7dc-44bc-9947-1a1aea16f2ad - - - - -] Retry wrapper got retriable exception: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched. wrapped /usr/lib/python3.9/site-packages/neutron_lib/db/api.py:183
2023-08-15 03:14:03.697 21 DEBUG oslo_db.api [req-86bca735-e7dc-44bc-9947-1a1aea16f2ad - - - - -] Performing DB retry for function neutron.api.rpc.handlers.l3_rpc.L3RpcCallback.update_floatingip_statuses wrapper /usr/lib/python3.9/site-packages/oslo_db/api.py:155
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [req-2cb867f8-80e6-419d-9986-26309b832a92 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Setting instance vm_state to ERROR: TypeError: catching classes that do not inherit from BaseException is not allowed
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Traceback (most recent call last):
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py", line 305, in mount
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] nova.privsep.fs.mount(fstype, export, mountpoint, options)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/oslo_privsep/priv_context.py", line 253, in _wrap
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] return self.channel.remote_call(name, args, kwargs)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/oslo_privsep/daemon.py", line 226, in remote_call
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] raise exc_type(*result[2])
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Command: mount -t nfs 192.168.1.50:/vol_cinder /var/lib/nova/mnt/724dab229d80c6a1a1e49a71c8356eed
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Exit code: 32
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stdout: ''
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No data available\nmount.nfs: access denied by server while mounting 192.168.1.50:/vol_cinder\n'
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] During handling of the above exception, another exception occurred:
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Traceback (most recent call last):
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 10256, in _error_out_instance_on_exception
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] yield
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 5851, in _finish_resize_helper
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] network_info = self._finish_resize(context, instance, migration,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 5789, in _finish_resize
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] self._set_instance_info(instance, old_flavor)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in _exit_
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] self.force_reraise()
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] raise self.value
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 5772, in _finish_resize
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] self.driver.finish_migration(context, migration, instance,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11308, in finish_migration
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] xml = self._get_guest_xml(context, instance, network_info,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7182, in _get_guest_xml
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] conf = self._get_guest_config(instance, network_info, image_meta,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 6789, in _get_guest_config
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] storage_configs = self._get_guest_storage_config(context,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 5395, in _get_guest_storage_config
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] self._connect_volume(context, connection_info, instance)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 1883, in _connect_volume
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] vol_driver.connect_volume(connection_info, instance)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/fs.py", line 113, in connect_volume
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] mount.mount(self.fstype, export, vol_name, mountpoint, instance,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py", line 407, in mount
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] mount_state.mount(fstype, export, vol_name, mountpoint, instance,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py", line 306, in mount
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] except processutils.ProcessExecutionError():
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] TypeError: catching classes that do not inherit from BaseException is not allowed
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]
2023-08-15 03:14:03.769 15 DEBUG neutron.pecan_wsgi.hooks.policy_enforcement [req-1bc380d3-ff31-4719-b699-b3bcffdba79a 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Attributes excluded by policy engine: ['standard_attr_id', 'binding:profile', 'binding:host_id', 'binding:vif_type', 'binding:vif_details', 'resource_request'] _exclude_attributes_by_policy /usr/lib/python3.9/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:259
2023-08-15 03:14:03.771 15 INFO neutron.wsgi [req-1bc380d3-ff31-4719-b699-b3bcffdba79a 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] 172.17.1.100,172.17.1.29 "GET /v2.0/ports?device_id=70e4550d-c03c-4f90-bdb2-fb6e082cfc90 HTTP/1.1" status: 200 len: 1148 time: 0.0815990
2023-08-15 03:14:03.826 15 INFO neutron.wsgi [req-92242980-60a4-43c8-aceb-c8383f1798c8 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] 172.17.1.100,172.17.1.29 "GET /v2.0/security-groups?id=706a223d-2fa5-4380-9a2e-82efbcba23a8&fields=id&fields=name HTTP/1.1" status: 200 len: 289 time: 0.0494747
2023-08-15 03:14:03.829 18 DEBUG nova.policy [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Policy check for os_compute_api:os-extended-server-attributes failed with credentials
authorize /usr/lib/python3.9/site-packages/nova/policy.py:199
Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230802.n.1
How reproducible:
~10%
Steps to Reproduce:
1. Create a deployment with NetApp as the backend and FIPS enabled
2. Create a volume-backed guest and resize the guest to a new flavor
Actual results:
When resizing the guest, the volume fails to mount and the resize fails
Expected results:
Resize completes as expected
Additional info:
Relevant CI/log info is provided in the follow-up comment.
— Additional comment from James Parker on 2023-08-15 20:50:33 UTC —
Job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-all-unified-17.1_d-rhel-vhost-3cont_2comp-ipv4-vxlan-netapp-fips/39/
Job Logs: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-all-unified-17.1_d-rhel-vhost-3cont_2comp-ipv4-vxlan-netapp-fips/39/
Test Report: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-all-unified-17.1_d-rhel-vhost-3cont_2comp-ipv4-vxlan-netapp-fips/39//testReport/tempest.api.compute.servers.test_server_actions/ServerActionsTestJSON/test_resize_volume_backed_server_confirm_id_e6c28180_7454_4b59_b188_0257af08a63b_volume_/
Test Utilized: https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_server_actions.py
— Additional comment from James Parker on 2023-08-15 20:52:04 UTC —
Combined logs of the failing resize request have been attached.
Request UUID: req-2cb867f8-80e6-419d-9986-26309b832a92
Server UUID: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90
— Additional comment from James Parker on 2023-08-15 20:54:03 UTC —
Adding Storage as well to get feedback on whether this is an issue with NetApp.
— Additional comment from Eric Harney on 2023-08-22 12:45:03 UTC —
There is an actual bug here (TypeError), but do we know why the share didn't mount?
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No data available\nmount.nfs: access denied by server while mounting 192.168.1.50:/vol_cinder\n'
— Additional comment from Artom Lifshitz on 2023-08-22 14:48:41 UTC —
(In reply to Eric Harney from comment #4)
> There is an actual bug here (TypeError), but do we know why the share didn't
> mount?
So that's a legit Nova bug in terms of error handling that should be easy enough to fix, but it won't address the root cause, which is below.
> 2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance:
> 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No
> data available\nmount.nfs: access denied by server while mounting
> 192.168.1.50:/vol_cinder\n'
I'm guessing we'd need to look at the server-side to figure this out? But... the server-side is Netapp, so...?
— Additional comment from Eric Harney on 2023-08-22 14:54:49 UTC —
(In reply to Artom Lifshitz from comment #5)
> (In reply to Eric Harney from comment #4)
> > 2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance:
> > 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No
> > data available\nmount.nfs: access denied by server while mounting
> > 192.168.1.50:/vol_cinder\n'
>
> I'm guessing we'd need to look at the server-side to figure this out? But...
> the server-side is Netapp, so...?
That, and it's also worth looking into whether FIPS mode changes anything in NFS that could affect how NFS shares are mounted etc.
— Additional comment from Alex Stupnikov on 2023-08-23 09:47:57 UTC —
I will go ahead and propose an upstream fix for the nova except match shortly. It does indeed look like a trivial fix to me.
I can also try to reproduce this problem manually and collect extra data for NetApp if needed, but I need access to the worker node and VMs.
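For reference, the TypeError in the traceback can be reproduced outside of nova. The following is a minimal sketch, not the actual nova code or the upstream patch: ProcessExecutionError here is a local stand-in for oslo_concurrency.processutils.ProcessExecutionError, and failing_mount, mount_buggy, mount_fixed and outcome are hypothetical names used only for illustration. It shows that writing the except clause with parentheses makes Python try to match against an instance rather than a class, which only surfaces when an exception actually propagates.

```python
class ProcessExecutionError(Exception):
    """Local stand-in for oslo_concurrency.processutils.ProcessExecutionError."""


def failing_mount():
    # Hypothetical helper: simulates the privsep mount call failing.
    raise ProcessExecutionError("mount.nfs: access denied by server")


def mount_buggy():
    try:
        failing_mount()
    except ProcessExecutionError():  # bug: an instance, not the class
        return "handled"


def mount_fixed():
    try:
        failing_mount()
    except ProcessExecutionError:  # the class itself, no parentheses
        return "handled"


def outcome(func):
    # Report what the caller sees: the handler result or the TypeError.
    try:
        return func()
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(outcome(mount_buggy))
    print(outcome(mount_fixed))
```

Because the except expression is only evaluated once an exception is raised in the try body, the buggy form stays latent until the mount fails, which fits the intermittent nature of this report.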
— Additional comment from Alex Stupnikov on 2023-08-23 11:08:43 UTC —
Upstream master review: https://review.opendev.org/c/openstack/nova/+/892496
— Additional comment from Alex Stupnikov on 2023-08-23 15:23:46 UTC —
In fact, Melanie already reported this upstream and proposed a fix:
https://bugs.launchpad.net/nova/+bug/1984736
https://review.opendev.org/c/openstack/nova/+/852795
The upstream fix is currently blocked by CI (nova-lvm failure). With that being said, should we re-assign this BZ?
— Additional comment from RHEL Program Management on 2023-08-23 16:22:06 UTC —
This item has been properly Triaged and planned for the release, and Target Release is now set to match the release flag.
— Additional comment from James Smith on 2023-09-08 20:47:12 UTC —
If you think customers need a description of this bug in addition to the content of the BZ summary field, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.
If this bug does not require an additional Doc Text description, please set the 'requires_doc_text' flag to '-'.
If the BZ already has Doc Text, please perform a quick review. Update it if needed.
Ensure that the flags for the BZ include a release request or release approval. For instance, for any 17.1 release, you need rhos-17.1? or rhos-17.1+.
Ensure that the Target Milestone is correct. This value indicates when the Bug Fix or Enhancement is scheduled for release.
Timing is important. If the BZ is a fix scheduled for a future release, but the bug affects the current release, ensure that Doc Type is set to Known Issue and that the Doc Text describes the problem, not the planned fix.
If a BZ is a fix that is included in the current release, ensure that Doc Type is set to Bug Fix and that the Doc Text describes the fix.
— Additional comment from Artom Lifshitz on 2023-09-08 20:56:42 UTC —
QE found the issue, no doctext needed.
— Additional comment from RHEL Program Management on 2023-10-16 15:50:47 UTC —
This bugzilla has had its Target Release removed since it does not have a Target Milestone set (i.e. it has not been committed for a specific release).
- external trackers
- links to
-
RHBA-2024:133297 Release of components for Red Hat OpenStack Services on OpenShift 18.0