Description of problem:
1. Creating a member in an Edge environment with an invalid subnet will (obviously) create an invalid member, and Octavia then refuses to delete it.
2. It also causes the created member, and any other member created afterwards, to have "operating_status": "OFFLINE" & "provisioning_status": "ERROR", even if the other members were provided with the correct subnet.
The HandleNetworkDeltas task probably fails; Octavia seems to try to create a port as part of the task and that call fails.
I got the following traceback:
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/octavia/controller/queue/v1/endpoints.py", line 127, in delete_member
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server self.worker.delete_member(member_id)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/octavia/controller/worker/v1/controller_worker.py", line 508, in delete_member
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server delete_member_tf.run()
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server for _state in self.run_iter(timeout=timeout):
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server failure.Failure.reraise_if_any(er_failures)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/taskflow/types/failure.py", line 339, in reraise_if_any
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server failures[0].reraise()
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/taskflow/types/failure.py", line 346, in reraise
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server six.reraise(*self._exc_info)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/six.py", line 709, in reraise
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server raise value
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server result = task.execute(**arguments)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/octavia/controller/worker/v1/tasks/network_tasks.py", line 420, in execute
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server ret = handle_delta.execute(amphorae[amp_id], delta)
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/octavia/controller/worker/v1/tasks/network_tasks.py", line 337, in execute
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server port = self.network_driver.plug_fixed_ip(
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/octavia/network/drivers/neutron/base.py", line 294, in plug_fixed_ip
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server raise base.NetworkException(str(e))
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server octavia.network.base.NetworkException: Invalid input for operation: Failed to create port on network d7f7de6c-0e84-49e2-9042-697fa85d2532, because fixed_ips included invalid subnet 086e650b-0c78-43db-811c-5dfcd64423b6.
2023-04-17 14:07:21.866 13 ERROR oslo_messaging.rpc.server Neutron server returns request_ids: ['req-26d5bb1c-be0f-41fb-83be-4fa6e6394b60']
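A quick way to cross-check the mismatch Neutron is complaining about (the IDs are taken from the traceback above; the -c/-f options are just how I would narrow the output):
openstack subnet show 086e650b-0c78-43db-811c-5dfcd64423b6 -c network_id -f value
openstack network show d7f7de6c-0e84-49e2-9042-697fa85d2532 -c subnets -f value
If the subnet's network_id does not match the network from the error, the fixed_ips rejection itself is expected; the bug is that the members cannot be deleted or recovered afterwards.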
I will add the exact steps below.
Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230404.n.1
How reproducible:
100%
Steps to Reproduce:
1. Deploy an Edge environment with both nova & neutron availability zones
2. Deploy the Octavia service
3. Create a loadbalancer with one or two members, the first one using an invalid subnet
Actual results:
- None of the members are deletable, although deleting the LB with --cascade does delete them successfully
- Members which were provided with the correct subnet end up with the same provisioning_status and operating_status as the first invalid member (see the commands sketched below)
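Roughly how this shows up and how I cleaned up, using the resource names from the "Additional info" section below:
openstack loadbalancer member delete pool1 member1   # member is not removed, it stays in ERROR/OFFLINE
openstack loadbalancer delete --cascade --wait lb1   # this does delete the LB together with the members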
Expected results:
- Both members should be deletable
- Members which were provided with the correct subnet should be deployed successfully
Additional info: the commands I ran
I used this downstream (d/s) job to deploy the environment:
https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/view/QE/view/OSP17.1/job/DFG-edge-deployment-17.1-rhel-virthost-ipv4-3cont-2comp-2leafs-x-2comp-tls_everywhere-routed_provider_nets-ovn-naz/
To deploy the Octavia service I created a file (octavia-dcn-parameters.yaml) with the following content:
octavia_controller_availability_zone: az-central
octavia_availability_zones:
  az-central:  # no cidr needed, it uses the already existing subnet
  az-dcn1:
    lb_mgmt_subnet_cidr: 172.47.0.0/16
  az-dcn2:
    lb_mgmt_subnet_cidr: 172.48.0.0/16
octavia_backbone_tenant_cidr: 172.49.0.0/16
And ran the following playbook:
ansible-playbook -i overcloud-deploy/central/config-download/central/tripleo-ansible-inventory.yaml /usr/share/ansible/tripleo-playbooks/octavia-dcn-deployment.yaml -e @octavia-dcn-parameters.yaml -e stack=central -v
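Once the playbook finished, I needed the IDs of the per-AZ lb-mgmt networks for the availability zone profiles further below; something like this lists them (a sketch, assuming the lb-mgmt-* naming referenced in the placeholders below):
openstack network list | grep lb-mgmt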
- I created a security group with the following rules:
openstack security group rule create --protocol tcp --dst-port 22 sg1
openstack security group rule create --protocol tcp --dst-port 80 sg1
openstack security group rule create --protocol tcp --dst-port 8080 sg1
openstack security group rule create --protocol tcp --dst-port 443 sg1
openstack security group rule create --protocol icmp sg1
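(For completeness: the security group itself was created beforehand; I don't have the exact command in my notes, but it was presumably just:)
openstack security group create sg1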
- I created 2 nova VMs on the public network, each one in a different AZ with the default security group, and added the new security group I created to the servers
openstack server create --wait --flavor m1.tiny --image cirros-0.5.2-x86_64 --network public --availability-zone az-dcn1 vm1
openstack server add security group vm1 sg1
openstack server create --wait --flavor m1.tiny --image cirros-0.5.2-x86_64 --network public --availability-zone az-dcn2 vm2
openstack server add security group vm2 sg1
- I created 3 availability zone profiles & availability zones
openstack loadbalancer availabilityzoneprofile create --provider amphora --name azp-dcn1 --availability-zone-data '{"compute_zone": "az-dcn1", "management_network": "<mgmt-network-id-of-lb-mgmt-az-dcn1-net>"}'
openstack loadbalancer availabilityzoneprofile create --provider amphora --name azp-dcn2 --availability-zone-data '{"compute_zone": "az-dcn2", "management_network": "<mgmt-network-id-of-lb-mgmt-az-dcn2-net>"}'
openstack loadbalancer availabilityzoneprofile create --provider amphora --name azp-central --availability-zone-data '{"compute_zone": "az-central", "management_network": "<mgmt-network-id-of-lb-mgmt-net>"}'
openstack loadbalancer availabilityzone create --availabilityzoneprofile azp-dcn1 --name az-dcn1
openstack loadbalancer availabilityzone create --availabilityzoneprofile azp-dcn2 --name az-dcn2
openstack loadbalancer availabilityzone create --availabilityzoneprofile azp-central --name az-central
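Optionally, to verify the zones and profiles were registered:
openstack loadbalancer availabilityzoneprofile list
openstack loadbalancer availabilityzone list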
- I created a loadbalancer using the segment1 subnet - the subnet that was created in the central AZ and is part of the public network I used for the nova servers
- And an HTTP listener & pool
openstack loadbalancer create --name lb1 --vip-subnet-id segment1 --availability-zone az-central --wait
openstack loadbalancer listener create --wait --protocol-port 80 --protocol HTTP --name listener1 lb1
openstack loadbalancer pool create --name pool1 --lb-algorithm ROUND_ROBIN --listener listener1 --protocol HTTP
- I deployed the members using segment2, which is the subnet that is deployed on az-dcn2 - not the same AZ the LB is using (status check sketched right after the commands)
openstack loadbalancer member create --name member1 --address 10.101.20.212 --protocol-port 8080 --subnet-id segment2 pool1
openstack loadbalancer member create --name member2 --address 10.101.30.229 --protocol-port 8080 --subnet-id segment2 pool1
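This is where both members end up in ERROR; the status can be checked with something like:
openstack loadbalancer member list pool1
openstack loadbalancer member show pool1 member1 -c provisioning_status -c operating_status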
- I created an HTTP healthmonitor
openstack loadbalancer healthmonitor create --delay 10 --max-retries 4 --timeout 5 --type HTTP --name http_hm1 pool1
- My playbook also ran the Octavia testing server on the VM servers, but I don't think that matters in this case