Bug
Resolution: Done
Normal
rhos-17.1.4
PerfScale Sprint 81 through PerfScale Sprint 105
Description of problem:
We are deploying OSP 17.1 with TLS-e, 3 controllers, 220 computes, and 5 cephstorage nodes. The config-download step fails with "Too many open files":
2023-01-10 06:44:01,442 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.442154 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-29
2023-01-10 06:44:01,443 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.443682 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-3
2023-01-10 06:44:01,445 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.445267 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-30
2023-01-10 06:44:01,447 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.446827 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-31
2023-01-10 06:44:01,448 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.448358 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-198
2023-01-10 06:44:01,450 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.449897 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-199
2023-01-10 06:44:01,451 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.451428 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-2
2023-01-10 06:44:01,457 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.457483 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-32
2023-01-10 06:44:01,463 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.463482 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-37
2023-01-10 06:44:01,469 p=963541 u=stack n=ansible | ERROR! Unexpected Exception, this is probably a bug: [Errno 24] Too many open files: '/tmp/tripleoyihd0kky/3afb0e15-9266-460c-8713-af61d49efdec/job_events/478fcbfb-859a-483e-b3dd-241c2fc4c112-partial.json.tmp'
2023-01-10 06:44:01,470 p=963541 u=stack n=ansible | to see the full traceback, use -vvv
2023-01-10 06:44:01,472 p=963541 u=stack n=ansible | the full traceback was:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/ansible/plugins/cache/__init__.py", line 169, in set
self._dump(value, tmpfile_path)
File "/usr/lib/python3.9/site-packages/ansible/plugins/cache/jsonfile.py", line 63, in _dump
with codecs.open(filepath, 'w', encoding='utf-8') as f:
File "/usr/lib64/python3.9/codecs.py", line 905, in open
file = builtins.open(filename, mode, buffering)
OSError: [Errno 24] Too many open files: '/home/stack/.tripleo/fact_cache/tmp4e42lkvk'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/share/ansible/plugins/strategy/tripleo_free.py", line 313, in run
result |= self.process_work()
File "/usr/share/ansible/plugins/strategy/tripleo_free.py", line 264, in process_work
results = self._process_pending_results(self._iterator)
File "/usr/lib/python3.9/site-packages/ansible/plugins/strategy/__init__.py", line 157, in inner
results = func(self, iterator, one_pass=one_pass, max_passes=max_passes, do_handlers=do_handlers)
File "/usr/lib/python3.9/site-packages/ansible/plugins/strategy/__init__.py", line 754, in _process_pending_results
self._variable_manager.set_host_facts(target_host, result_item['ansible_facts'].copy())
File "/usr/lib/python3.9/site-packages/ansible/vars/manager.py", line 677, in set_host_facts
self._fact_cache[host] = host_cache
File "/usr/lib/python3.9/site-packages/ansible/vars/fact_cache.py", line 36, in __setitem__
self._plugin.set(key, value)
File "/usr/lib/python3.9/site-packages/ansible/plugins/cache/__init__.py", line 171, in set
display.warning("error in '%s' cache plugin while trying to write to '%s' : %s" % (self.plugin_name, tmpfile_path, to_bytes(e)))
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 41, in wrapper
return f(*args, **kwargs)
File "/usr/lib/python3.9/site-packages/ansible/utils/display.py", line 403, in warning
self.display(new_msg, color=C.COLOR_WARN, stderr=True)
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 89, in wrapper
event_context.dump_begin(fileobj)
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 196, in dump_begin
self.cache.set(":1:ev-{}".format(begin_dict['uuid']), begin_dict)
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 72, in set
with os.fdopen(os.open(write_location, os.O_WRONLY | os.O_CREAT, stat.S_IRUSR | stat.S_IWUSR), 'w') as f:
OSError: [Errno 24] Too many open files: '/tmp/tripleoyihd0kky/3afb0e15-9266-460c-8713-af61d49efdec/job_events/b1d6856c-2212-4556-999f-5415a29501a3-partial.json.tmp'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/ansible/cli/__init__.py", line 601, in cli_executor
exit_code = cli.run()
File "/usr/lib/python3.9/site-packages/ansible/cli/playbook.py", line 143, in run
results = pbex.run()
File "/usr/lib/python3.9/site-packages/ansible/executor/playbook_executor.py", line 190, in run
result = self._tqm.run(play=play)
File "/usr/lib/python3.9/site-packages/ansible/executor/task_queue_manager.py", line 321, in run
play_return = strategy.run(iterator, play_context)
File "/usr/share/ansible/plugins/strategy/tripleo_free.py", line 324, in run
display.error("Exception while running task loop: "
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 41, in wrapper
return f(*args, **kwargs)
File "/usr/lib/python3.9/site-packages/ansible/utils/display.py", line 458, in error
self.display(new_msg, color=C.COLOR_ERROR, stderr=True)
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 89, in wrapper
event_context.dump_begin(fileobj)
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 196, in dump_begin
self.cache.set(":1:ev-{}".format(begin_dict['uuid']), begin_dict)
File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 72, in set
with os.fdopen(os.open(write_location, os.O_WRONLY | os.O_CREAT, stat.S_IRUSR | stat.S_IWUSR), 'w') as f:
OSError: [Errno 24] Too many open files: '/tmp/tripleoyihd0kky/3afb0e15-9266-460c-8713-af61d49efdec/job_events/478fcbfb-859a-483e-b3dd-241c2fc4c112-partial.json.tmp'
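Errno 24 (EMFILE) means the failing process reached its per-process limit on open file descriptors. As a minimal diagnostic sketch, assuming a Linux host with /proc mounted, the snippet below counts the descriptors a process holds and compares the current process against its soft limit. The helper names `count_open_fds` and `fd_headroom` are illustrative only and are not part of Ansible, ansible-runner, or TripleO.

```python
import os
import resource
import sys


def count_open_fds(pid: int) -> int:
    """Count file descriptors open in process `pid` (Linux /proc only).

    When `pid` is the calling process, the result includes the descriptor
    that os.listdir() itself opens briefly to read the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom() -> int:
    """Approximate number of descriptors this process can still open."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return sys.maxsize  # no soft limit is configured
    return soft - count_open_fds(os.getpid())


if __name__ == "__main__":
    # Optional argument: PID of another process owned by the same user.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {target}: {count_open_fds(target)} open file descriptors")
```

Run as the `stack` user with the PID from the log prefix (p=963541 in the output above) while the playbook is still running, this would show how many descriptors the ansible-playbook process holds shortly before the failure.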
Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20221130.n.1
How reproducible:
100%
Steps to Reproduce:
1. Deploy overcloud with ~200 nodes
2. config-download fails with "[Errno 24] Too many open files"
Actual results:
Overcloud deployment failing
Expected results:
Overcloud deployed successfully
Additional info:
ANSIBLE_FORKS = 100
[stack@undercloud ~]$ ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1540021
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1540021
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
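The soft limit of 1024 open files shown above is likely too low for a run with ANSIBLE_FORKS = 100 against more than 200 hosts. As one possible workaround sketch, and not the confirmed fix for this bug, a wrapper could raise the soft RLIMIT_NOFILE toward the hard limit before launching the deployment, since child processes inherit the limit. The function name `raise_nofile_soft_limit` and the target value 4096 are assumptions chosen for illustration.

```python
import resource
from typing import Tuple


def raise_nofile_soft_limit(target: int = 4096) -> Tuple[int, int]:
    """Raise the soft open-files limit toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = max(soft, target)
    else:
        new_soft = max(soft, min(target, hard))
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_nofile_soft_limit()
    print(f"open files: soft={soft} hard={hard}")
```

The shell equivalent is running `ulimit -n 4096` in the same session before starting the deployment. Either approach only affects the current process and its children, so it does not persist across logins.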