Red Hat OpenStack Services on OpenShift / OSPRH-12654

BZ#2159663 Config download fails with [Errno 24] Too many open files at scale


Sprints: PerfScale Sprint 81, PerfScale Sprint 82, PerfScale Sprint 83, PerfScale Sprint 84, PerfScale Sprint 85, PerfScale Sprint 86, PerfScale Sprint 87, PerfScale Sprint 88, PerfScale Sprint 89, PerfScale Sprint 90, PerfScale Sprint 91, PerfScale Sprint 92, PerfScale Sprint 93, PerfScale Sprint 94, PerfScale Sprint 95, PerfScale Sprint 96, PerfScale Sprint 97, PerfScale Sprint 98, PerfScale Sprint 99, PerfScale Sprint 100, PerfScale Sprint 101, PerfScale Sprint 102, PerfScale Sprint 103, PerfScale Sprint 104, PerfScale Sprint 105

      Description of problem:

      We are deploying OSP 17.1 with TLS-e on 3 controllers, 220 compute nodes, and 5 CephStorage nodes. The config-download step fails with the following error:

      2023-01-10 06:44:01,442 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.442154 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-29
      2023-01-10 06:44:01,443 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.443682 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-3
      2023-01-10 06:44:01,445 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.445267 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-30
      2023-01-10 06:44:01,447 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.446827 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-31
      2023-01-10 06:44:01,448 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.448358 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-198
      2023-01-10 06:44:01,450 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.449897 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-199
      2023-01-10 06:44:01,451 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.451428 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-2
      2023-01-10 06:44:01,457 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.457483 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-32
      2023-01-10 06:44:01,463 p=963541 u=stack n=ansible | 2023-01-10 06:44:01.463482 | bc97e1c3-4240-c543-7458-000000075212 | OK | Ensure we get the ansible interfaces facts | computer640-37
      2023-01-10 06:44:01,469 p=963541 u=stack n=ansible | ERROR! Unexpected Exception, this is probably a bug: [Errno 24] Too many open files: '/tmp/tripleoyihd0kky/3afb0e15-9266-460c-8713-af61d49efdec/job_events/478fcbfb-859a-483e-b3dd-241c2fc4c112-partial.json.tmp'
      2023-01-10 06:44:01,470 p=963541 u=stack n=ansible | to see the full traceback, use -vvv
      2023-01-10 06:44:01,472 p=963541 u=stack n=ansible | the full traceback was:

      Traceback (most recent call last):
        File "/usr/lib/python3.9/site-packages/ansible/plugins/cache/__init__.py", line 169, in set
          self._dump(value, tmpfile_path)
        File "/usr/lib/python3.9/site-packages/ansible/plugins/cache/jsonfile.py", line 63, in _dump
          with codecs.open(filepath, 'w', encoding='utf-8') as f:
        File "/usr/lib64/python3.9/codecs.py", line 905, in open
          file = builtins.open(filename, mode, buffering)
      OSError: [Errno 24] Too many open files: '/home/stack/.tripleo/fact_cache/tmp4e42lkvk'

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/share/ansible/plugins/strategy/tripleo_free.py", line 313, in run
          result |= self.process_work()
        File "/usr/share/ansible/plugins/strategy/tripleo_free.py", line 264, in process_work
          results = self._process_pending_results(self._iterator)
        File "/usr/lib/python3.9/site-packages/ansible/plugins/strategy/__init__.py", line 157, in inner
          results = func(self, iterator, one_pass=one_pass, max_passes=max_passes, do_handlers=do_handlers)
        File "/usr/lib/python3.9/site-packages/ansible/plugins/strategy/__init__.py", line 754, in _process_pending_results
          self._variable_manager.set_host_facts(target_host, result_item['ansible_facts'].copy())
        File "/usr/lib/python3.9/site-packages/ansible/vars/manager.py", line 677, in set_host_facts
          self._fact_cache[host] = host_cache
        File "/usr/lib/python3.9/site-packages/ansible/vars/fact_cache.py", line 36, in __setitem__
          self._plugin.set(key, value)
        File "/usr/lib/python3.9/site-packages/ansible/plugins/cache/__init__.py", line 171, in set
          display.warning("error in '%s' cache plugin while trying to write to '%s' : %s" % (self.plugin_name, tmpfile_path, to_bytes(e)))
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 41, in wrapper
          return f(*args, **kwargs)
        File "/usr/lib/python3.9/site-packages/ansible/utils/display.py", line 403, in warning
          self.display(new_msg, color=C.COLOR_WARN, stderr=True)
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 89, in wrapper
          event_context.dump_begin(fileobj)
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 196, in dump_begin
          self.cache.set(":1:ev-{}".format(begin_dict['uuid']), begin_dict)
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 72, in set
          with os.fdopen(os.open(write_location, os.O_WRONLY | os.O_CREAT, stat.S_IRUSR | stat.S_IWUSR), 'w') as f:
      OSError: [Errno 24] Too many open files: '/tmp/tripleoyihd0kky/3afb0e15-9266-460c-8713-af61d49efdec/job_events/b1d6856c-2212-4556-999f-5415a29501a3-partial.json.tmp'

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/lib/python3.9/site-packages/ansible/cli/__init__.py", line 601, in cli_executor
          exit_code = cli.run()
        File "/usr/lib/python3.9/site-packages/ansible/cli/playbook.py", line 143, in run
          results = pbex.run()
        File "/usr/lib/python3.9/site-packages/ansible/executor/playbook_executor.py", line 190, in run
          result = self._tqm.run(play=play)
        File "/usr/lib/python3.9/site-packages/ansible/executor/task_queue_manager.py", line 321, in run
          play_return = strategy.run(iterator, play_context)
        File "/usr/share/ansible/plugins/strategy/tripleo_free.py", line 324, in run
          display.error("Exception while running task loop: "
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 41, in wrapper
          return f(*args, **kwargs)
        File "/usr/lib/python3.9/site-packages/ansible/utils/display.py", line 458, in error
          self.display(new_msg, color=C.COLOR_ERROR, stderr=True)
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/display.py", line 89, in wrapper
          event_context.dump_begin(fileobj)
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 196, in dump_begin
          self.cache.set(":1:ev-{}".format(begin_dict['uuid']), begin_dict)
        File "/usr/lib/python3.9/site-packages/ansible_runner/display_callback/events.py", line 72, in set
          with os.fdopen(os.open(write_location, os.O_WRONLY | os.O_CREAT, stat.S_IRUSR | stat.S_IWUSR), 'w') as f:
      OSError: [Errno 24] Too many open files: '/tmp/tripleoyihd0kky/3afb0e15-9266-460c-8713-af61d49efdec/job_events/478fcbfb-859a-483e-b3dd-241c2fc4c112-partial.json.tmp'
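
      All three tracebacks end in the same OSError. Errno 24 is EMFILE, which the kernel returns when a process has no free file descriptor below its soft RLIMIT_NOFILE (the "open files (-n) 1024" value under Additional info). The calls that fail are ordinary opens in the jsonfile fact cache and in the ansible-runner event writer, so they show where the limit was hit, not necessarily which code is holding the descriptors. The Python sketch below is not taken from Ansible or TripleO; it reproduces the same error in isolation on Linux by temporarily lowering the soft limit to an arbitrary demo value (256, chosen only for illustration) and opening files until open() fails.

```python
import errno
import os
import resource
import tempfile


def count_opens_until_emfile(soft_limit: int = 256) -> int:
    """Open files until open() fails with EMFILE (errno 24); return how many succeeded.

    The soft RLIMIT_NOFILE is lowered to `soft_limit` (an arbitrary demo value)
    for the duration of the call and then restored.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = soft_limit
    if old_hard != resource.RLIM_INFINITY:
        new_soft = min(soft_limit, old_hard)  # the soft limit may not exceed the hard limit

    handles = []
    with tempfile.TemporaryDirectory() as tmpdir:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))
        try:
            while True:
                path = os.path.join(tmpdir, f"{len(handles)}.tmp")
                try:
                    handles.append(open(path, "w"))
                except OSError as exc:
                    if exc.errno == errno.EMFILE:
                        return len(handles)
                    raise
        finally:
            # Release the descriptors and restore the limit before the temporary
            # directory is cleaned up, because cleanup itself needs descriptors.
            for handle in handles:
                handle.close()
            resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    print("EMFILE =", errno.EMFILE)
    print("files opened before EMFILE:", count_opens_until_emfile())
```

      The returned count is a little below the limit because the interpreter's existing descriptors (stdin, stdout, stderr and any others) count against it too.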

      Version-Release number of selected component (if applicable):
      RHOS-17.1-RHEL-9-20221130.n.1

      How reproducible:
      100%

      Steps to Reproduce:
      1. Deploy an overcloud with ~200 compute nodes.
      2. config-download fails with [Errno 24] Too many open files.
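
      While reproducing, the descriptor count of the failing process can be watched against its limit. In the log above, the p=963541 prefix is the PID of the ansible-playbook process in that run. A minimal Linux-only sketch, assuming read access to /proc for a process owned by the same user; the helper name fd_usage is hypothetical:

```python
import os
from typing import Optional, Tuple, Union


def fd_usage(pid: Union[int, str] = "self") -> Tuple[int, Optional[int]]:
    """Return (open descriptor count, soft "Max open files" limit) for a Linux process.

    Reads /proc/<pid>/fd and /proc/<pid>/limits; a soft limit of "unlimited"
    is returned as None.
    """
    # When pid is "self", the directory handle used for listing may itself be counted.
    count = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit: Optional[int] = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: limit name (3 words), soft limit, hard limit, units
                value = line.split()[3]
                soft_limit = None if value == "unlimited" else int(value)
                break
    return count, soft_limit


if __name__ == "__main__":
    print(fd_usage())
```

      For a live run, calling fd_usage(PID) every few seconds shows how quickly the count approaches the soft limit.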

      Actual results:
      Overcloud deployment fails.

      Expected results:
      Overcloud deploys successfully.

      Additional info:
      ANSIBLE_FORKS = 100

      [stack@undercloud ~]$ ulimit -a
      real-time non-blocking time (microseconds, -R) unlimited
      core file size (blocks, -c) 0
      data seg size (kbytes, -d) unlimited
      scheduling priority (-e) 0
      file size (blocks, -f) unlimited
      pending signals (-i) 1540021
      max locked memory (kbytes, -l) 64
      max memory size (kbytes, -m) unlimited
      open files (-n) 1024
      pipe size (512 bytes, -p) 8
      POSIX message queues (bytes, -q) 819200
      real-time priority (-r) 0
      stack size (kbytes, -s) 8192
      cpu time (seconds, -t) unlimited
      max user processes (-u) 1540021
      virtual memory (kbytes, -v) unlimited
      file locks (-x) unlimited
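
      The ulimit output above shows a soft open-files limit of 1024 for the stack user; ansible-playbook and the processes it spawns inherit that limit. One general mitigation, offered here as an assumption rather than the fix recorded for this issue, is to raise the soft limit (up to the hard limit reported by ulimit -Hn) for the command that runs the deployment. A minimal Python sketch; the function name run_with_nofile_soft_limit and the value 65536 are illustrative only:

```python
import resource
import subprocess
from typing import Sequence


def run_with_nofile_soft_limit(cmd: Sequence[str], soft: int) -> subprocess.CompletedProcess:
    """Run `cmd` with its soft RLIMIT_NOFILE set to `soft`, capped at the hard limit.

    Only the child process is affected; the caller's own limit is unchanged.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)  # an unprivileged process cannot exceed its hard limit

    def set_child_limit() -> None:
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    return subprocess.run(
        list(cmd),
        preexec_fn=set_child_limit,
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # `ulimit -n` in the child prints the soft limit it actually received.
    result = run_with_nofile_soft_limit(["sh", "-c", "ulimit -n"], 65536)
    print(result.stdout.strip())
```

      For an interactive shell the equivalent is ulimit -n VALUE, which can only go up to ulimit -Hn; raising the hard limit itself needs privileges, for example an entry in /etc/security/limits.conf.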

              igallagh@redhat.com Irina Gallagher
              jira-bugzilla-migration RH Bugzilla Integration
              rhos-docs@redhat.com rhos-docs@redhat.com
              rhos-dfg-df