AI Platform Core Components / AIPCC-4675

[python-wheel-build/fromager] race condition in bootstrapper

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Development Platform

      [3351641950] Upstream Reporter: Doug Hellmann
      Upstream description:

      We have seen some failures in fromager's bootstrap command due to a race condition with re-resolving top-level dependencies.

      The error is

      ```
      2025-08-21 15:13:01,699 DEBUG:fromager.__main__:258: llama_stack_provider_lmeval: could not handle toplevel dependency llama_stack_provider_lmeval (0.2.2)
      Traceback (most recent call last):
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 488, in _handle_build_requirements
          self.bootstrap(req=dep, req_type=build_type)
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 135, in bootstrap
          self._add_to_graph(req, req_type, resolved_version, source_url)
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 934, in _add_to_graph
          self.ctx.dependency_graph.add_dependency(
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/dependency_graph.py", line 235, in add_dependency
          raise ValueError(
      ValueError: Trying to add setuptools==80.8.0 to parent llama-stack-provider-lmeval==0.2.2 but llama-stack-provider-lmeval==0.2.2 does not exist

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/__main__.py", line 256, in invoke_main
          main(auto_envvar_prefix="FROMAGER")
        File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1442, in __call__
          return self.main(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1363, in main
          rv = self.invoke(ctx)
               ^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1830, in invoke
          return _process_result(sub_ctx.command.invoke(sub_ctx))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1226, in invoke
          return ctx.invoke(self.callback, **ctx.params)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 794, in invoke
          return callback(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/decorators.py", line 46, in new_func
          return f(get_current_context().obj, *args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/decorators.py", line 34, in new_func
          return f(get_current_context(), *args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/commands/bootstrap.py", line 482, in bootstrap_parallel
          ctx.invoke(
        File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 794, in invoke
          return callback(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/click/decorators.py", line 46, in new_func
          return f(get_current_context().obj, *args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/commands/bootstrap.py", line 184, in bootstrap
          bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 256, in bootstrap
          self._prepare_build_dependencies(req, sdist_root_dir, build_env)
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 433, in _prepare_build_dependencies
          self._handle_build_requirements(
        File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 490, in _handle_build_requirements
          raise ValueError(f"could not handle {self._explain}") from err
      ValueError: could not handle toplevel dependency llama_stack_provider_lmeval (0.2.2)
      ```

      I think the problem here is a race condition with resolving `llama_stack_provider_lmeval`.

      Looking at the logs, I see it first pick up version 0.2.1 and then later 0.2.2. I recently made a change that causes the bootstrapper to always use the same version for a given requirement specifier, so that mismatch should no longer occur.

      It's troubling that the 0.2.2 version wasn't automatically added to the graph, though. I think that's because `llama_stack_provider_lmeval` is a top-level dependency, and top-level dependencies are added to the graph outside of the bootstrapper, when the bootstrap command starts up and resolves them all. When the requirement is resolved again later, it gets a different answer, and that version is not already in the graph.
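The sequence described above can be reproduced with a toy graph. This is a minimal sketch, assuming a graph keyed by (name, version) pairs; `ToyGraph` and its methods are hypothetical stand-ins and not fromager's `dependency_graph` implementation. Only the error wording is taken from the traceback.

```python
class ToyGraph:
    """Hypothetical stand-in for a dependency graph keyed by (name, version)."""

    def __init__(self) -> None:
        self.nodes: set[tuple[str, str]] = set()
        self.edges: list[tuple[tuple[str, str], tuple[str, str]]] = []

    def add_top_level(self, name: str, version: str) -> None:
        self.nodes.add((name, version))

    def add_dependency(
        self, parent: tuple[str, str], child: tuple[str, str]
    ) -> None:
        # The parent must already be a node, matched by name AND version.
        if parent not in self.nodes:
            p = f"{parent[0]}=={parent[1]}"
            raise ValueError(
                f"Trying to add {child[0]}=={child[1]} to parent {p} "
                f"but {p} does not exist"
            )
        self.nodes.add(child)
        self.edges.append((parent, child))


graph = ToyGraph()

# Startup: the top-level requirement resolves to 0.2.1 and is recorded.
graph.add_top_level("llama-stack-provider-lmeval", "0.2.1")

# Later: the same requirement resolves to 0.2.2, which was never recorded,
# so attaching a build dependency to it fails.
try:
    graph.add_dependency(
        ("llama-stack-provider-lmeval", "0.2.2"), ("setuptools", "80.8.0")
    )
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```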

      I see in bootstrapper.py that `_add_to_graph` returns early if the dependency type is top-level, because it assumes those packages are already in the graph. I think that's a mistake: it needs a more careful check that includes the version number.
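One way to picture the difference between the early return and a version-aware check is the sketch below. Everything in it is an assumption made for illustration: `ReqType`, `add_early_return`, and `add_version_aware` are hypothetical and do not mirror the real `_add_to_graph` signature or fromager's graph structure.

```python
from enum import Enum


class ReqType(Enum):
    TOP_LEVEL = "toplevel"
    BUILD = "build"


Node = tuple[str, str]  # (name, version)


def add_early_return(nodes: set[Node], req_type: ReqType, node: Node) -> None:
    """Sketch of the behavior described above: trust that every
    top-level package is already present and do nothing."""
    if req_type is ReqType.TOP_LEVEL:
        return
    nodes.add(node)


def add_version_aware(nodes: set[Node], req_type: ReqType, node: Node) -> None:
    """Sketch of a more careful check: skip only when this exact
    name and version is already recorded."""
    if req_type is ReqType.TOP_LEVEL and node in nodes:
        return
    nodes.add(node)


# Both graphs start with the version resolved at startup.
startup = ("llama-stack-provider-lmeval", "0.2.1")
re_resolved = ("llama-stack-provider-lmeval", "0.2.2")

nodes_a: set[Node] = {startup}
add_early_return(nodes_a, ReqType.TOP_LEVEL, re_resolved)

nodes_b: set[Node] = {startup}
add_version_aware(nodes_b, ReqType.TOP_LEVEL, re_resolved)

print(re_resolved in nodes_a, re_resolved in nodes_b)
```

In the early-return variant the re-resolved version is still missing afterwards, so a later attempt to attach build dependencies to it would fail. The version-aware variant records it. Whether adding a second version of a top-level package is the right outcome, or whether the mismatch should be reported instead, is a design question this sketch does not settle.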

              Unassigned Unassigned
              upstream-sync Upstream Sync
              Antonio's Team