- Bug
- Resolution: Done
- Critical
- None
- None
To Reproduce
Steps to reproduce the behavior:
Deploy RHEL AI 1.4.x onto a server with enough resources to complete the SDG run, and initialize ilab correctly.
Error reproduced by Ben for the document shared by rhn-support-jharmiso:
File "/home/bbrownin/tmp/docling-index-out-of-range/venv/lib/python3.11/site-packages/docling/pipeline/simple_pipeline.py", line 41, in _build_document
conv_res.document = conv_res.input._backend.convert()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bbrownin/tmp/docling-index-out-of-range/venv/lib/python3.11/site-packages/docling/backend/md_backend.py", line 340, in convert
self.iterate_elements(parsed_ast, 0, doc, None)
File "/home/bbrownin/tmp/docling-index-out-of-range/venv/lib/python3.11/site-packages/docling/backend/md_backend.py", line 306, in iterate_elements
self.iterate_elements(child, depth + 1, doc, parent_element)
File "/home/bbrownin/tmp/docling-index-out-of-range/venv/lib/python3.11/site-packages/docling/backend/md_backend.py", line 166, in iterate_elements
f" - Heading level {element.level}, content:
"
~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
The error arises from an older docling version.
Expected behavior
- SDG pipeline should run successfully
Device Info (please complete the following information):
- All/Any
Bug impact
- SDG fails whenever a user supplies md files containing unescaped special characters that trigger special markdown handling (such as * for an unordered list item or # for a header block) with no content following them.
Known workaround
- Avoid unescaped special characters in md files.
- Proposed by Ben: avoid empty markdown headings standing by themselves in the document shared by field teams that contains those characters. See https://github.com/DS4SD/docling/pull/843
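The workaround above can be applied as a pre-check on markdown files before an SDG run. The following is a minimal sketch, assuming that the problematic lines are heading or bullet markers with nothing after them. The helper names `find_empty_markers` and `drop_empty_markers` are hypothetical and are not part of docling or instructlab, and this is not what the linked PR implements.

```python
import re

# Hypothetical helpers for illustration only; not part of docling or instructlab.
# Matches a heading marker (1-6 '#') or a single bullet marker (*, +, -)
# followed by nothing but optional whitespace.
# Limitation: a lone '-' directly under a text line can also be a setext
# heading underline; this sketch flags it anyway.
_EMPTY_MARKER = re.compile(r"^\s{0,3}(#{1,6}|[*+-])\s*$")


def find_empty_markers(markdown_text):
    """Return (line_number, line) pairs for markers that have no content."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip fenced code blocks, where '#' or '*' is literal text.
        if stripped.startswith("```") or stripped.startswith("~~~"):
            in_fence = not in_fence
            continue
        if not in_fence and _EMPTY_MARKER.match(line):
            hits.append((number, line))
    return hits


def drop_empty_markers(markdown_text):
    """Return the text with the flagged lines removed."""
    flagged = {number for number, _ in find_empty_markers(markdown_text)}
    kept = [
        line
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if number not in flagged
    ]
    return "\n".join(kept)
```

Running `find_empty_markers` over each source document and reviewing the reported line numbers is the safer use; `drop_empty_markers` changes the document and should only be used when the flagged lines are known to carry no meaning.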
- depends on
  - AIPCC-1145 <docling model changes in downstream image> docling model format updates (Closed)
- is related to
  - RHELAI-3844 enable instructlab-sdg library updates in renovate (Code Review)
- is triggering
  - AIPCC-889 Test latest Docling without deepsearch-glm [1.5] (Closed)
- relates to
  - AIPCC-953 Update Docling and dependencies in main branch(1.5) to resolve markdown conversion error (Closed)
  - AIPCC-955 Backport updated Docling and dependencies to 1.4 branch for 1.4.4 release (Closed)
- mentioned on