Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: rhelai-1.5.2
Affects Version/s: rhelai-1.5
Component/s: InstructLab - SDG
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Intelligence Requested:
Market:
Release Blocker: Approved
Target Version: rhelai-1.5.2
SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

To Reproduce

Steps to reproduce the behavior:

Run ilab data generate with user data.
The user's generated data is attached.

Expected behavior

SDG output should not contain null values in assistant content.
Null values were also found in the sdg_document field of the output file.

Screenshots

"messages": [
    {
      "content": "You are a Red Hat® Instruct Model, an AI language model developed by Red Hat and IBM Research based on the granite-3.1-8b-base model. Your primary role is to serve as a chat assistant.",
      "role": "system"
    },
    {
      "content": "9.16. Advanced virtual machine management\n9.16.6. Using huge pages with virtual machines\n9.16.6.1. Prerequisites\n- Nodes must have \n\nSummarize the document using extractive techniques.",
      "role": "user"
    },
    {
      "content": null,
      "role": "assistant"
    }
  ],
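
A quick way to confirm how many records are affected is to scan the generated file for messages whose content is null. The sketch below assumes the SDG output is a JSONL file in which each record has a "messages" list like the one shown above; the function name find_null_messages and the sample records are hypothetical and only illustrate the check.

```python
import json


def find_null_messages(jsonl_lines):
    """Return (record_index, message_index, role) for each message whose content is null."""
    hits = []
    for rec_idx, line in enumerate(jsonl_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for msg_idx, msg in enumerate(record.get("messages", [])):
            if msg.get("content") is None:
                hits.append((rec_idx, msg_idx, msg.get("role")))
    return hits


# Hypothetical sample data; a real run would iterate over the lines of the output file.
sample = [
    json.dumps({"messages": [
        {"content": "Summarize the document.", "role": "user"},
        {"content": None, "role": "assistant"},
    ]}),
    json.dumps({"messages": [
        {"content": "Summarize the document.", "role": "user"},
        {"content": "A short summary.", "role": "assistant"},
    ]}),
]
null_hits = find_null_messages(sample)
print(null_hits)
```

To scan an actual file, pass an open file handle instead of the sample list, since iterating over a text file yields one line at a time.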

Device Info (please complete the following information):

InstructLab Version: RHEL AI 1.5, instructlab 0.26.1

Bug impact

Training crashes because of null values in the SDG output.

content = UNMASK_BEGIN_TOKEN + content + UNMASK_END_TOKEN
       ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/accelerated_train.py", line 233, in accelerated_train
  run_training(train_args=train_args, torch_args=torch_args)
 File "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/__init__.py", line 36, in run_training
  return run_training(torch_args=torch_args, train_args=train_args)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/main_ds.py", line 699, in run_training
  dp.process_data(
 File "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/data_process.py", line 1092, in process_data
  process_messages_into_input_ids(
 File "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/data_process.py", line 814, in process_messages_into_input_ids
  data_with_input_ids_and_labels = process_samples(data, tokenizer, num_cpu_procs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/data_process.py", line 916, in process_samples
  processed_data = data.map(
           ^^^^^^^^^
 File "/opt/app-root/lib64/python3.11/site-packages/datasets/arrow_dataset.py", line 557, in wrapper
  out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/opt/app-root/lib64/python3.11/site-packages/datasets/arrow_dataset.py", line 3171, in map
  for rank, done, content in iflatmap_unordered(
 File "/opt/app-root/lib64/python3.11/site-packages/datasets/utils/py_utils.py", line 728, in iflatmap_unordered
  [async_result.get(timeout=0.05) for async_result in async_results]
 File "/opt/app-root/lib64/python3.11/site-packages/datasets/utils/py_utils.py", line 728, in <listcomp>
  [async_result.get(timeout=0.05) for async_result in async_results]
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/opt/app-root/lib64/python3.11/site-packages/multiprocess/pool.py", line 774, in get
  raise self._value
TypeError: can only concatenate str (not "NoneType") to str
Accelerated Training failed with 1
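
The failure can be reproduced in isolation: concatenating a str with None raises the same TypeError as in the traceback above. In this minimal sketch the two token values are placeholders (the real constants are defined in the instructlab training package) and wrap_content is a hypothetical stand-in for the failing line.

```python
# Placeholder values, assumed for illustration only.
UNMASK_BEGIN_TOKEN = "<|UNMASK_BEGIN|>"
UNMASK_END_TOKEN = "<|UNMASK_END|>"


def wrap_content(content):
    """Mirror the failing expression from the traceback."""
    return UNMASK_BEGIN_TOKEN + content + UNMASK_END_TOKEN


try:
    wrap_content(None)  # content is null, as in the affected SDG records
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```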

Known workaround

Please add any known workarounds.

Additional context

Shiv suggested that empty LLM responses in the SDG output may be getting converted to null in the messages format. Rows with empty content might have to be dropped.
A bug needs to be filed.
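
The idea of dropping empty rows could look roughly like the sketch below. It is not the fix that was shipped, only an illustration under the assumption that each record carries a "messages" list; the helper names has_empty_content and drop_empty_rows are hypothetical. A row is dropped when any message has null or whitespace-only content.

```python
def has_empty_content(record):
    """True if any message in the record has null or whitespace-only content."""
    for msg in record.get("messages", []):
        content = msg.get("content")
        if content is None or not str(content).strip():
            return True
    return False


def drop_empty_rows(records):
    """Return the records that are safe to train on and the number of dropped rows."""
    kept = [r for r in records if not has_empty_content(r)]
    return kept, len(records) - len(kept)


# Hypothetical sample records for illustration.
records = [
    {"messages": [{"content": "Question", "role": "user"},
                  {"content": None, "role": "assistant"}]},
    {"messages": [{"content": "Question", "role": "user"},
                  {"content": "   ", "role": "assistant"}]},
    {"messages": [{"content": "Question", "role": "user"},
                  {"content": "Answer", "role": "assistant"}]},
]
kept, dropped = drop_empty_rows(records)
print(f"kept {len(kept)} rows, dropped {dropped}")
```

Reporting the dropped count makes it easier to notice when a large share of the generated data is being discarded, which would point to a problem upstream in generation rather than in the conversion step.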

train_log_error_null.txt (2025/06/03 4:02 PM, 136 kB, Aditi Saluja)
taxonomy-virtmachine.tgz (2025/06/04 9:27 AM, 4.77 MB, Vladimír Kadlec)
sdg_null_content.zip (2025/06/03 4:03 PM, 24.69 MB, Aditi Saluja)
generation-1c9be78e-34b5-11f0-a9d2-0afffaeb628d.log (2025/06/03 4:02 PM, 3.55 MB, Aditi Saluja)

is related to

RHELAI-4356 Require instructlab-sdg==0.8.3 in RHEL AI Wheel Pipeline for RHEL AI >=1.5.2

Closed
