Type: Bug
Resolution: Won't Do
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: InstructLab - CLI
Labels: None
Blocked: False
Blocked Reason: None
Ready: False


Currently, the OpenAI chat client that SDG uses to talk to the teacher model can only run with its default maximum of 2 retries. With this default, if a teacher model backend is unavailable for more than a couple of seconds, the SDG pipeline run fails and manual intervention is required to restart the process.

We can let users specify the maximum retry count in SDG. One valid use case this enables: some users run a pool of redundant teacher models behind a health-checked load balancer. Depending on how the health check is configured, it can take up to 30 seconds for the load balancer to remove a bad teacher model from rotation. By controlling the OpenAI client retries, together with the timeouts that are already configurable in the pipeline, these users can decide how long their pipelines keep retrying, so that SDG runs complete consistently.
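
To make the sizing argument concrete, here is a rough sketch of how long a client keeps trying for a given retry count. It assumes plain exponential backoff that starts at 0.5 seconds, doubles on each retry, and caps at 8 seconds, with no jitter. These numbers and the function names `retry_window_seconds` and `retries_needed` are illustrative assumptions, not values taken from SDG or from this issue.

```python
def retry_window_seconds(max_retries, initial_delay=0.5, max_delay=8.0,
                         per_attempt_seconds=0.0):
    """Estimate the time spent before the client gives up.

    Assumed model (not confirmed by the issue): exponential backoff without
    jitter. per_attempt_seconds is how long each failed attempt takes, e.g.
    close to 0 for a refused connection, or the request timeout if the
    backend hangs.
    """
    backoff = sum(min(initial_delay * 2 ** i, max_delay)
                  for i in range(max_retries))
    attempts = max_retries + 1  # the first try plus every retry
    return backoff + attempts * per_attempt_seconds


def retries_needed(outage_seconds, **backoff_kwargs):
    """Smallest retry count whose window covers the given outage."""
    retries = 0
    while retry_window_seconds(retries, **backoff_kwargs) < outage_seconds:
        retries += 1
    return retries


if __name__ == "__main__":
    # Default of 2 retries with fast-failing requests.
    print(retry_window_seconds(2))
    # Retries needed to ride out a 30-second load balancer health-check delay.
    print(retries_needed(30))
```

Under these assumptions, 2 retries cover only about 1.5 seconds of fast failures, which is consistent with runs failing after "a couple of seconds", while a 30-second window needs 7 retries. With the openai Python library, the chosen value would be passed as the `max_retries` argument when constructing the client; how SDG would expose that setting is not specified here.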

More info in: https://github.com/instructlab/instructlab/issues/3516

Assignee: Unassigned
Reporter: Tyler Lisowski (Inactive)
Votes: 0
Watchers: 2
Created: 2025/08/12 12:14 AM
Updated: 2025/08/15 1:25 PM
Resolved: 2025/08/13 3:21 PM
