- Bug
- Resolution: Unresolved
- Normal
- None
- rhelai-1.3.1
- None
- False
- False
- Moderate
To Reproduce
Steps to reproduce the behavior:
- On GCP with 8x H100 GPUs, run lab-multiphase training on RHEL AI 1.3.1
- Redirect the logs to a file
- Check the contents of the file
Many warnings are shown:
/opt/app-root/lib64/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
/opt/app-root/lib64/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
Also this one:
warnings.warn(
/opt/app-root/lib64/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
Note that no side effects are seen and the short training run completes successfully.
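To gauge how much of the redirected log these warnings take up, the file can be summarized by counting repeated warning lines. The following is a minimal sketch using only the Python standard library. The function name `summarize_warnings` and the sample lines are illustrative assumptions, not part of InstructLab.

```python
import re
from collections import Counter

# Matches lines such as "/path/to/file.py:295: FutureWarning: message".
WARNING_RE = re.compile(
    r"^(?P<location>\S+\.py:\d+): (?P<category>\w+Warning): (?P<message>.*)$"
)


def summarize_warnings(lines):
    """Count how often each (location, category) pair appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            counts[(match["location"], match["category"])] += 1
    return counts


# Illustrative sample, shortened from the warnings quoted in this report.
sample_lines = [
    "/opt/app-root/lib64/python3.11/site-packages/torch/utils/checkpoint.py:295: "
    "FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated.",
    "  with torch.enable_grad(), device_autocast_ctx:",
    "/opt/app-root/lib64/python3.11/site-packages/torch/utils/checkpoint.py:295: "
    "FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated.",
]

for (location, category), count in summarize_warnings(sample_lines).most_common():
    print(f"{count:6d}  {category}  {location}")
```

For a real log, pass an open file handle instead of `sample_lines`, since the function accepts any iterable of lines.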
Expected behavior
- No warnings are shown
- If there are warnings, they should not fill up the whole screen.
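One possible way to reach the expected behavior is Python's standard `warnings` filters. The sketch below assumes the filter can be installed before training starts. `noisy_step` is a hypothetical stand-in for the library calls that emit the deprecation warnings, and none of this is InstructLab's actual fix.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits warnings.
    warnings.warn(
        "`old_api(args...)` is deprecated. Please use `new_api(args...)` instead.",
        FutureWarning,
    )
    warnings.warn("checkpoint directory is nearly full", UserWarning)


def run_with_filter(steps=3):
    """Run noisy_step() several times and return the warnings that got through."""
    with warnings.catch_warnings(record=True) as caught:
        # Start from a known state: record every warning.
        warnings.simplefilter("always")
        # Filters added later take precedence, so this hides only FutureWarning
        # messages that mention deprecation; other warnings still surface.
        warnings.filterwarnings(
            "ignore", message=r".*deprecated", category=FutureWarning
        )
        for _ in range(steps):
            noisy_step()
    return [(w.category.__name__, str(w.message)) for w in caught]


for category, message in run_with_filter():
    print(f"{category}: {message}")
```

Replacing `"ignore"` with `"once"` would show the first occurrence of each matching warning instead of hiding it entirely. The same kind of filter can also be set from outside the process with the `PYTHONWARNINGS` environment variable, for example `PYTHONWARNINGS=ignore::FutureWarning`. Whether that is appropriate for the RHEL AI training entry point is untested here.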
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: [e.g. Apple M2 Pro Chip, 16 GB Memory, etc.]
- OS Version: [e.g. Mac OS 14.4.1, Fedora Linux 40]
- InstructLab Version: [output of ilab --version]
- Provide the output of these two commands:
- sudo bootc status --format json | jq .status.booted.image.image.image to print the name and tag of the bootc image, should look like registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1732894187
- ilab system info to print detailed information about InstructLab version, OS, and hardware - including GPU / AI accelerator hardware
Additional context
- <your text here>
- ...
- ...