Red Hat Enterprise Linux AI / RHELAI-3373

Data mixing does not produce proper sys prompt for models


    • Critical
    • Approved

      To Reproduce

      Steps to reproduce the behavior:

      1. Run agentic SDG, for example: ilab --config /var/mnt/instg1/instructlab/config.yaml data generate --taxonomy-path /var/mnt/instg1/instructlab/taxonomy-doclingpoc/ --taxonomy-base empty --endpoint-url https://781d2e7c-us-east.lb.appdomain.cloud/v1 --model-family mixtral --sdg-scale-factor 30 --pipeline /var/mnt/instg1/instructlab/sdg-config/pipelines/agentic/ --model /instructlab/models/mixtral-8x7b-instruct-v0-1 --server-ctx-size 32768 --yaml-rules /var/mnt/instg1/instructlab/sdg-config/yamlrules.conf --output-dir /var/mnt/instg1/instructlab/doclingpocsdgout --tls-insecure
      2. Let the data generate run finish, then inspect the knowledge train payload and the skills train payload. Every system message is an empty string:

      ```
      head -n 1 /var/mnt/instg1/instructlab/doclingpocsdgout/2025-02-11_055837/skills_train_msgs_2025-02-11T05_58_41.jsonl

      {"messages":[{"content":"","role":"system"},

      {"content":"[DOCUMENT]\n\n Integration of TSW Events with SAP Commodity Management  \n\n\n\nThis feature enables you to set up contracts where Commodity Pricing Engine (CPE) is enabled and need a reference date to fetch the price quotations from the market. These quotations are the source of the price determined by CPE.\n\nTo ensure full integration with SAP Oil & Gas, an industry specific routine can now be used for CPE Reference Date determination which takes into consideration the TSW event dates.\n\nThere can be different TSW events maintained in nominations or tickets and each event can also have a date associated to it.\n\n\n\n Technical Details \n\n\n\n\n\n Type                     New                    \n Functional Localization  Not applicable         \n Scope Item               Not applicable         \n Application Component    IS-OIL-DS              \n Available As Of          SAP S\/4HANA 1909 FPS01 \n\n\n\n\n\n\n\n Line of Business  Solution Area  Capability  Title                                                     Short Description                                                                                                                                                                                                                                            Type  Scope Item  Application component  Version           Country-specific  \n\n Oil and Gas                                  Integration of TSW Events with SAP Commodity Management   This feature enables you to set up contracts where Commodity Pricing Engine (CPE) is enabled and need a reference date to fetch the price quotations from the market. 
These quotations are the source of the price determined by CPE.<br><br>[See More](https:\/\/help.sap.com\/docs\/SAP_S4HANA_ON-PREMISE\/4c6c3c99e6e94a92a626f424add61cba\/c858e260baf1495eb85b2f5e3c01bd7d.html)    New               IS-OIL-DS              S4H-OP 1909 001   n\/a               \n\n\n\n\n[END]\n[DOCUMENT]\n\n Improvements to the View \"PMQ Test Execution\" \n\n\n\nThis feature enables you to open several test objects or test results simultaneously in the view  PMQ Test Execution when performing mass tests. Furthermore, the view can be filtered individually according to the result status OK ![Icon result state ok](https:\/\/help.sap.com\/doc\/4c6c3c99e6e94a92a626f424add61cba\/100\/en-US\/loio9691ae47185f4ff38f98ba661f2f89d9_LowRes.png) , check errors ![Icon state check error](https:\/\/help.sap.com\/doc\/4c6c3c99e6e94a92a626f424add61cba\/100\/en-US\/loiodbf0459e94b04c2d82e933b291455d9e_LowRes.png) and errors ![Icon state error](https:\/\/help.sap.com\/doc\/4c6c3c99e6e94a92a626f424add61cba\/100\/en-US\/loiof7f9753ad44440e2a251cbbcf2e80f29_LowRes.png).\n\n\n\n Technical Details \n\n\n\n\n\n Product Feature is     New                     \n Country Dependency     Valid for all countries \n Scope Item             No scope item required  \n Application Component  FS-MPM (msg.PMQ)        \n Availability           SAP S\/4HANA 1809        \n\n\n\n\n\n\n\n Line of Business  Solution Area  Capability  Title                                          Short Description                                                                                                                                                                                            \n[END]\n[DOCUMENT]\nComplications of PVV include hemoptysis due to rupture, or bronchial compression resulting in middle lobe syndrome, esophageal compression inducing dysphagia, and cerebral infarction due to thromboembolic sequelae [4], [8], [19]. 
In most of the cases, the treatment of an asymptomatic PVV is unnecessary. However, monitoring is crucial. The increase in size increases the risk of complication and the necessity to undergo surgical intervention [7], [19]. Symptomatic patients with pulmonary venous hypertension due to mitral valve disease must undergo mitral valve replacement surgery, especially if the varix has progressively increased in size [6].\n[END]\n\nWhy does the document introduce an industry-specific routine for CPE Reference Date determination that takes into consideration the TSW event dates?\n\n1. To argue that the previous method of CPE Reference Date determination was inadequate.\n2. To provide a more accurate and efficient way of determining CPE Reference Dates by utilizing TSW event dates.\n3. To complicate the process of CPE Reference Date determination.\n4. To limit the integration of TSW Events with SAP Commodity Management to specific industries.","role":"user"}

      ,{"content":"2. To provide a more accurate and efficient way of determining CPE Reference Dates by utilizing TSW event dates. This new routine ensures that the reference date is determined in a more industry-specific and relevant context.","role":"assistant"}],"metadata":"{\"source\": \"instructlab_p1\"}","id":null

      (app-root) /$ head -n 1 /var/mnt/instg1/instructlab/doclingpocsdgout/2025-02-11_055837/knowledge_train_msgs_2025-02-11T05_58_41.jsonl

      {"messages":[{"content":"","role":"system"}

      ,{"content":"<|user|>\nThe Phoenix constellation is located in the southern sky and was first depicted in Johann Bayer's 1603 Uranometria. The constellation's brighter stars and their Bayer designations were charted by the French explorer and astronomer Nicolas Louis de Lacaille in 1756. Phoenix stretches from approximately -39\u00b0 to -57\u00b0 declination and 23.5h to 2.5h of right ascension. It is one of the Southern Birds, along with Grus, Pavo, and Tucana.\n\nAlpha Phoenicis, the brightest star in Phoenix, is named Ankaa, which is derived from an Arabic word meaning 'the Phoenix'. Ankaa is an orange giant with an apparent magnitude of 2.4. Beta Phoenicis is a binary system composed of two yellow giants with a combined apparent magnitude of 3.3. Nu Phoenicis has a dust disk, and the constellation contains ten star systems with known planets. Additionally, two of the largest objects in the visible universe, the El Gordo and Phoenix Cluster galaxies, are located in Phoenix, at distances of 7.2 and 5.7 billion light years away, respectively.\n\nPhoenix is the radiant of two annual meteor showers: the Phoenicids in December and the July Phoenicids. Phoenix was first established as a constellation by Petrus Plancius in 1597 or 1598, based on the observations of Pieter Dirkszoon Keyser and Frederick de Houtman. It was the largest of the 12 constellations created by Plancius and was depicted on a 35-cm diameter celestial globe published in 1597 or 1598.\n\nWhat is the apparent magnitude of Ankaa?\n<|assistant|>\nThe apparent magnitude of Ankaa is 2.4.\n","role":"pretraining"}],"metadata":"{\"sdg_document\": \"The Phoenix constellation is located in the southern sky and was first depicted in Johann Bayer's 1603 Uranometria. The constellation's brighter stars and their Bayer designations were charted by the French explorer and astronomer Nicolas Louis de Lacaille in 1756. Phoenix stretches from approximately -39\\u00b0 to -57\\u00b0 declination and 23.5h to 2.5h of right ascension. It is one of the Southern Birds, along with Grus, Pavo, and Tucana.\\n\\nAlpha Phoenicis, the brightest star in Phoenix, is named Ankaa, which is derived from an Arabic word meaning 'the Phoenix'. Ankaa is an orange giant with an apparent magnitude of 2.4. Beta Phoenicis is a binary system composed of two yellow giants with a combined apparent magnitude of 3.3. Nu Phoenicis has a dust disk, and the constellation contains ten star systems with known planets. Additionally, two of the largest objects in the visible universe, the El Gordo and Phoenix Cluster galaxies, are located in Phoenix, at distances of 7.2 and 5.7 billion light years away, respectively.\\n\\nPhoenix is the radiant of two annual meteor showers: the Phoenicids in December and the July Phoenicids. Phoenix was first established as a constellation by Petrus Plancius in 1597 or 1598, based on the observations of Pieter Dirkszoon Keyser and Frederick de Houtman. It was the largest of the 12 constellations created by Plancius and was depicted on a 35-cm diameter celestial globe published in 1597 or 1598.\", \"domain\": \"astrology\", \"dataset\": \"document_knowledge_qa\", \"raw_document\": \"# Title: *Phoenix (constellation)\\n\\nPhoenix is a minor constellation in the southern sky. Named after the mythical phoenix, it was first depicted on a celestial atlas by Johann Bayer in his 1603 Uranometria . The French explorer and astronomer Nicolas Louis de Lacaille charted the brighter stars and gave their Bayer designations in 1756. The constellation stretches from roughly - 39\\u00b0 to - 57\\u00b0 declination, and from 23.5h to 2.5h of right ascension. The constellations Phoenix, Grus, Pavo and Tucana, are known as the Southern Birds.\\n\\nThe brightest star, Alpha Phoenicis, is named Ankaa, an Arabic word meaning 'the Phoenix'. It is an orange giant of apparent magnitude 2.4. Next is Beta Phoenicis, actually a binary system composed of two yellow giants with a combined apparent magnitude of 3.3. Nu Phoenicis has a dust disk, while the constellation has ten star systems with known planets and the recently discovered galaxy clusters El Gordo and the Phoenix Cluster-located 7.2 and 5.7 billion light years away respectively, two of the largest objects in the visible universe. Phoenix is the radiant of two annual meteor showers: the Phoenicids in December, and the July Phoenicids.\\n\\n## **History\\n\\nPhoenix was the largest of the 12 constellations established by Petrus Plancius from the observations of Pieter Dirkszoon Keyser and Frederick de Houtman. It first appeared on a 35-cm diameter celestial globe published in 1597 (or 1598) in\\n\\n## **Phoenix*\", \"dataset_type\": \"summary_detailed\"}","id":"5fb0cdcd-f244-49fb-8967-092264f40234"}

        ```
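The manual `head -n 1` inspection above can be extended to a whole file. The following is a minimal sketch, not part of InstructLab: the function name `count_empty_system_prompts` is hypothetical, and it assumes each line is one JSON record with a `messages` list of `role`/`content` entries, as in the payloads shown in this report.

```python
import json
import sys
from typing import Iterable, Tuple


def count_empty_system_prompts(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (affected_records, total_records) for JSONL message records.

    Hypothetical helper: a record is counted as affected when it has no
    system message at all, or when any system message is blank.
    """
    affected = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        total += 1
        system_msgs = [
            m for m in record.get("messages", []) if m.get("role") == "system"
        ]
        if not system_msgs or any(
            not (m.get("content") or "").strip() for m in system_msgs
        ):
            affected += 1
    return affected, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_prompts.py <path to a *_train_msgs_*.jsonl file>
    with open(sys.argv[1], encoding="utf-8") as fh:
        affected, total = count_empty_system_prompts(fh)
    print(f"{affected} of {total} records have an empty system prompt")
```

Running it against both the skills and the knowledge payload shows whether every record is affected or only some of them.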

      Expected behavior

      • Expect an appropriate system prompt. For granite-3-1-8b, that is "I am a Red Hat® Instruct Model, an AI language model developed by Red Hat and IBM Research based on the granite-3.1-8b-base model. My primary role is to serve as a chat assistant."
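The expectation above can be expressed as a per-record check. This is a sketch under assumptions: the prompt text is the granite-3-1-8b prompt quoted in this report, the helper name `system_prompt_matches` is hypothetical, and requiring an exact string match is an assumption about what "appropriate" means here, not confirmed pipeline behavior.

```python
import json

# System prompt quoted in this report for granite-3-1-8b (\u00ae is the registered sign).
EXPECTED_SYSTEM_PROMPT = (
    "I am a Red Hat\u00ae Instruct Model, an AI language model developed by "
    "Red Hat and IBM Research based on the granite-3.1-8b-base model. "
    "My primary role is to serve as a chat assistant."
)


def system_prompt_matches(record_line: str, expected: str = EXPECTED_SYSTEM_PROMPT) -> bool:
    """Return True when the record has system messages and all of them equal `expected`.

    Hypothetical helper; exact equality is an assumption made for illustration.
    """
    record = json.loads(record_line)
    system_contents = [
        m.get("content")
        for m in record.get("messages", [])
        if m.get("role") == "system"
    ]
    return bool(system_contents) and all(c == expected for c in system_contents)
```

With the payloads shown in this report the check fails for every inspected record, because the system message content is an empty string.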

      Screenshots

      • Attached Image

      Device Info (please complete the following information):

      • Hardware Specs: RHEL AI 1.4, 8xA100 GPU
      • OS Version: RHEL AI 1.4
      • InstructLab Version: ilab, version 0.23.1
      • Provide the output of these two commands:
        • "registry.redhat.io/rhelai1/bootc-nvidia-rhel9:1.4"
      • [root@tyler-test-train-livemach root]# ilab system info

      ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
      ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
      ggml_cuda_init: found 8 CUDA devices:
        Device 0: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 1: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 2: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 3: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 4: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 5: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 6: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
        Device 7: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes

      Platform:
        sys.version: 3.11.7 (main, Jan  8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
        sys.platform: linux
        os.name: posix
        platform.release: 5.14.0-427.50.1.el9_4.x86_64
        platform.machine: x86_64
        platform.node: tyler-test-train-livemach
        platform.python_version: 3.11.7
        os-release.ID: rhel
        os-release.VERSION_ID: 9.4
        os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
        memory.total: 1259.87 GB
        memory.available: 1247.70 GB
        memory.used: 4.25 GB

      InstructLab:
        instructlab.version: 0.23.1
        instructlab-dolomite.version: 0.2.0
        instructlab-eval.version: 0.5.1
        instructlab-quantize.version: 0.1.0
        instructlab-schema.version: 0.4.2
        instructlab-sdg.version: 0.7.0
        instructlab-training.version: 0.7.0

      Torch:
        torch.version: 2.5.1
        torch.backends.cpu.capability: AVX512
        torch.version.cuda: 12.4
        torch.version.hip: None
        torch.cuda.available: True
        torch.backends.cuda.is_built: True
        torch.backends.mps.is_built: False
        torch.backends.mps.is_available: False
        torch.cuda.bf16: True
        torch.cuda.current.device: 0
        torch.cuda.0.name: NVIDIA A100-SXM4-80GB
        torch.cuda.0.free: 78.7 GB
        torch.cuda.0.total: 79.1 GB
        torch.cuda.0.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.1.name: NVIDIA A100-SXM4-80GB
        torch.cuda.1.free: 78.7 GB
        torch.cuda.1.total: 79.1 GB
        torch.cuda.1.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.2.name: NVIDIA A100-SXM4-80GB
        torch.cuda.2.free: 78.7 GB
        torch.cuda.2.total: 79.1 GB
        torch.cuda.2.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.3.name: NVIDIA A100-SXM4-80GB
        torch.cuda.3.free: 78.7 GB
        torch.cuda.3.total: 79.1 GB
        torch.cuda.3.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.4.name: NVIDIA A100-SXM4-80GB
        torch.cuda.4.free: 78.7 GB
        torch.cuda.4.total: 79.1 GB
        torch.cuda.4.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.5.name: NVIDIA A100-SXM4-80GB
        torch.cuda.5.free: 78.7 GB
        torch.cuda.5.total: 79.1 GB
        torch.cuda.5.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.6.name: NVIDIA A100-SXM4-80GB
        torch.cuda.6.free: 78.7 GB
        torch.cuda.6.total: 79.1 GB
        torch.cuda.6.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.7.name: NVIDIA A100-SXM4-80GB
        torch.cuda.7.free: 78.7 GB
        torch.cuda.7.total: 79.1 GB
        torch.cuda.7.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)

      llama_cpp_python:
        llama_cpp_python.version: 0.3.2
        llama_cpp_python.supports_gpu_offload: True

      Bug impact

      • Invalid SDG content produced for skills/knowledge training


              osilkin@redhat.com Oleg Silkin
              lisowskiibm Tyler Lisowski (Inactive)