Red Hat Enterprise Linux AI / RHELAI-3257

[instructlab/instructlab] [RAG][Dev] Fix ID already exists bug in rag ingest



      [2821030454] Upstream Reporter: Bill Murdock
      Upstream issue status: Closed
      Upstream description:

      Reported by Jorge Catalan:

      Step 1: Converted PDFs. This completed without issue.

      (venv) jocatala-mac:converteddocs jcats$ ilab rag convert --input-dir docstoconvert/ --output-dir converteddocs/

      Step 2: Started ingestion, then aborted it with Ctrl+C.

      (venv) jocatala-mac:converteddocs jcats$ ilab rag ingest --input-dir=../converteddocs/

      Step 3: Started ingestion again. It fails with:

      ERROR 2025-01-28 20:07:37,760 instructlab.rag.haystack.document_store_ingestor:74: Ingestion attempt failed: ID 'a661c8e7ed3425460d732c52291e83a012325398747d9091bdb92d0c5c002452' already exists.
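The ID in the log is 64 hex characters, consistent with a SHA-256 digest; Haystack 2.x by default derives a Document's ID by hashing its content and metadata, so re-ingesting the same chunk yields the same ID. The pure-Python model below (not Haystack or InstructLab code) sketches why an interrupted first run breaks the second, assuming the store persists between runs and rejects duplicate IDs. PersistentStore, ingest, and reproduce are hypothetical names, and the fail-on-duplicate policy is an assumption about how the store is configured.

```python
from __future__ import annotations

import hashlib
import json


class DuplicateIdError(Exception):
    """Raised when a write would reuse an ID that is already stored."""


def document_id(content: str, meta: dict) -> str:
    # Deterministic ID: SHA-256 over content plus metadata, so the same
    # chunk from the same file always maps to the same 64-hex-char ID.
    payload = json.dumps({"content": content, "meta": meta}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PersistentStore:
    """Toy stand-in for a file-backed store: contents survive between runs."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def write(self, content: str, meta: dict) -> str:
        doc_id = document_id(content, meta)
        if doc_id in self.docs:
            # Assumed "fail on duplicate" write policy (hypothetical here).
            raise DuplicateIdError(f"ID '{doc_id}' already exists.")
        self.docs[doc_id] = content
        return doc_id


def ingest(store: PersistentStore, chunks: list[str], stop_after: int | None = None) -> None:
    for i, chunk in enumerate(chunks):
        if stop_after is not None and i == stop_after:
            # Simulates pressing Ctrl+C part-way through the run.
            raise KeyboardInterrupt
        store.write(chunk, {"source": "doc.pdf", "chunk": i})


def reproduce() -> str:
    """Run an interrupted ingest, then a full one, against the same store."""
    store = PersistentStore()
    chunks = ["intro", "methods", "results"]
    try:
        ingest(store, chunks, stop_after=2)  # Step 2: two chunks written, then abort
    except KeyboardInterrupt:
        pass
    try:
        ingest(store, chunks)  # Step 3: the first chunk collides with a stored ID
    except DuplicateIdError as err:
        return str(err)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
```

Running it prints one "ID '...' already exists." line for the first chunk: the chunks written before the interruption are still in the store, so the second run fails on the very first write, matching Step 3.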

      @dmartinol suggests the following fix:

      Since we're using this in-memory implementation, which does not support the drop_old flag, I'd suggest deleting the file before the ingest runs again. Hopefully this fixes the issue (together with the other workaround for this known issue in Haystack).
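A minimal sketch of the suggested workaround, deleting the leftover store file before a fresh ingest, follows. The actual location of the file depends on the local ilab RAG configuration and is not given in this report; rag_store.db and remove_stale_store are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path


def remove_stale_store(db_path: Path) -> bool:
    """Delete a leftover store file so the next ingest starts from empty.

    Returns True if a file was removed, False if there was nothing to delete.
    """
    if db_path.is_file():
        db_path.unlink()
        return True
    return False


if __name__ == "__main__":
    # Demonstration on a throwaway file; the file name is hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp) / "rag_store.db"
        stale.write_text("partial data from an interrupted ingest")
        print(remove_stale_store(stale))  # True: the file was deleted
        print(remove_stale_store(stale))  # False: nothing left to delete
```

This discards everything previously ingested, so it only fits when a full re-ingest is intended; it trades incremental ingestion for a run that cannot collide with IDs left behind by an aborted one.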


      Upstream URL: https://github.com/instructlab/instructlab/issues/3062
