RHEL-116336

System Runner Observability Stack: Alloy to Loki


    • Task
    • Resolution: Done
    • rteval
    • rhel-kernel-rts-time
    • CK-2025-wk37

      Title: RT/HPC observability ingest: minimal-label journald + packed JSON timing; offset scripts; dashboard refactor

      Summary
      Implemented a low-overhead log ingest pipeline and timing primitives suitable for PREEMPT_RT/HPC:

      • Ingest: journald/CRI to Loki with minimal labels (host, boot_id, transport, severity; optional unit/app/container).
      • Packed JSON (per entry): boot_epoch_ns, kernel_offset_ns, and (for kernel lines) src_mono_us embedded alongside the log line.
      • Offsets:
        • BOOT_EPOCH_NS computed once at startup (journald receipt pairs, median).
        • KERNEL_OFFSET_NS recomputed periodically (trimmed median over last-N kernel entries; reuse previous if quiet).
      • Performance: bounded journal scans, tiny per-line work; friendly to RT nodes.
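      The two offset computations above could be sketched as follows. This is a minimal illustration, not the ticket's actual scripts: the function names, the input shapes, and the definition of the kernel offset (journald monotonic receipt time minus the kernel's own source timestamp) are assumptions. The ticket does not spell out the trimming rule either; the sketch drops the slowest fraction of samples before taking the median, because a symmetric trim would leave the median unchanged.

```python
from statistics import median


def estimate_boot_epoch_ns(pairs):
    """Estimate the wall-clock time of monotonic zero.

    pairs: iterable of (realtime_ns, monotonic_ns) receipt timestamps
    (hypothetical input shape; journald itself reports microseconds).
    """
    estimates = [realtime - mono for realtime, mono in pairs]
    if not estimates:
        raise ValueError("need at least one receipt pair")
    # The median keeps one delayed or skewed pair from moving the result.
    return int(median(estimates))


def estimate_kernel_offset_ns(samples, previous=None,
                              min_samples=5, trim_fraction=0.1):
    """Trimmed median over the last-N kernel entries.

    samples: list of (receipt_mono_ns, src_mono_us) tuples (assumed shape).
    Returns `previous` unchanged when the window is too quiet.
    """
    if len(samples) < min_samples:
        return previous
    deltas = sorted(recv_ns - src_us * 1000 for recv_ns, src_us in samples)
    # Assumed trimming rule: discard the largest deltas (latency spikes).
    drop = int(len(deltas) * trim_fraction)
    kept = deltas[:len(deltas) - drop]
    return int(median(kept))
```

      Both functions touch only a bounded window of samples, in keeping with the bounded-scan goal; the window size N and the trim fraction would need tuning per node.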

      Current status

      • Alloy pipeline running and emitting packed fields.
      • Partial Grafana dashboard working; needs refactor to read packed JSON instead of legacy fields.

      What’s left (concise)

      • Refactor Grafana panels/queries to parse the packed JSON with
        | json | line_format "{{.message}}" and use the fields boot_epoch_ns, kernel_offset_ns, src_mono_us.
      • Add panels for: offset drift over time, kernel/userspace correlation examples, and cross-boot ordering via boot_epoch_ns.
      • Validate CRI vs journald source selection per node (no duplicates).
      • (RT nodes) pin collectors to the monitoring core; document CPU/IO limits.
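      Cross-boot ordering from the packed fields might look like the sketch below. It assumes boot_epoch_ns is the wall-clock time at which the monotonic clock read zero, so that boot_epoch_ns + src_mono_us * 1000 approximates when a kernel line was emitted; that interpretation, and the helper names, are assumptions rather than something the ticket states. The role of kernel_offset_ns is left out because the ticket does not define how it enters the calculation.

```python
import json


def estimated_wall_ns(entry):
    """Approximate wall-clock emission time of a kernel line, in ns."""
    return int(entry["boot_epoch_ns"]) + int(entry["src_mono_us"]) * 1000


def order_kernel_lines(raw_lines):
    """Parse packed JSON lines and order kernel entries across boots.

    Entries without src_mono_us (userspace lines) are skipped here.
    """
    entries = [json.loads(line) for line in raw_lines]
    kernel = [e for e in entries if "src_mono_us" in e]
    return sorted(kernel, key=estimated_wall_ns)
```

      Sorting on the reconstructed time instead of the Loki receipt timestamp means lines from an earlier boot stay ahead of lines from a later one even if they were ingested late.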

      Acceptance criteria

      • Every ingested line carries packed JSON with boot_epoch_ns and kernel_offset_ns; kernel lines include src_mono_us.
      • Ingest overhead remains low (collector CPU low single-digit % on monitoring core).
      • No label/cardinality regressions.
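      The first criterion can be spot-checked mechanically. The sketch below is one possible checker, assuming each sampled entry is available as a label dict plus the raw log line, and that kernel lines are identified by the transport label having the value "kernel"; both the input shape and the check_entry name are hypothetical.

```python
import json

REQUIRED_FIELDS = ("boot_epoch_ns", "kernel_offset_ns")


def check_entry(labels, line):
    """Return a list of problems for one ingested entry (empty if OK)."""
    try:
        packed = json.loads(line)
    except json.JSONDecodeError:
        return ["line is not valid JSON"]
    if not isinstance(packed, dict):
        return ["line is not a JSON object"]

    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if field not in packed]
    # Assumption: kernel lines carry transport="kernel" in their labels.
    if labels.get("transport") == "kernel" and "src_mono_us" not in packed:
        problems.append("kernel line missing src_mono_us")
    return problems
```

      Running this over a sample of query results per host would give a simple pass/fail signal for the packed-JSON criterion; the CPU and cardinality criteria need separate measurement.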

              rhn-gps-chwhite William White