OpenShift Request For Enhancement / RFE-8853

HCP KubeVirt CSI Driver – Expose Hotplug Latency / Duration in Logs and Metrics


      1. Proposed title of this feature request

      Expose hotplug operation timing (latency/duration) in KubeVirt CSI driver logs and metrics to diagnose volume attach delays.

      2. What is the nature and description of the request?

      Nature: Observability Enhancement (Logging and Prometheus Metrics).

      Description: Currently, the KubeVirt CSI driver orchestrates the hotplug flow during ControllerPublishVolume by attaching a DataVolume to a VirtualMachineInstance (VMI) and polling until the volume is "Ready." However, the driver exposes no structured timing data for this sequence. As a result, operators cannot distinguish between time spent on API rate limiting, KubeVirt control plane reconciliation, and backend storage staging.

      Requested Change:

        Structured Observability Logs: Inject timestamped log lines at critical handoff points in pkg/service/controller.go or equivalent:

      • Hotplug Start: When the patch/update to the VMI spec is initiated.
      • Wait Initiation: When the driver enters the polling loop for volumeStatus.
      • Hotplug Success: When the volume phase reaches Ready in the VMI status.
      • Timeout/Failure: Explicitly log the elapsed duration if the 120s (or configured) deadline is hit.
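
The four log points above could be wired around the polling loop roughly as follows. This is a minimal sketch using only the Go standard library, not the driver's actual code: `hotplugWithTiming`, `patchFunc`, `readyFunc`, `simulate`, the log field names, and the `pvc-demo` volume name are all hypothetical, and the fake "VMI" simply reports Ready on the Nth poll.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"os"
	"time"
)

// patchFunc is a hypothetical stand-in for the VMI spec patch/update call.
type patchFunc func(ctx context.Context) error

// readyFunc is a hypothetical stand-in for reading the volume phase from the VMI status.
type readyFunc func(ctx context.Context) (bool, error)

// hotplugWithTiming emits one log line per handoff point and returns the
// time spent on the API update and on the whole operation.
func hotplugWithTiming(ctx context.Context, log *slog.Logger, volume string,
	patch patchFunc, ready readyFunc, timeout, interval time.Duration,
) (apiUpdate, total time.Duration, err error) {
	start := time.Now()
	log.Info("hotplug start", "volume", volume)

	if err = patch(ctx); err != nil {
		total = time.Since(start)
		log.Error("hotplug failed during VMI update", "volume", volume,
			"elapsed", total.String(), "error", err.Error())
		return 0, total, err
	}
	apiUpdate = time.Since(start)
	log.Info("hotplug wait initiated", "volume", volume, "api_update", apiUpdate.String())

	// Bound the wait with the configured deadline (120s in the request text).
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		ok, rerr := ready(ctx)
		if rerr != nil {
			total = time.Since(start)
			log.Error("hotplug failed while waiting", "volume", volume,
				"elapsed", total.String(), "error", rerr.Error())
			return apiUpdate, total, rerr
		}
		if ok {
			total = time.Since(start)
			log.Info("hotplug success", "volume", volume,
				"vmi_wait", (total - apiUpdate).String(), "total", total.String())
			return apiUpdate, total, nil
		}
		select {
		case <-ctx.Done():
			total = time.Since(start)
			log.Error("hotplug timeout", "volume", volume,
				"elapsed", total.String(), "deadline", timeout.String())
			return apiUpdate, total, ctx.Err()
		case <-ticker.C: // poll again
		}
	}
}

// simulate runs the flow against a fake VMI that becomes Ready on poll number readyOnPoll.
func simulate(readyOnPoll, timeoutMs int) (polls int, timedOut bool) {
	log := slog.New(slog.NewTextHandler(os.Stdout, nil))
	patch := func(context.Context) error { return nil }
	ready := func(context.Context) (bool, error) {
		polls++
		return polls >= readyOnPoll, nil
	}
	_, _, err := hotplugWithTiming(context.Background(), log, "pvc-demo", patch, ready,
		time.Duration(timeoutMs)*time.Millisecond, 5*time.Millisecond)
	return polls, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	polls, timedOut := simulate(3, 1000)
	fmt.Println("polls:", polls, "timed out:", timedOut)
}
```

Logging the API update time and the wait time as separate fields is what would let an operator attribute a slow attach to either the VMI update call or the wait for the volume status, which is the distinction the request asks for.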

       Prometheus Metrics: Introduce a new histogram metric:

      • kubevirt_csi_hotplug_duration_seconds: Labeled by storage_class and phase (Total, API_Update, VMI_Wait).
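
To illustrate the shape of this metric, here is a dependency-free sketch that models a histogram keyed by storage_class and phase with cumulative buckets. A real driver would more likely register a histogram vector through the Prometheus Go client; this stand-in only shows how one hotplug would produce three observations. The bucket bounds, the type and method names, and the `ceph-rbd` storage class are assumptions, not part of the request.

```go
package main

import (
	"fmt"
	"sort"
)

// Phase label values taken from the request text.
const (
	phaseTotal     = "Total"
	phaseAPIUpdate = "API_Update"
	phaseVMIWait   = "VMI_Wait"
)

// Assumed bucket upper bounds in seconds. They extend to 600s so that
// the 120s deadline and multi-minute delays fall into distinct buckets.
var buckets = []float64{0.5, 1, 5, 15, 30, 60, 120, 300, 600}

type labels struct{ storageClass, phase string }

type series struct {
	counts []uint64 // counts[i] = observations <= buckets[i]; the last entry is +Inf
	sum    float64  // sum of all observed values in seconds
}

// hotplugHistogram is a hypothetical in-memory model of
// kubevirt_csi_hotplug_duration_seconds.
type hotplugHistogram struct{ data map[labels]*series }

func newHotplugHistogram() *hotplugHistogram {
	return &hotplugHistogram{data: map[labels]*series{}}
}

// observe records one duration under the given label pair.
func (h *hotplugHistogram) observe(storageClass, phase string, seconds float64) {
	key := labels{storageClass, phase}
	s, ok := h.data[key]
	if !ok {
		s = &series{counts: make([]uint64, len(buckets)+1)}
		h.data[key] = s
	}
	// Buckets are cumulative: increment every bucket whose bound is >= seconds.
	first := sort.SearchFloat64s(buckets, seconds)
	for i := first; i < len(s.counts); i++ {
		s.counts[i]++
	}
	s.sum += seconds
}

// recordHotplug records the two measured phases and their sum as Total.
func (h *hotplugHistogram) recordHotplug(storageClass string, apiUpdate, vmiWait float64) {
	h.observe(storageClass, phaseAPIUpdate, apiUpdate)
	h.observe(storageClass, phaseVMIWait, vmiWait)
	h.observe(storageClass, phaseTotal, apiUpdate+vmiWait)
}

// countLE returns the cumulative count for the bucket with upper bound le,
// or 0 if the series or the bucket does not exist.
func (h *hotplugHistogram) countLE(storageClass, phase string, le float64) uint64 {
	s, ok := h.data[labels{storageClass, phase}]
	if !ok {
		return 0
	}
	i := sort.SearchFloat64s(buckets, le)
	if i == len(buckets) || buckets[i] != le {
		return 0
	}
	return s.counts[i]
}

func main() {
	h := newHotplugHistogram()
	// A fast API update followed by a wait of just over six minutes.
	h.recordHotplug("ceph-rbd", 0.4, 372)
	for _, p := range []string{phaseAPIUpdate, phaseVMIWait, phaseTotal} {
		fmt.Println(p, "le=300:", h.countLE("ceph-rbd", p, 300), "le=600:", h.countLE("ceph-rbd", p, 600))
	}
}
```

One design point worth noting: the Prometheus Go client's default buckets top out at 10 seconds, so a metric meant to expose multi-minute attach delays would need explicitly configured, wider buckets along these lines.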

       

      3. Why does the customer need this? (List the business requirements here)

      • Mean Time to Detection (MTTD): In environments like HyperShift (HCP), an attachment delay of six minutes or more is currently a "black box." Operators cannot see whether the delay is caused by the CSI driver being throttled (API level) or by virt-handler being slow to plug the device (node level).
      • Support Efficiency: Provides a clear "paper trail" in logs to resolve disputes between Storage Vendors (backend latency) and Platform Teams (KubeVirt latency).

      4. List any affected packages or components.

      HCP KubeVirt CSI Driver

              rh_pelauter@redhat.com Peter Lauterbach
              rhn-support-dpateriy Divyam Pateriya