  1. Migration Toolkit for Virtualization
  2. MTV-3890

[DOC] Generate metrics and forward as telemetry data - Tech Preview


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • 2.11.0
    • None
    • Documentation
    • None
    • Product / Portfolio Work

      This ticket is for documenting the new counter metric, mtv_migration_plan_vms_total, which will be used to gather and forward telemetry data about Virtual Machine (VM) migrations facilitated by the Migration Toolkit for Virtualization (MTV).

      The goal of this metric is to provide a concise way to count VM migrations on the cluster, broken down by their key attributes. Today, this information depends on unreliable customer input.

      Metric to Document

       

      • Metric name: mtv_migration_plan_vms_total
      • Type: Counter
      • Purpose: Counts the total number of migration plan VMs.
      • Labels: vm_status, provider, mode, target

      Label Details and Possible Values

      The documentation should clearly define each label and list the possible values it can take.

      • vm_status
        • Description: The final state of the VM after the migration attempt.
        • Possible Values:
          • Succeeded
          • Failed
      • provider
        • Description: The source virtualization platform the VM was migrated from.
        • Possible Values (Current):
          • vsphere
          • ova
          • Other currently supported providers (e.g., OpenStack, oVirt, OpenShift) should be included.
        • Future Values (Note for Inclusion):
          • awsec2 (Not yet implemented, but planned)
      • mode
        • Description: The type of migration performed.
        • Possible Values (Current):
          • Cold: The source VM is shut down while data is copied.
          • Warm: Most data is copied while the VM is running (pre-copy stage), and the VM is shut down only for the cutover stage.
        • Future Values (Note for Inclusion):
          • RCM (Not yet implemented, but planned)
      • target
        • Description: The destination cluster for the migration.
        • Possible Values:
          • local
          • remote
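
      The label scheme above can be sketched as a small in-memory model. This is only an illustration, not MTV's implementation: the class PlanVmCounter and the helper validate_labels are hypothetical names, the lowercase value casing is an assumption taken from the examples in this ticket, and only the provider values named here (vsphere, ova) are listed.

```python
from collections import Counter

METRIC_NAME = "mtv_migration_plan_vms_total"
LABEL_NAMES = ("vm_status", "provider", "mode", "target")

# Allowed values per label. Casing is an assumption (lowercase, as in the
# ticket's examples); only the provider values the ticket names are listed.
ALLOWED_VALUES = {
    "vm_status": {"succeeded", "failed"},
    "provider": {"vsphere", "ova"},
    "mode": {"cold", "warm"},
    "target": {"local", "remote"},
}


def validate_labels(labels):
    """Return the label values as a tuple in LABEL_NAMES order, or raise ValueError."""
    if set(labels) != set(LABEL_NAMES):
        raise ValueError(f"expected labels {LABEL_NAMES}, got {sorted(labels)}")
    for name in LABEL_NAMES:
        if labels[name] not in ALLOWED_VALUES[name]:
            raise ValueError(f"unsupported value {labels[name]!r} for label {name!r}")
    return tuple(labels[name] for name in LABEL_NAMES)


class PlanVmCounter:
    """Hypothetical in-memory stand-in for the labeled counter."""

    def __init__(self):
        self._series = Counter()

    def inc(self, **labels):
        # Assumption: one increment per VM once it reaches a final state.
        self._series[validate_labels(labels)] += 1

    def value(self, **labels):
        # Series that were never incremented read as 0.
        return self._series[validate_labels(labels)]
```

      Each distinct combination of the four label values forms its own series, so the documentation can describe the metric as one count per (vm_status, provider, mode, target) tuple.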

      Example Use Cases

      Please include clear examples to illustrate how this counter aggregates data.

      • Example 1: Successful Warm Migrations
        • If the metric reports: succeeded, vsphere, warm, local = 2
        • This means: 2 VMs were successfully migrated using warm migration from a vSphere source provider to a local cluster.
      • Example 2: Failed Cold Migrations
        • If the metric reports: failed, vsphere, cold, local = 3
        • This means: 3 VMs failed cold migration from a vSphere source provider to a local cluster.
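
      Both examples can be reproduced with a short sketch that aggregates per-VM results and renders them as Prometheus text-exposition sample lines. The per-VM input records and the helper names aggregate and render are hypothetical; the lowercase label values follow the examples above and may differ from what MTV actually emits.

```python
from collections import Counter

METRIC_NAME = "mtv_migration_plan_vms_total"
LABEL_NAMES = ("vm_status", "provider", "mode", "target")


def aggregate(vm_results):
    """Count VM results per unique (vm_status, provider, mode, target) combination."""
    return Counter(tuple(r[name] for name in LABEL_NAMES) for r in vm_results)


def render(series):
    """Render series as text-exposition sample lines, sorted for stable output."""
    lines = []
    for values, count in sorted(series.items()):
        labels = ",".join(f'{n}="{v}"' for n, v in zip(LABEL_NAMES, values))
        lines.append(f"{METRIC_NAME}{{{labels}}} {count}")
    return lines


# Hypothetical per-VM results matching Example 1 and Example 2.
warm_ok = {"vm_status": "succeeded", "provider": "vsphere", "mode": "warm", "target": "local"}
cold_failed = {"vm_status": "failed", "provider": "vsphere", "mode": "cold", "target": "local"}
vm_results = [warm_ok] * 2 + [cold_failed] * 3

for line in render(aggregate(vm_results)):
    print(line)
# mtv_migration_plan_vms_total{vm_status="failed",provider="vsphere",mode="cold",target="local"} 3
# mtv_migration_plan_vms_total{vm_status="succeeded",provider="vsphere",mode="warm",target="local"} 2
```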

      Content Journey for MTV Telemetry Metrics Documentation

       

      Stage: Discover
      • User goal: "Is there a way to track how my migrations are performing?"
      • Documentation content and focus: High-level summary and value proposition. Focus on why this feature exists (currently, there is no concise or reliable way to gather migration metrics) and what it provides (data on VM/migration counts, results, providers, OSes).

      Stage: Learn
      • User goal: "What data is being collected and how do I access it?"
      • Documentation content and focus: Conceptual guide and metric definitions. Introduce the main metrics (e.g., mtv_migrations_status_total, mtv_migration_plan_vms_total, mtv_migration_vm_oses_total) and explain the labels (status, provider, mode, target, OS). Explain the process: gathering and forwarding information as telemetry data.

      Stage: Try
      • User goal: "How do I verify the metrics are working correctly?"
      • Documentation content and focus: Hands-on verification and testing guide. Provide clear steps and example queries for Prometheus or monitoring tools. Show how to check if the metrics correspond to reality. Include examples of the counter and gauge metrics in action.

      Stage: Adopt
      • User goal: "How do I integrate this data into my overall operational monitoring?"
      • Documentation content and focus: Deployment and integration instructions. Detail the configuration needed on the MTV side. Provide instructions for setting up dashboards and alerts using the documented metrics and labels. Focus on using the data to address the original motivation: getting reliable insight into MTV usage.

      Stage: Expand
      • User goal: "What deeper insights can I get, and what is coming next?"
      • Documentation content and focus: Advanced usage, future capabilities, and data analysis examples. Discuss using metrics like mtv_migration_duration_seconds (histogram) for deeper analysis of performance. Document the currently implemented metrics alongside planned future metrics (e.g., network throughput, disk I/O).
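
      For the Try stage, the example queries could look like the PromQL strings built by the sketch below. The helper names and the base URL used in the usage note are hypothetical, and the lowercase label values are an assumption; /api/v1/query is the instant-query endpoint of the Prometheus HTTP API.

```python
from urllib.parse import urlencode

METRIC_NAME = "mtv_migration_plan_vms_total"


def selector(**matchers):
    """Build a PromQL instant-vector selector such as metric{label="value"}."""
    if not matchers:
        return METRIC_NAME
    body = ",".join(f'{name}="{value}"' for name, value in sorted(matchers.items()))
    return f"{METRIC_NAME}{{{body}}}"


def total_by(label, **matchers):
    """Sum the counter across all matching series, grouped by one label."""
    return f"sum by ({label}) ({selector(**matchers)})"


def failure_ratio():
    """Share of counted VMs that failed, across all providers and modes."""
    failed = selector(vm_status="failed")
    return f"sum({failed}) / sum({selector()})"


def query_url(base_url, promql):
    """URL for the Prometheus HTTP API instant-query endpoint."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


print(total_by("provider", vm_status="succeeded"))
print(failure_ratio())
```

      For example, query_url("http://prometheus.example:9090", failure_ratio()) yields a URL that can be fetched with any HTTP client; the host name is a placeholder. Because the metric is a counter, a count over a time window would typically wrap the selector in increase(...[24h]) rather than read the raw value.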
       
       
       
       
       
       

      "Jobs to be Done" (JTBD) statement:

       

      "When I am running VM migrations with MTV, I want to automatically and reliably gather comprehensive data on the status, type, and outcomes of the migrations and individual VMs, so that I can report on usage, identify trends, and address failures without relying on manual customer input."

      Breakdown

      • Job (What the user wants to accomplish): Gather comprehensive data on migration status, type, and outcomes.
      • Circumstance (The situation or context): When running VM migrations with MTV.
      • Motivation/Need (Why they want it): To report on usage, identify trends, and address failures.
      • Desired Outcome (What success looks like): Accessing this data automatically and reliably, without relying on manual customer input.

      Personas:

      1. Product Manager / Business Analyst

      • As a Product Manager, I want to perform trend analysis on migration success rates, provider usage, and popular OSes, so that I can understand MTV adoption, identify pain points, and make data-driven decisions for feature prioritization.

      2. Reliability Engineer (SRE) / Operations Specialist

      • As a Reliability Engineer, I want to perform real-time monitoring of migration status, duration, and failures by plan and VM, so that I can proactively detect and alert on widespread issues, ensure migration stability, and measure the Mean Time To Recovery (MTTR).

      3. Support Engineer / QE Specialist

      • As a Support Engineer, I want to perform detailed lookups of metrics (like data transferred and duration) associated with a specific failed migration plan ID, so that I can quickly diagnose specific customer issues and verify the root cause of migration failures or cancellations.
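
      As an illustration of the Product Manager persona's success-rate analysis, the sketch below derives a per-provider success rate from counter samples. The input format (a list of label-dictionary and value pairs), the function name, and the sample numbers are all hypothetical, and the lowercase label values are an assumption.

```python
from collections import defaultdict


def success_rate_by_provider(samples):
    """Map provider -> succeeded / (succeeded + failed), from (labels, value) samples."""
    totals = defaultdict(lambda: {"succeeded": 0, "failed": 0})
    for labels, value in samples:
        totals[labels["provider"]][labels["vm_status"]] += value
    rates = {}
    for provider, counts in totals.items():
        attempted = counts["succeeded"] + counts["failed"]
        if attempted:  # skip providers with no counted VMs
            rates[provider] = counts["succeeded"] / attempted
    return rates


# Hypothetical samples of mtv_migration_plan_vms_total, invented for illustration.
samples = [
    ({"vm_status": "succeeded", "provider": "vsphere", "mode": "warm", "target": "local"}, 2),
    ({"vm_status": "failed", "provider": "vsphere", "mode": "cold", "target": "local"}, 3),
    ({"vm_status": "succeeded", "provider": "ova", "mode": "cold", "target": "local"}, 5),
]

print(success_rate_by_provider(samples))
# {'vsphere': 0.4, 'ova': 1.0}
```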

              rhn-support-anarnold A Arnold