OpenShift SDN / SDN-3450

Understand whether we can detect ovs-vswitchd CPU starvation using an existing metric


    • Type: Spike
    • Resolution: Unresolved
    • Priority: Normal
    • OVN Kubernetes

      We need to understand under what conditions the metric ovs_vswitchd_dp_flows_lookup_lost increments.

      Our current understanding is that when ovs-vswitchd is CPU starved and incoming packets are large, this metric is more likely to increment.
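A minimal sketch for cross-checking the metric against the datapath itself, assuming ovs_vswitchd_dp_flows_lookup_lost mirrors the `lost` counter on the `lookups:` line printed by `ovs-appctl dpctl/show` (this mapping is an assumption to verify, not something stated in this issue). The `SAMPLE` text and its numbers are made up for illustration; `parse_lookups` is a hypothetical helper name.

```python
import re

# Matches the "lookups:" line printed by `ovs-appctl dpctl/show`,
# e.g. "lookups: hit:1000 missed:50 lost:3".
LOOKUPS_RE = re.compile(r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def parse_lookups(dpctl_show_output: str) -> dict:
    """Return the hit/missed/lost counters of the first datapath listed."""
    match = LOOKUPS_RE.search(dpctl_show_output)
    if match is None:
        raise ValueError("no 'lookups:' line found in dpctl/show output")
    hit, missed, lost = (int(group) for group in match.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


# Illustrative output only; the numbers are invented for this sketch.
SAMPLE = """\
system@ovs-system:
  lookups: hit:1000 missed:50 lost:3
  flows: 12
"""

if __name__ == "__main__":
    print(parse_lookups(SAMPLE))
```

Capturing this output before and after a test run gives an independent reading of the `lost` counter to compare with the exported metric.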

      A test needs to be conducted:

      • Provision a worker node instance with the smallest CPU allocation a cloud provider allows for an OCP node.
      • Fill the node with CPU-intensive workloads.
      • Begin sending jumbo frames to the node (up to 9000 bytes?). Determine the largest packet size that will not be fragmented.
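For the last step, the largest payload that avoids fragmentation follows from the path MTU minus the protocol headers. The sketch below assumes IPv4 without IP options and TCP without options; `overlay_overhead` is a hypothetical parameter for any encapsulation overhead on the path (for example, the overlay used for pod traffic), and its value must be confirmed for the cluster under test.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header
UDP_HEADER = 8    # bytes
TCP_HEADER = 20   # bytes, TCP header without options


def max_unfragmented_payloads(mtu: int, overlay_overhead: int = 0) -> dict:
    """Largest payload per protocol that fits in one packet of size `mtu`.

    `overlay_overhead` is subtracted first; it is an assumption the
    caller must supply (0 means no encapsulation on the path).
    """
    effective_mtu = mtu - overlay_overhead
    if effective_mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small for the assumed headers")
    return {
        "icmp_payload": effective_mtu - IPV4_HEADER - ICMP_HEADER,
        "udp_payload": effective_mtu - IPV4_HEADER - UDP_HEADER,
        "tcp_mss": effective_mtu - IPV4_HEADER - TCP_HEADER,
    }


if __name__ == "__main__":
    print(max_unfragmented_payloads(9000))
```

With a 9000-byte MTU and no overlay this gives an ICMP payload of 8972 bytes, which can be checked on Linux with `ping -M do -s 8972 <target>`; `-M do` sets the Don't Fragment bit, so an oversized packet fails instead of being fragmented.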

       

      Please document the tests you run to get this metric to increment.
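One way to record each test run is to save the metrics endpoint output before and after the run and compare the counter. The sketch below parses the Prometheus text exposition format and sums every series of the metric; the `datapath` label and the sample values are assumptions made for illustration, and `sum_metric` is a hypothetical helper name.

```python
METRIC = "ovs_vswitchd_dp_flows_lookup_lost"


def sum_metric(exposition_text: str, name: str = METRIC) -> float:
    """Sum the values of all series of `name` in Prometheus text format."""
    total = 0.0
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment/HELP/TYPE lines.
        if not line or line.startswith("#"):
            continue
        # Require an exact metric-name match (followed by labels or a space).
        if not (line.startswith(name + "{") or line.startswith(name + " ")):
            continue
        # The value follows the closing brace of the labels, if any.
        tail = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        total += float(tail.split()[0])
    return total


# Illustrative scrapes only; label names and values are invented.
BEFORE = 'ovs_vswitchd_dp_flows_lookup_lost{datapath="ovs-system"} 7\n'
AFTER = (
    "# scrape taken after the test run\n"
    'ovs_vswitchd_dp_flows_lookup_lost{datapath="ovs-system"} 19\n'
    'ovs_vswitchd_dp_flows_lookup_hit{datapath="ovs-system"} 100\n'
)

if __name__ == "__main__":
    delta = sum_metric(AFTER) - sum_metric(BEFORE)
    print(f"{METRIC} increased by {delta:g} during the run")
```

Storing the two scrapes alongside the node size, workload, and packet size used makes each run reproducible.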

      If you cannot get it to increment, set the ovs-vswitchd option other_config:flow-limit to 0 and retry.
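A minimal sketch of the commands involved, assuming `other_config:flow-limit` is set on the single row of the `Open_vSwitch` table with `ovs-vsctl`, and that the commands are run somewhere `ovs-vsctl` can reach the node's OVS database. The helper names are hypothetical; the script only prints the commands so they can be reviewed before being run on a test node.

```python
import shlex


def flow_limit_set_cmd(limit: int) -> list:
    """Build the ovs-vsctl argv that sets other_config:flow-limit."""
    if limit < 0:
        raise ValueError("flow-limit must be at least 0")
    return ["ovs-vsctl", "set", "Open_vSwitch", ".",
            f"other_config:flow-limit={limit}"]


def flow_limit_get_cmd() -> list:
    """Build the argv that reads the current value (errors if unset)."""
    return ["ovs-vsctl", "get", "Open_vSwitch", ".",
            "other_config:flow-limit"]


def flow_limit_reset_cmd() -> list:
    """Build the argv that removes the key, restoring the default."""
    return ["ovs-vsctl", "remove", "Open_vSwitch", ".",
            "other_config", "flow-limit"]


if __name__ == "__main__":
    # Print rather than execute, so nothing changes without review.
    for argv in (flow_limit_get_cmd(), flow_limit_set_cmd(0),
                 flow_limit_reset_cmd()):
        print(shlex.join(argv))
```

Recording the original value first, and removing the key after the test, keeps the node from being left with a non-default flow limit.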

       

      Understanding this metric (and a follow-on alert) will help highlight when customers' worker nodes are overloaded and networking is degraded, impacting the user experience.

              Assignee: Unassigned
              Reporter: Martin Kennelly (mkennell@redhat.com)
