Bug
Resolution: Unresolved
Major
None
4.12.z, 4.10.z, 4.7.z, 4.14.z
No
SDN Sprint 248, SDN Sprint 249, SDN Sprint 250
3
False
Description of problem:
This issue is related to the ICNI2/multi-egress OVN feature.
During PERFSCALE-2388 testing, intermittent BFD flapping was observed between the emulated SPK pods and the application workers.
The issue can be reproduced with only 2 worker nodes and appears when each worker handles more than 20 to 30 BFD sessions.
After discussing with dceara@redhat.com, we suspect the cause is OVN CoPP (control plane protection).
CoPP rate-limits the control packets that hit ovn-controller.
It is disabled by default, and ovnkube is expected to configure it with legitimate values for the different protocols. BFD is one of them.
Currently the same rate is applied to ALL protocols:
https://github.com/ovn-org/ovn-kubernetes/blob/1bd71cb0dd57740a23c504ede36fee0f85ae3bc5/go-controller/pkg/ovn/copp.go#L57-L60
    band := &nbdb.MeterBand{
        Action: types.MeterAction,
        Rate:   int(25), // hard-coding for now. TODO(tssurya): make this configurable if needed
    }
    for _, protocol := range defaultProtocolNames {
        // format: <OVNSupportedProtocolName>-rate-limiter
        meterName := getMeterNameForProtocol(protocol)
        meterNames[protocol] = meterName
        meter := &nbdb.Meter{
            Name: meterName,
            Fair: &meterFairness,
            Unit: types.PacketsPerSecond,
        }
        ops, err = libovsdbops.CreateOrUpdateMeterOps(nbClient, ops, meter, []*nbdb.MeterBand{band},
            &meter.Bands, &meter.Fair, &meter.Unit)
        if err != nil {
            return "", fmt.Errorf("can't create meter %v: %v", meter, err)
        }
    }
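This means that ovn-controller will receive at most 25 BFD pkt-ins per second. If our suspicion is correct, BFD packets beyond that rate are dropped by the meter, which would explain the flapping once a worker handles enough sessions to exceed it.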
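To make the arithmetic concrete, here is a minimal sketch comparing the aggregate BFD packet rate on a worker against the 25 packets-per-second meter. The 1000 ms interval is an assumption chosen for illustration, not a value taken from the test setup. The helper names aggregatePPS and maxSessions are hypothetical, and the sketch assumes every received BFD control packet counts against a single per-node meter.

```go
package main

import "fmt"

// meterRatePPS mirrors the hard-coded Rate in the copp.go snippet.
const meterRatePPS = 25

// aggregatePPS returns the packets per second produced by `sessions` BFD
// sessions that each send one control packet every intervalMs milliseconds.
// (Hypothetical helper; ignores BFD jitter.)
func aggregatePPS(sessions, intervalMs int) float64 {
	return float64(sessions) * 1000.0 / float64(intervalMs)
}

// maxSessions returns the largest session count whose aggregate rate stays
// within ratePPS for the given interval. (Hypothetical helper.)
func maxSessions(ratePPS, intervalMs int) int {
	return ratePPS * intervalMs / 1000
}

func main() {
	// ASSUMPTION: 1000 ms interval, for illustration only.
	const intervalMs = 1000

	for _, n := range []int{20, 25, 30} {
		pps := aggregatePPS(n, intervalMs)
		fmt.Printf("%d sessions -> %.1f pps, exceeds meter: %t\n",
			n, pps, pps > meterRatePPS)
	}
	fmt.Println("max sessions within meter:", maxSessions(meterRatePPS, intervalMs))
}
```

Under these assumptions, the meter would be exceeded somewhere between 20 and 30 sessions per worker, which is consistent with the range where the flapping was observed. A shorter interval would lower the threshold proportionally.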
The full troubleshooting document can be found here
How reproducible:
Using the ICNI2/multi-egress annotations (https://docs.openshift.com/container-platform/4.14/networking/ovn_kubernetes_network_provider/configuring-secondary-external-gateway.html), try to establish more than 30 BFD sessions per worker.