Bug
Resolution: Unresolved
Normal
4.12.z, 4.10.z, 4.7.z, 4.14.z
Quality / Stability / Reliability
CORENET Sprint 269
Description of problem:
This issue is related to the ICNI2/multi-egress OVN feature.
During PERFSCALE-2388 testing, we noticed intermittent BFD flapping between the emulated SPK pods and the application workers.
The issue can be reproduced with only 2 worker nodes and appears when each worker handles more than roughly 20 to 30 BFD sessions.
After talking to dceara@redhat.com, we suspect the cause is OVN CoPP (control plane protection).
The idea behind CoPP is to rate limit the control packets hitting ovn-controller.
It is disabled by default in OVN, and ovnkube is expected to configure it with a sensible value for each protocol. BFD is one of them.
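As a rough illustration of what "a sensible value for each protocol" could look like, here is a minimal Go sketch of a per-protocol rate lookup with a fallback default. This is not how ovn-kubernetes works today: `protocolRates`, `rateForProtocol`, and the value 200 for `bfd` are hypothetical and exist only for this example; the default of 25 packets per second is the hard-coded value from the snippet quoted in this report.

```go
package main

import "fmt"

// defaultRatePPS matches the hard-coded band rate quoted in this report.
const defaultRatePPS = 25

// protocolRates is a HYPOTHETICAL override table; the value for "bfd"
// is an arbitrary illustration, not a recommended setting.
var protocolRates = map[string]int{
	"bfd": 200,
}

// rateForProtocol returns the override for a protocol if one is set,
// and falls back to the default rate otherwise.
func rateForProtocol(protocol string) int {
	if rate, ok := protocolRates[protocol]; ok && rate > 0 {
		return rate
	}
	return defaultRatePPS
}

func main() {
	for _, protocol := range []string{"arp", "bfd"} {
		fmt.Printf("%s: %d packets per second\n", protocol, rateForProtocol(protocol))
	}
}
```

With a table like this, only protocols that need a higher ceiling would deviate from the default, and every other meter would keep its current behaviour.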
Currently, the same hard-coded rate is applied to ALL protocols:
https://github.com/ovn-org/ovn-kubernetes/blob/1bd71cb0dd57740a23c504ede36fee0f85ae3bc5/go-controller/pkg/ovn/copp.go#L57-L60
	band := &nbdb.MeterBand{
		Action: types.MeterAction,
		Rate:   int(25), // hard-coding for now. TODO(tssurya): make this configurable if needed
	}
	for _, protocol := range defaultProtocolNames {
		// format: <OVNSupportedProtocolName>-rate-limiter
		meterName := getMeterNameForProtocol(protocol)
		meterNames[protocol] = meterName
		meter := &nbdb.Meter{
			Name: meterName,
			Fair: &meterFairness,
			Unit: types.PacketsPerSecond,
		}
		ops, err = libovsdbops.CreateOrUpdateMeterOps(nbClient, ops, meter, []*nbdb.MeterBand{band},
			&meter.Bands, &meter.Fair, &meter.Unit)
		if err != nil {
			return "", fmt.Errorf("can't create meter %v: %v", meter, err)
		}
	}
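With the rate hard-coded to 25, ovn-controller receives at most 25 BFD packet-ins per second. Once the aggregate BFD packet rate across a worker's sessions exceeds that limit, the excess packets are dropped, which would explain the flapping.
The full troubleshooting document can be found here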
How reproducible:
Using the ICNI2/multi-egress annotations (https://docs.openshift.com/container-platform/4.14/networking/ovn_kubernetes_network_provider/configuring-secondary-external-gateway.html), try to establish more than 30 BFD sessions per worker.