-
Bug
-
Resolution: Duplicate
-
4.12.z
-
Quality / Stability / Reliability
kubernetes-nmstate runs its default GW ping check against default routes in non-default routing tables. It seems that kubernetes-nmstate picks the first default route that it can find for the gateway check. With multiple default routes (such as routes in vrfs), it might pick a gateway that belongs to a vrf table instead of the node's actual default gateway.
The problems are:
a) running a ping against every vrf / routing table's gateway is of little use, we are only interested in the default gw
b) the vrf gateway actually replaces the table 0 default gw as the target of the test
c) the test cannot work, because the ping command will use table 0
d) I don't know how to tell ping to use a different table than table 0; with vrfs it might not even be possible to source ping traffic from the vrf
All of that said, the issue is here:
https://github.com/openshift/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L158
func runPing(_ client.Client) (bool, error) {
	defaultGw, err := defaultGw()
	if err != nil {
		log.Error(err, "failed to retrieve default gw")
		return false, nil
	}
	pingOutput, err := ping(defaultGw)
	if err != nil {
		log.Error(err, fmt.Sprintf("error pinging default gateway -> output: '%s'", pingOutput))
		return false, nil
	}
	return true, nil
}
And mainly here:
https://github.com/openshift/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L130
func defaultGw() (string, error) {
	gjsonCurrentState, err := currentStateAsGJson()
	if err != nil {
		return "", errors.Wrap(err, "failed retrieving current state to retrieve default gw")
	}
	defaultGwGjsonPath := "routes.running.#(destination==\"0.0.0.0/0\").next-hop-address"
	defaultGw := gjsonCurrentState.Get(defaultGwGjsonPath).String()
	if defaultGw == "" {
		msg := "default gw missing"
		defaultGwLog := log.WithValues("path", defaultGwGjsonPath)
		defaultGwLogDebug := defaultGwLog.V(1)
		if defaultGwLogDebug.Enabled() {
			defaultGwLogDebug.Info(msg, "state", gjsonCurrentState.String())
		} else {
			defaultGwLog.Info(msg)
		}
		return "", errors.New(msg)
	}
	return defaultGw, nil
}
The function above will have to consider only table 0 when looking for the default gateway.
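A minimal sketch of such a filter, not the actual fix: it uses only the standard library instead of gjson, and the names mainTableDefaultGw, route and networkState are hypothetical. It assumes that each entry under routes.running carries a table-id field, as the policy below does. Which ID the reported state uses for the main table is also an assumption (this report calls it table 0, while the Linux main table is 254), so the accepted IDs are passed in by the caller.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// route mirrors the fields of a running route that the probe cares about.
// The "table-id" key is assumed to be present in the reported state.
type route struct {
	Destination    string `json:"destination"`
	NextHopAddress string `json:"next-hop-address"`
	TableID        int    `json:"table-id"`
}

// networkState is a hypothetical, trimmed-down view of the current state.
type networkState struct {
	Routes struct {
		Running []route `json:"running"`
	} `json:"routes"`
}

// mainTableDefaultGw returns the next hop of the first IPv4 default route
// whose table ID is one of mainTableIDs. Default routes in any other table
// (for example vrf tables) are skipped.
func mainTableDefaultGw(stateJSON []byte, mainTableIDs ...int) (string, error) {
	var state networkState
	if err := json.Unmarshal(stateJSON, &state); err != nil {
		return "", fmt.Errorf("failed parsing current state: %w", err)
	}
	for _, r := range state.Routes.Running {
		if r.Destination != "0.0.0.0/0" {
			continue
		}
		for _, id := range mainTableIDs {
			if r.TableID == id {
				return r.NextHopAddress, nil
			}
		}
	}
	return "", errors.New("default gw missing")
}
```

With the policy below applied, the default route in table 123 would be skipped and only a default route in one of the accepted tables would be pinged.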
Here's the nmstate file that reproduces the issue. Note that the file can be applied without issues when the default route in table 123 is not configured:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br123-br124-vlan-eno8403np1-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  maxUnavailable: 1
  desiredState:
    routes:
      config:
      - destination: 9.0.0.0/24
        metric: 150
        next-hop-address: 192.168.123.1
        next-hop-interface: br123
        table-id: 123
      - destination: 0.0.0.0/0
        metric: 150
        next-hop-address: 192.168.123.1
        next-hop-interface: br123
        table-id: 123
      - destination: 9.0.0.0/24
        metric: 150
        next-hop-address: 192.168.124.1
        next-hop-interface: br124
        table-id: 124
    interfaces:
    - name: eno8403np1
      type: ethernet
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
    - name: eno8403np1.123
      type: vlan
      state: up
      vlan:
        base-iface: eno8403np1
        id: 123
    - name: eno8403np1.124
      type: vlan
      state: up
      vlan:
        base-iface: eno8403np1
        id: 124
    - name: br123
      description: Linux bridge with eno8403np1 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: true
        auto-dns: false
        auto-gateway: false
        address:
        - ip: 192.168.123.10
          prefix-length: 24
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno8403np1.123
    - name: br124
      description: Linux bridge with eno8403np1 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: true
        auto-dns: false
        auto-gateway: false
        address:
        - ip: 192.168.124.10
          prefix-length: 24
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno8403np1.124
    - name: vrf123
      type: vrf
      state: up
      vrf:
        port:
        - br123
        route-table-id: 123
    - name: vrf124
      type: vrf
      state: up
      vrf:
        port:
        - br124
        route-table-id: 124
Here's the error message:
{"level":"info","ts":"2023-03-14T11:37:52.721Z","logger":"probe","msg":"Running 'ping' probe"}
{"level":"error","ts":"2023-03-14T11:37:53.524Z","logger":"probe","msg":"error pinging default gateway -> output: ''","error":"failed running ping probe: cmd output: 'PING 192.168.123.1 (192.168.123.1) 56(84) bytes of data.
From 38.145.41.234 icmp_seq=1 Time to live exceeded
--- 192.168.123.1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
': exit status 1","errorVerbose":"exit status 1
failed running ping probe: cmd output: 'PING 192.168.123.1 (192.168.123.1) 56(84) bytes of data.
From 38.145.41.234 icmp_seq=1 Time to live exceeded
--- 192.168.123.1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
'
github.com/nmstate/kubernetes-nmstate/pkg/probe.ping
\t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:180