Details
Bug
Resolution: Duplicate
Undefined
None
4.12.z
None
No
False
Description
kubernetes-nmstate runs its default gateway ping check against default routes in non-default routing tables. It seems that kubernetes-nmstate picks the first gateway it can find for the gateway check, so with multiple gateways (such as gateways in VRFs) it might pick a VRF gateway instead of the table 0 default gateway.
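To illustrate the first-match behaviour, here is a minimal, stdlib-only Go sketch. It models the running routes as structs whose fields mirror nmstate's `destination`, `next-hop-address` and `table-id` keys, and it assumes (hypothetically) that the VRF's default route in table 123 is listed before the main-table default route; 192.0.2.1 is a documentation address standing in for the real table 0 gateway. `firstDefaultGw` is a hypothetical helper that imitates the gjson query `#(destination=="0.0.0.0/0")`, which returns only the first matching element and ignores the table.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// route mirrors a subset of the keys nmstate reports under routes.running.
type route struct {
	Destination    string `json:"destination"`
	NextHopAddress string `json:"next-hop-address"`
	TableID        int    `json:"table-id"`
}

// firstDefaultGw (hypothetical) returns the next hop of the first
// 0.0.0.0/0 route regardless of its table, like a gjson "#(...)" query.
func firstDefaultGw(routes []route) (string, bool) {
	for _, r := range routes {
		if r.Destination == "0.0.0.0/0" {
			return r.NextHopAddress, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical running state: the VRF default route (table 123) comes first.
	state := `[
		{"destination": "0.0.0.0/0", "next-hop-address": "192.168.123.1", "table-id": 123},
		{"destination": "0.0.0.0/0", "next-hop-address": "192.0.2.1", "table-id": 254}
	]`
	var routes []route
	if err := json.Unmarshal([]byte(state), &routes); err != nil {
		panic(err)
	}
	gw, _ := firstDefaultGw(routes)
	fmt.Println(gw) // prints 192.168.123.1, the VRF gateway
}
```

Whichever default route happens to come first wins, which matches the behaviour seen in the log further down.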
The problem is:
a) it's obviously of little use to ping every VRF / routing table's gateway; we are only interested in the default gateway
b) the VRF gateway actually replaces the table 0 default gateway in the check
c) the check doesn't work, because the ping command uses table 0
d) I don't know how to tell ping to use a table other than table 0; I actually think that with VRFs it might not be possible to source ping traffic from the VRF
All of that said, the issue is here:
https://github.com/openshift/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L158
```go
func runPing(_ client.Client) (bool, error) {
	defaultGw, err := defaultGw()
	if err != nil {
		log.Error(err, "failed to retrieve default gw")
		return false, nil
	}
	pingOutput, err := ping(defaultGw)
	if err != nil {
		log.Error(err, fmt.Sprintf("error pinging default gateway -> output: '%s'", pingOutput))
		return false, nil
	}
	return true, nil
}
```
And mainly here:
https://github.com/openshift/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L130
```go
func defaultGw() (string, error) {
	gjsonCurrentState, err := currentStateAsGJson()
	if err != nil {
		return "", errors.Wrap(err, "failed retrieving current state to retrieve default gw")
	}
	defaultGwGjsonPath := "routes.running.#(destination==\"0.0.0.0/0\").next-hop-address"
	defaultGw := gjsonCurrentState.Get(defaultGwGjsonPath).String()
	if defaultGw == "" {
		msg := "default gw missing"
		defaultGwLog := log.WithValues("path", defaultGwGjsonPath)
		defaultGwLogDebug := defaultGwLog.V(1)
		if defaultGwLogDebug.Enabled() {
			defaultGwLogDebug.Info(msg, "state", gjsonCurrentState.String())
		} else {
			defaultGwLog.Info(msg)
		}
		return "", errors.New(msg)
	}
	return defaultGw, nil
}
```
When looking for the default gateway, the function above needs to consider only routes in table 0.
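A minimal sketch of that filtering, stdlib-only and working on decoded route structs rather than on gjson. It assumes, and this should be checked against real `nmstatectl show` output, that main-table routes may be reported with `table-id` 0, with 254 (the kernel's main table), or with no `table-id` at all (which decodes to 0). `mainTableDefaultGw`, `isMainTable` and `mainTableID` are hypothetical names, not kubernetes-nmstate functions; 192.0.2.1 is a placeholder documentation address.

```go
package main

import (
	"errors"
	"fmt"
)

// route mirrors a subset of the keys nmstate reports under routes.running.
type route struct {
	Destination    string `json:"destination"`
	NextHopAddress string `json:"next-hop-address"`
	TableID        int    `json:"table-id"`
}

// Assumption: the main table shows up as table-id 254, as 0, or with the key omitted.
const mainTableID = 254

func isMainTable(id int) bool {
	return id == 0 || id == mainTableID
}

// mainTableDefaultGw (hypothetical) returns the next hop of the IPv4 default
// route in the main table, skipping default routes that live in VRF tables.
func mainTableDefaultGw(routes []route) (string, error) {
	for _, r := range routes {
		if r.Destination == "0.0.0.0/0" && isMainTable(r.TableID) && r.NextHopAddress != "" {
			return r.NextHopAddress, nil
		}
	}
	return "", errors.New("default gw missing")
}

func main() {
	routes := []route{
		{Destination: "0.0.0.0/0", NextHopAddress: "192.168.123.1", TableID: 123},
		{Destination: "9.0.0.0/24", NextHopAddress: "192.168.124.1", TableID: 124},
		{Destination: "0.0.0.0/0", NextHopAddress: "192.0.2.1", TableID: 254},
	}
	gw, err := mainTableDefaultGw(routes)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(gw) // prints 192.0.2.1
}
```

Returning the same "default gw missing" error when only VRF default routes exist means the caller behaves exactly as it does today on a node without any default route.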
Here's the nmstate file that reproduces the issue. Note that the file can be applied without issues when the default route in table 123 is not configured:
```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br123-br124-vlan-eno8403np1-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  maxUnavailable: 1
  desiredState:
    routes:
      config:
      - destination: 9.0.0.0/24
        metric: 150
        next-hop-address: 192.168.123.1
        next-hop-interface: br123
        table-id: 123
      - destination: 0.0.0.0/0
        metric: 150
        next-hop-address: 192.168.123.1
        next-hop-interface: br123
        table-id: 123
      - destination: 9.0.0.0/24
        metric: 150
        next-hop-address: 192.168.124.1
        next-hop-interface: br124
        table-id: 124
    interfaces:
    - name: eno8403np1
      type: ethernet
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
    - name: eno8403np1.123
      type: vlan
      state: up
      vlan:
        base-iface: eno8403np1
        id: 123
    - name: eno8403np1.124
      type: vlan
      state: up
      vlan:
        base-iface: eno8403np1
        id: 124
    - name: br123
      description: Linux bridge with eno8403np1 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: true
        auto-dns: false
        auto-gateway: false
        address:
        - ip: 192.168.123.10
          prefix-length: 24
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno8403np1.123
    - name: br124
      description: Linux bridge with eno8403np1 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: false
        enabled: true
        auto-dns: false
        auto-gateway: false
        address:
        - ip: 192.168.124.10
          prefix-length: 24
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno8403np1.124
    - name: vrf123
      type: vrf
      state: up
      vrf:
        port:
        - br123
        route-table-id: 123
    - name: vrf124
      type: vrf
      state: up
      vrf:
        port:
        - br124
        route-table-id: 124
```
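Here's the error message. The "Time to live exceeded" reply comes from 38.145.41.234 rather than from 192.168.123.1, which is consistent with (c), i.e. the ping being routed via table 0 instead of table 123: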
```
{"level":"info","ts":"2023-03-14T11:37:52.721Z","logger":"probe","msg":"Running 'ping' probe"}
{"level":"error","ts":"2023-03-14T11:37:53.524Z","logger":"probe","msg":"error pinging default gateway -> output: ''","error":"failed running ping probe: cmd output: 'PING 192.168.123.1 (192.168.123.1) 56(84) bytes of data. From 38.145.41.234 icmp_seq=1 Time to live exceeded --- 192.168.123.1 ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms ': exit status 1","errorVerbose":"exit status 1 failed running ping probe: cmd output: 'PING 192.168.123.1 (192.168.123.1) 56(84) bytes of data. From 38.145.41.234 icmp_seq=1 Time to live exceeded --- 192.168.123.1 ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms ' github.com/nmstate/kubernetes-nmstate/pkg/probe.ping \t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:180
```