OpenShift Bugs / OCPBUGS-10245

kubernetes-nmstate runs default GW ping check against default routes in non-default routing table


Details

    • Bug
    • Resolution: Duplicate
    • Undefined
    • None
    • 4.12.z
    • None
    • No
    • False

    Description

      kubernetes-nmstate runs the default gateway ping check against default routes in non-default routing tables. It seems that kubernetes-nmstate picks the first default gateway it can find for the gateway check. With multiple default gateways (such as default gateways in VRFs), it might pick one of those instead of the default gateway of the main routing table.

      The problem is:
      a) it's obviously a bit useless to run a ping against every VRF / routing table's gateway; we are only interested in the default gateway of the main table
      b) the VRF default gateway actually replaces the table 0 default gateway as the target of the check
      c) the check doesn't work, because the ping command uses table 0 to route its packets
      d) I don't know how to tell ping to use a table other than table 0. I actually think that with VRFs it might not be possible to source ping traffic from the VRF

      All of that said, the issue is here:
      https://github.com/openshift/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L158

      func runPing(_ client.Client) (bool, error) {
      	defaultGw, err := defaultGw()
      	if err != nil {
      		log.Error(err, "failed to retrieve default gw")
      		return false, nil
      	}
      
      	pingOutput, err := ping(defaultGw)
      	if err != nil {
      		log.Error(err, fmt.Sprintf("error pinging default gateway -> output: '%s'", pingOutput))
      		return false, nil
      	}
      	return true, nil
      }
      

      And mainly here:
      https://github.com/openshift/kubernetes-nmstate/blob/master/pkg/probe/probes.go#L130

      func defaultGw() (string, error) {
      	gjsonCurrentState, err := currentStateAsGJson()
      	if err != nil {
      		return "", errors.Wrap(err, "failed retrieving current state to retrieve default gw")
      	}
      	defaultGwGjsonPath := "routes.running.#(destination==\"0.0.0.0/0\").next-hop-address"
      	defaultGw := gjsonCurrentState.Get(defaultGwGjsonPath).String()
      	if defaultGw == "" {
      		msg := "default gw missing"
      		defaultGwLog := log.WithValues("path", defaultGwGjsonPath)
      		defaultGwLogDebug := defaultGwLog.V(1)
      		if defaultGwLogDebug.Enabled() {
      			defaultGwLogDebug.Info(msg, "state", gjsonCurrentState.String())
      		} else {
      			defaultGwLog.Info(msg)
      		}
      		return "", errors.New(msg)
      	}
      	return defaultGw, nil
      }
      

      The function above has to filter for table 0 only when looking for the default gateway.
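A minimal sketch of such a filter, again using `encoding/json` rather than gjson so that it stays self-contained. `mainTableDefaultGw` and `isMainTable` are hypothetical names, not kubernetes-nmstate code. The report refers to the main table as table 0; the sketch assumes that nmstate's running state reports main-table routes with `table-id: 254` (the kernel's main table) and also accepts 0 or an absent `table-id`. The sample state and the `10.0.0.1` gateway are made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// route mirrors the subset of an nmstate running route entry used here.
type route struct {
	Destination    string `json:"destination"`
	NextHopAddress string `json:"next-hop-address"`
	TableID        int    `json:"table-id"` // 0 if the field is absent
}

type state struct {
	Routes struct {
		Running []route `json:"running"`
	} `json:"routes"`
}

// isMainTable treats 254 (assumed main-table ID in the running state) and
// 0 (unspecified / absent) as the main routing table.
func isMainTable(id int) bool {
	return id == 0 || id == 254
}

// mainTableDefaultGw (hypothetical) only considers IPv4 default routes in the
// main routing table, so default routes in VRF tables are ignored.
func mainTableDefaultGw(stateJSON []byte) (string, error) {
	var s state
	if err := json.Unmarshal(stateJSON, &s); err != nil {
		return "", fmt.Errorf("failed to parse current state: %w", err)
	}
	for _, r := range s.Routes.Running {
		if r.Destination == "0.0.0.0/0" && isMainTable(r.TableID) && r.NextHopAddress != "" {
			return r.NextHopAddress, nil
		}
	}
	return "", errors.New("default gw missing")
}

// Hypothetical running state with a VRF default route listed first.
const sampleState = `{"routes": {"running": [
  {"destination": "0.0.0.0/0", "next-hop-address": "192.168.123.1", "table-id": 123},
  {"destination": "0.0.0.0/0", "next-hop-address": "10.0.0.1", "table-id": 254}
]}}`

func main() {
	gw, err := mainTableDefaultGw([]byte(sampleState))
	fmt.Println(gw, err) // prints: 10.0.0.1 <nil>
}
```

Restricting the lookup to the main table keeps the check on the gateway that the ping command actually routes through, in line with points a) and c) above.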

      Here is the nmstate policy that reproduces the issue. Note that the policy can be applied without issues when the default route in table 123 is not configured:

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: br123-br124-vlan-eno8403np1-policy 
      spec:
        nodeSelector: 
          node-role.kubernetes.io/worker: "" 
        maxUnavailable: 1
        desiredState:
          routes:
            config:
              - destination: 9.0.0.0/24
                metric: 150
                next-hop-address: 192.168.123.1
                next-hop-interface: br123
                table-id: 123
              - destination: 0.0.0.0/0
                metric: 150
                next-hop-address: 192.168.123.1
                next-hop-interface: br123
                table-id: 123
              - destination: 9.0.0.0/24
                metric: 150
                next-hop-address: 192.168.124.1
                next-hop-interface: br124
                table-id: 124
          interfaces:
            - name: eno8403np1
              type: ethernet
              state: up
              ipv4:
                enabled: false
              ipv6:
                enabled: false
            - name: eno8403np1.123
              type: vlan
              state: up
              vlan:
                base-iface: eno8403np1
                id: 123
            - name: eno8403np1.124
              type: vlan
              state: up
              vlan:
                base-iface: eno8403np1
                id: 124
            - name: br123
              description: Linux bridge with eno8403np1 as a port 
              type: linux-bridge
              state: up
              ipv4:
                dhcp: false
                enabled: true
                auto-dns: false
                auto-gateway: false
                address:
                  - ip: 192.168.123.10
                    prefix-length: 24
              bridge:
                options:
                  stp:
                    enabled: false
                port:
                  - name: eno8403np1.123
            - name: br124
              description: Linux bridge with eno8403np1 as a port 
              type: linux-bridge
              state: up
              ipv4:
                dhcp: false
                enabled: true
                auto-dns: false
                auto-gateway: false
                address:
                  - ip: 192.168.124.10
                    prefix-length: 24
              bridge:
                options:
                  stp:
                    enabled: false
                port:
                  - name: eno8403np1.124
            - name: vrf123
              type: vrf
              state: up
              vrf:
                port:
                - br123
                route-table-id: 123
            - name: vrf124
              type: vrf
              state: up
              vrf:
                port:
                - br124
                route-table-id: 124
      

      Here's the error message:

      {"level":"info","ts":"2023-03-14T11:37:52.721Z","logger":"probe","msg":"Running 'ping' probe"}
      {"level":"error","ts":"2023-03-14T11:37:53.524Z","logger":"probe","msg":"error pinging default gateway -> output: ''","error":"failed running ping probe: cmd output: 'PING 192.168.123.1 (192.168.123.1) 56(84) bytes of data.
      From 38.145.41.234 icmp_seq=1 Time to live exceeded
      
      --- 192.168.123.1 ping statistics ---
      1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
      
      ': exit status 1","errorVerbose":"exit status 1
      failed running ping probe: cmd output: 'PING 192.168.123.1 (192.168.123.1) 56(84) bytes of data.
      From 38.145.41.234 icmp_seq=1 Time to live exceeded
      
      --- 192.168.123.1 ping statistics ---
      1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
      
      '
      github.com/nmstate/kubernetes-nmstate/pkg/probe.ping
      \t/go/src/github.com/openshift/kubernetes-nmstate/pkg/probe/probes.go:180
      

          People

            akaris@redhat.com Andreas Karis
            akaris@redhat.com Andreas Karis
            Qiong Wang Qiong Wang