OpenShift Bugs / OCPBUGS-2602

Intermittent DNS Resolution Errors After Upgrading to 4.10.25


      While exercising cluster autoscaling, we continuously queried a URL from a pod and, at the same time, ran custom watches on the nodes and on the dns-default endpoints.

      When nodes scale down, a DNS query can fail in the few seconds before an event marks the node as Not Ready and before the node's dns-default address is removed from the endpoints list.

      We wrote a simple watcher (see below for details) to log this and captured the following events:

      DNS lookup failure:

      Tue Oct 18 12:33:23 UTC 2022 - Lookup success
      Tue Oct 18 12:33:28 UTC 2022 - DNS failure
      Tue Oct 18 12:33:41 UTC 2022 - Lookup success
      

      The node was not yet NotReady and the endpoint was still in the endpoints list at that time (ntrdy indicates a not-ready address):

      2022-10-18 12:33:21.712180649 +0000 UTC m=+1047.610174444 - ip-10-0-137-100.ec2.internal -  MemoryPressure - False, DiskPressure - False, PIDPressure - False, Ready - True,
      2022-10-18 12:33:39.11806612 +0000 UTC m=+1065.016059955 - ip-10-0-129-193.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:39.525574893 +0000 UTC m=+1065.423568712 - dns-default rdy: 10.128.0.2 rdy: 10.128.10.4 rdy: 10.128.2.5 rdy: 10.129.0.2 rdy: 10.130.0.16 rdy: 10.130.8.4 rdy: 10.131.0.3 ntrdy: 10.131.8.4
      2022-10-18 12:33:39.526424974 +0000 UTC m=+1065.424418833 - dns-default rdy: 10.128.0.2 rdy: 10.128.2.5 rdy: 10.129.0.2 rdy: 10.130.0.16 rdy: 10.130.8.4 rdy: 10.131.0.3 ntrdy: 10.128.10.4 ntrdy: 10.131.8.4
      2022-10-18 12:33:39.528532869 +0000 UTC m=+1065.426526744 - ip-10-0-129-193.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:39.729859144 +0000 UTC m=+1065.627852917 - ip-10-0-150-205.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:39.936928994 +0000 UTC m=+1065.834922825 - ip-10-0-150-205.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:44.749587947 +0000 UTC m=+1070.647581767 - ip-10-0-188-175.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:44.952196646 +0000 UTC m=+1070.850190469 - dns-default rdy: 10.128.0.2 rdy: 10.128.2.5 rdy: 10.129.0.2 rdy: 10.130.0.16 rdy: 10.131.0.3 ntrdy: 10.128.10.4 ntrdy: 10.130.8.4 ntrdy: 10.131.8.4
      2022-10-18 12:33:44.954865089 +0000 UTC m=+1070.852858965 - ip-10-0-188-175.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:45.159460169 +0000 UTC m=+1071.057454007 - ip-10-0-150-205.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:48.641412229 +0000 UTC m=+1074.539406059 - ip-10-0-188-175.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:48.846438064 +0000 UTC m=+1074.744431900 - ip-10-0-129-193.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:33:54.068542745 +0000 UTC m=+1079.966536563 - ip-10-0-150-205.ec2.internal -  MemoryPressure - Unknown, DiskPressure - Unknown, PIDPressure - Unknown, Ready - Unknown,
      2022-10-18 12:34:31.752294563 +0000 UTC m=+1117.650288381 - ip-10-0-253-198.ec2.internal -  MemoryPressure - False, DiskPressure - False, PIDPressure - False, Ready - True,
      2022-10-18 12:34:39.531848219 +0000 UTC m=+1125.429842032 - dns-default rdy: 10.128.0.2 rdy: 10.128.2.5 rdy: 10.129.0.2 rdy: 10.130.0.16 rdy: 10.131.0.3 ntrdy: 10.128.10.4 ntrdy: 10.131.8.4
      2022-10-18 12:34:39.736866622 +0000 UTC m=+1125.634860439 - dns-default rdy: 10.128.0.2 rdy: 10.128.2.5 rdy: 10.129.0.2 rdy: 10.130.0.16 rdy: 10.131.0.3 ntrdy: 10.128.10.4
      2022-10-18 12:34:39.941934912 +0000 UTC m=+1125.839928742 - dns-default rdy: 10.128.0.2 rdy: 10.128.2.5 rdy: 10.129.0.2 rdy: 10.130.0.16 rdy: 10.131.0.3
      

      So we can observe that the node goes into 'Unknown' at 12:33:39 and its dns-default address is marked not ready shortly afterwards, yet the failed lookup happened at 12:33:28, roughly eleven seconds earlier, while the address was still listed as ready.
      We are not sure whether this is a problem in the logic that drains a node or an issue with the autoscaler at this point, but the failure clears itself at the next lookup five seconds later.

      Detailed breakdown of how this was reproduced:
      1. A cluster with autoscaling enabled is required.
      2. Deploy a daemonset that attempts DNS/HTTP lookups in a loop; e.g. the following DaemonSet was used to test this (a rough Go equivalent of the probe is shown after the manifest):

      
      apiVersion: apps/v1
      kind: DaemonSet
      metadata: 
        name: dns-tester
        labels: 
          app: dns-tester
      spec: 
        selector: 
          matchLabels: 
            app: dns-tester
        template: 
          metadata: 
            labels: 
              app: dns-tester
          spec: 
            containers: 
            - name: dns-tester
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:72f2f7e906c321da6d6a00ce610780e8766e8432f7c553c5d03492f65fe5416c
              command: ["/bin/sh", "-c"]
              args: ['while true; do CURL=$(curl redhat.com 2>&1); if [[ "$CURL" == *"not resolve"* ]]; then echo `date` - "DNS failure"; else echo `date` - "Lookup success"; fi;  sleep 5; done']
              resources: 
                limits: 
                  cpu: 100m
                  memory: 200Mi 
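      
      For reference, here is a rough Go equivalent of the curl loop above. This is a sketch, not part of the original reproducer; the 2-second timeout and the use of the pure-Go resolver are choices made for the sketch.
      
      package main
      
      import (
      	"context"
      	"fmt"
      	"net"
      	"time"
      )
      
      func main() {
      	// Use the pure-Go resolver so lookups read /etc/resolv.conf and
      	// respect the per-query context deadline.
      	resolver := &net.Resolver{PreferGo: true}
      	for {
      		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
      		_, err := resolver.LookupHost(ctx, "redhat.com")
      		cancel()
      		if err != nil {
      			fmt.Printf("%s - DNS failure: %v\n", time.Now().UTC().Format(time.UnixDate), err)
      		} else {
      			fmt.Printf("%s - Lookup success\n", time.Now().UTC().Format(time.UnixDate))
      		}
      		time.Sleep(5 * time.Second)
      	}
      }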
      

      3. Run the following Go program against the same cluster (this is what watches the node events and the dns-default endpoints events shown above):

      package main
      
      import (
      	"context"
      	"fmt"
      	"path/filepath"
      	"sync"
      	"time"
      
      	corev1 "k8s.io/api/core/v1"
      	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      	"k8s.io/client-go/kubernetes"
      	_ "k8s.io/client-go/plugin/pkg/client/auth"
      	"k8s.io/client-go/tools/clientcmd"
      	"k8s.io/client-go/util/homedir"
      )
      
      const dnsNamespace = "openshift-dns"
      const dnsEndpoint = "dns-default"
      
      // nodeWatch logs every node event together with the node's conditions.
      func nodeWatch(clientset *kubernetes.Clientset, waitGroup *sync.WaitGroup) {
      	ctx := context.Background()
      	defer waitGroup.Done()
      
      	var nodes = clientset.CoreV1().Nodes()
      	watcher, err := nodes.Watch(ctx, metav1.ListOptions{})
      	if err != nil {
      		panic(err.Error())
      	}
      	ch := watcher.ResultChan()
      
      	for {
      		event := <-ch
      		node, ok := event.Object.(*corev1.Node)
      		if !ok {
      			fmt.Printf("%v", event)
      			panic("Could not cast to nodes")
      		}
      		fmt.Printf("%v - %s - ", time.Now(), node.Name)
      		for _, condition := range node.Status.Conditions {
      			fmt.Printf(" %v - %v,", condition.Type, condition.Status)
      		}
      		fmt.Println()
      	}
      }
      
      // dnsWatch logs the ready (rdy) and not-ready (ntrdy) addresses of every
      // Endpoints object in the openshift-dns namespace whenever they change.
      func dnsWatch(clientset *kubernetes.Clientset, waitGroup *sync.WaitGroup) {
      	ctx := context.Background()
      	defer waitGroup.Done()
      	var api = clientset.CoreV1().Endpoints(dnsNamespace)
      	watcher, err := api.Watch(ctx, metav1.ListOptions{})
      	if err != nil {
      		panic(err.Error())
      	}
      	ch := watcher.ResultChan()
      	for {
      		event := <-ch
      		endpoints, ok := event.Object.(*corev1.Endpoints)
      		if !ok {
      			fmt.Printf("%v", event)
      			panic("Could not cast to Endpoint")
      		}
      		fmt.Printf("%v - %v", time.Now(), endpoints.ObjectMeta.Name)
      		for _, endpoint := range endpoints.Subsets {
      			for _, address := range endpoint.Addresses {
      				fmt.Printf(" rdy: %v", address.IP)
      			}
      			for _, address := range endpoint.NotReadyAddresses {
      				fmt.Printf(" ntrdy: %v", address.IP)
      			}
      		}
      		fmt.Println()
      	}
      }
      
      func main() {
      	// AUTHENTICATE
      	var home = homedir.HomeDir()
      	var kubeconfig = filepath.Join(home, ".kube", "config")
      	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
      	if err != nil {
      		panic(err.Error())
      	}
      	clientset, err := kubernetes.NewForConfig(config)
      	if err != nil {
      		panic(err.Error())
      	}
      	wg := sync.WaitGroup{}
      	wg.Add(2)
      	go dnsWatch(clientset, &wg)
      	go nodeWatch(clientset, &wg)
      	wg.Wait()
      }
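      
      One possible refinement of the watcher above, sketched here and not part of the program we actually ran: the endpoints watch can be narrowed to the dns-default object itself with a field selector, so updates to unrelated Endpoints objects in openshift-dns are never delivered. watchDNSDefault is a hypothetical helper name; it reuses the dnsNamespace and dnsEndpoint constants and additionally needs the k8s.io/apimachinery/pkg/fields and k8s.io/apimachinery/pkg/watch imports.
      
      // Sketch only: watch just the dns-default Endpoints object.
      func watchDNSDefault(ctx context.Context, clientset *kubernetes.Clientset) (watch.Interface, error) {
      	return clientset.CoreV1().Endpoints(dnsNamespace).Watch(ctx, metav1.ListOptions{
      		FieldSelector: fields.OneTermEqualSelector("metadata.name", dnsEndpoint).String(),
      	})
      }
      
      The event loop from dnsWatch can consume the channel from the returned watcher unchanged.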
      

      4. Create simulated pressure on the nodes to force a scale-up. E.g. use the following Deployment; the pod anti-affinity keeps the replicas on separate nodes so that the large resource requests cannot be satisfied by the existing capacity:

      apiVersion: apps/v1
      kind: Deployment
      metadata: 
        name: resource-eater
      spec: 
        replicas: 4
        selector: 
          matchLabels: 
            app: resource-eater
        template: 
          metadata: 
            labels: 
              app: resource-eater
          spec: 
            containers: 
            - name: resource-eater
              image: busybox:latest
              command: ["/bin/sh", "-c"]
              args: ["sleep 3600"]
              resources: 
                requests: 
                  memory: "8Gi"
                  cpu: "1000m"
            affinity: 
              podAntiAffinity: 
                requiredDuringSchedulingIgnoredDuringExecution: 
                - labelSelector: 
                    matchExpressions: 
                    - key: app
                      operator: In
                      values: 
                      - resource-eater
                  topologyKey: "kubernetes.io/hostname" 
      

      5. Wait for the scale-up to happen.
      6. Delete the deployment that created the node pressure so that the scale-down can happen (this can easily take 15 minutes).
      7. Observe the events in the watcher program and the logs of the DaemonSet; they should show the same behavior as detailed above.

      We found a few previously filed bugs that seem related to this issue, affecting clusters on 4.8 through 4.10:

      https://issues.redhat.com/browse/OCPBUGS-647
      https://issues.redhat.com/browse/OCPBUGS-488
      https://bugzilla.redhat.com/show_bug.cgi?id=2061244
      

      Using the above steps, we have been able to reliably reproduce DNS failures during autoscaler scale-down on 4.10 clusters.
