Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.17
Component: Quality / Stability / Reliability
The pod-network-availability tests don't pass on HyperShift, at least not on ROSA HCP.
Some context here: https://redhat-internal.slack.com/archives/C02LM9FABFW/p1719504170756849?thread_ts=1715974242.700929&cid=C02LM9FABFW
Reproducer:
- Ask cluster-bot for a ROSA cluster (rosa create 4.15 6h)
- Run tests like the below and note that pod-network-availability monitor collection fails:
$ TMPDIR=$(mktemp -d)
$ cd $TMPDIR
$ oc create serviceaccount cni-conformance -n default
$ oc adm policy add-cluster-role-to-user cluster-admin -z cni-conformance -n default
$ KUBECONFIG=$(pwd)/kubeconfig.yaml oc login --token="$(oc create token cni-conformance)" --server=$(oc config view --minify --output jsonpath="{.clusters[*].cluster.server}") --insecure-skip-tls-verify
$ oc adm release info --image-for=tests registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-07-27-075008
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:398a8756d59ff44f730bd2dd6b62e57ab2507b762442aba34b817d36e76f86e2
$ podman run --authfile=$HOME/pull.json -v "$(pwd):/data:z" -w /data --rm -it quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:398a8756d59ff44f730bd2dd6b62e57ab2507b762442aba34b817d36e76f86e2 sh -c "KUBECONFIG=/data/kubeconfig.yaml /usr/bin/openshift-tests run openshift/network/third-party -o /data/results.txt"
Logs:
<testcase name="[sig-network] can collect pod-to-service poller pod logs" time="0"> <failure message=""> 2 pods lacked sampler output: [pod-network-to-service-disruption-poller-7dfc77c96d-5lfkm, pod-network-to-service-disruption-poller-7dfc77c96d-dl5np] </failure> <system-out> 

Logs for -n e2e-pod-network-disruption-test-zvv8h pod/pod-network-to-service-disruption-poller-7dfc77c96d-5lfkm
 Initializing to watch clusterIP 172.30.190.103:80
 Initializing to watch clusterIP 172.30.190.103:80
 Watching configmaps...
 I0627 15:35:34.507299 1 service_controller.go:168] "Starting PollService controller"
 I0627 15:35:34.507468 1 shared_informer.go:311] Waiting for caches to sync for ServicePoller
 I0627 15:35:34.608352 1 shared_informer.go:318] Caches are synced for ServicePoller
 Adding and starting: http://172.30.190.103:80 on node/ip-10-0-1-228.ec2.internal
 Successfully started: http://172.30.190.103:80 on node/ip-10-0-1-228.ec2.internal
 Stopping and removing: 172.30.190.103 for node/ip-10-0-1-228.ec2.internal
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over new connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:25Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over reused connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:25Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}
 Stopped all watchers
 E0627 15:41:25.629887 1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close
 E0627 15:41:25.629981 1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close


 Logs for -n e2e-pod-network-disruption-test-zvv8h pod/pod-network-to-service-disruption-poller-7dfc77c96d-dl5np
 Initializing to watch clusterIP 172.30.190.103:80
 Initializing to watch clusterIP 172.30.190.103:80
 Watching configmaps...
 I0627 15:35:34.166675 1 service_controller.go:168] "Starting PollService controller"
 I0627 15:35:34.166788 1 shared_informer.go:311] Waiting for caches to sync for ServicePoller
 Adding and starting: http://172.30.190.103:80 on node/ip-10-0-1-176.ec2.internal
 Successfully started: http://172.30.190.103:80 on node/ip-10-0-1-176.ec2.internal
 I0627 15:35:34.267207 1 shared_informer.go:318] Caches are synced for ServicePoller
 Stopping and removing: 172.30.190.103 for node/ip-10-0-1-176.ec2.internal
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over new connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:26Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over reused connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:26Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}
 Stopped all watchers
 </system-out> </testcase>
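The closing "not finished writing all samples (1 remaining), but we're told to close" errors suggest the poller's shutdown signal fires before the sampler's writer has drained its queue, so the final sample (and hence the expected sampler output) never gets flushed. A minimal, hypothetical Go sketch of that pattern — not the actual disruption_backend_sampler code, just an illustration of a writer being told to close with samples still queued:

```go
package main

import "fmt"

// drainWithStop simulates a sampler writer: it drains samples until a
// shutdown effectively caps how many it may write (maxWrites), then
// reports how many samples were still queued when it was told to close.
func drainWithStop(samples chan int, maxWrites int) (written, remaining int) {
	for written < maxWrites && len(samples) > 0 {
		<-samples // write the sample out (elided)
		written++
	}
	// shutdown point: anything still buffered is lost
	remaining = len(samples)
	return written, remaining
}

func main() {
	samples := make(chan int, 8)
	for i := 0; i < 3; i++ {
		samples <- i
	}
	// shutdown arrives after only two of three samples are written
	written, remaining := drainWithStop(samples, 2)
	if remaining > 0 {
		fmt.Printf("not finished writing all samples (%d remaining), but we're told to close\n", remaining)
	}
	fmt.Println("written:", written)
}
```

If this is what is happening in the ROSA HCP environment, the test's "2 pods lacked sampler output" failure would follow directly: the unflushed sample is the output the collection step looks for.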