Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.17
Component: Quality / Stability / Reliability
The pod-network-availability tests don't pass on HyperShift, at least not on ROSA HCP.
Some context here: https://redhat-internal.slack.com/archives/C02LM9FABFW/p1719504170756849?thread_ts=1715974242.700929&cid=C02LM9FABFW
Reproducer:
- Ask cluster-bot for a ROSA cluster (rosa create 4.15 6h)
- Run tests like the below and note that pod-network-availability monitor collection fails:
$ TMPDIR=$(mktemp -d)
$ cd $TMPDIR
$ oc create serviceaccount cni-conformance -n default
$ oc adm policy add-cluster-role-to-user cluster-admin -z cni-conformance -n default
$ KUBECONFIG=$(pwd)/kubeconfig.yaml oc login --token="$(oc create token cni-conformance)" --server=$(oc config view --minify --output jsonpath="{.clusters[*].cluster.server}") --insecure-skip-tls-verify
$ oc adm release info --image-for=tests registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-07-27-075008
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:398a8756d59ff44f730bd2dd6b62e57ab2507b762442aba34b817d36e76f86e2
$ podman run --authfile=$HOME/pull.json -v "$(pwd):/data:z" -w /data --rm -it quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:398a8756d59ff44f730bd2dd6b62e57ab2507b762442aba34b817d36e76f86e2 sh -c "KUBECONFIG=/data/kubeconfig.yaml /usr/bin/openshift-tests run openshift/network/third-party -o /data/results.txt"
Logs:
<testcase name="[sig-network] can collect pod-to-service poller pod logs" time="0"> <failure message=""> 2 pods lacked sampler output: [pod-network-to-service-disruption-poller-7dfc77c96d-5lfkm, pod-network-to-service-disruption-poller-7dfc77c96d-dl5np] </failure> <system-out> 

Logs for -n e2e-pod-network-disruption-test-zvv8h pod/pod-network-to-service-disruption-poller-7dfc77c96d-5lfkm
 Initializing to watch clusterIP 172.30.190.103:80
 Initializing to watch clusterIP 172.30.190.103:80
 Watching configmaps...
 I0627 15:35:34.507299 1 service_controller.go:168] "Starting PollService controller"
 I0627 15:35:34.507468 1 shared_informer.go:311] Waiting for caches to sync for ServicePoller
 I0627 15:35:34.608352 1 shared_informer.go:318] Caches are synced for ServicePoller
 Adding and starting: http://172.30.190.103:80 on node/ip-10-0-1-228.ec2.internal
 Successfully started: http://172.30.190.103:80 on node/ip-10-0-1-228.ec2.internal
 Stopping and removing: 172.30.190.103 for node/ip-10-0-1-228.ec2.internal
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over new connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:25Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over reused connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:25Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-228.ec2.internal-to-clusterIP-172.30.190.103]}
 Stopped all watchers
 E0627 15:41:25.629887 1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close
 E0627 15:41:25.629981 1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close


 Logs for -n e2e-pod-network-disruption-test-zvv8h pod/pod-network-to-service-disruption-poller-7dfc77c96d-dl5np
 Initializing to watch clusterIP 172.30.190.103:80
 Initializing to watch clusterIP 172.30.190.103:80
 Watching configmaps...
 I0627 15:35:34.166675 1 service_controller.go:168] "Starting PollService controller"
 I0627 15:35:34.166788 1 shared_informer.go:311] Waiting for caches to sync for ServicePoller
 Adding and starting: http://172.30.190.103:80 on node/ip-10-0-1-176.ec2.internal
 Successfully started: http://172.30.190.103:80 on node/ip-10-0-1-176.ec2.internal
 I0627 15:35:34.267207 1 shared_informer.go:318] Caches are synced for ServicePoller
 Stopping and removing: 172.30.190.103 for node/ip-10-0-1-176.ec2.internal
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-new-connections connection/new disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over new connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:26Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-new-connections connection:new disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}
 waiting for consumer to finish {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}...
 {"level":"Info","locator":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103","message":"backend-disruption-name/pod-to-service-reused-connections connection/reused disruption/pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103 started responding to GET requests over reused connections","tempStructuredLocator":{"type":"","keys":null},"tempStructuredMessage":{"reason":"","cause":"","humanMessage":"","annotations":null},"from":"2024-06-27T15:35:34Z","to":"2024-06-27T15:41:26Z"}
 consumer finished {Disruption map[backend-disruption-name:pod-to-service-reused-connections connection:reused disruption:pod-to-service-to-service-from-node-ip-10-0-1-176.ec2.internal-to-clusterIP-172.30.190.103]}
 Stopped all watchers
 </system-out> </testcase>
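The closing "not finished writing all samples (1 remaining), but we're told to close" errors suggest the poller's shutdown signal fires before the sampler's writer has drained its queue, so the final sample (and hence the expected sampler output) never gets flushed. A minimal, hypothetical Go sketch of that pattern — not the actual disruption_backend_sampler code, just an illustration of a writer being told to close with samples still queued:

```go
package main

import "fmt"

// drainWithStop simulates a sampler writer: it drains samples until a
// shutdown effectively caps how many it may write (maxWrites), then
// reports how many samples were still queued when it was told to close.
func drainWithStop(samples chan int, maxWrites int) (written, remaining int) {
	for written < maxWrites && len(samples) > 0 {
		<-samples // write the sample out (elided)
		written++
	}
	// shutdown point: anything still buffered is lost
	remaining = len(samples)
	return written, remaining
}

func main() {
	samples := make(chan int, 8)
	for i := 0; i < 3; i++ {
		samples <- i
	}
	// shutdown arrives after only two of three samples are written
	written, remaining := drainWithStop(samples, 2)
	if remaining > 0 {
		fmt.Printf("not finished writing all samples (%d remaining), but we're told to close\n", remaining)
	}
	fmt.Println("written:", written)
}
```

If this is what is happening in the ROSA HCP environment, the test's "2 pods lacked sampler output" failure would follow directly: the unflushed sample is the output the collection step looks for.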