Bug
Resolution: Unresolved
Affects Version: 4.14
Severity: Important
Sprints: Auth - Sprint 240, Auth - Sprint 241, Auth - Sprint 242, Auth - Sprint 243, Auth - Sprint 245, Auth - Sprint 249, Auth - Sprint 250
Description of problem:
The Authentication Operator is still leaking the oauth route into the proxy logs; the behavior does not seem any less obtrusive despite the AUTH-363 fix, which was intended to reduce the checks to once per hour.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-27-223709
How reproducible:
Always
Steps to Reproduce:
1. Successfully launch a 4.14 cluster with the QE profile upi-on-baremetal/versioned-installer-openstack-https_proxy using the Installer QE Jenkins installer job.
2. Check the proxy configuration. The Installer QE Jenkins installer job sets the proxy as shown below; note that it uses a trustedCA:
$ oc get proxy cluster -o yaml
spec:
  httpProxy: http://<user>:<password>@10.0.152.122:3128
  httpsProxy: https://<user>:<password>@10.0.152.122:3130
  noProxy: test.no-proxy.com
  trustedCA:
    name: user-ca-bundle
status:
  httpProxy: http://<user>:<password>@10.0.152.122:3128
  httpsProxy: https://<user>:<password>@10.0.152.122:3130
  noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.<cluster-name>.qe.devcluster.openshift.com,localhost,test.no-proxy.com
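To quickly confirm the effective noProxy list without reading the full YAML (a convenience sketch using oc's standard jsonpath output, not part of the original reproduction):
$ oc get proxy cluster -o jsonpath='{.status.noProxy}{"\n"}'
This should print the same comma-separated list shown under status.noProxy above.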
3. Check oauth route:
$ oc get route -n openshift-authentication
NAME              HOST/PORT                                                         PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None
$ oc rsh -n openshift-authentication-operator authentication-operator-55dcfd854-rmpqm
sh-4.4# nslookup oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com
...
Non-authoritative answer:
Name:    oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com
Address: 10.0.176.195
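Since the route host is curled repeatedly in the later steps, it can be captured into a variable first (a convenience sketch; the variable name ROUTE_HOST is mine and is not used in the original commands):
$ ROUTE_HOST=$(oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')
$ echo "$ROUTE_HOST"
oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com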
4. Enter the authentication-operator pod and check whether connections to the oauth route succeed both with and without the proxy:
First, check the "with proxy" case; it succeeds:
$ oc rsh -n openshift-authentication-operator authentication-operator-55dcfd854-rmpqm
sh-4.4# env | grep -i proxy
HTTP_PROXY=http://<user>:<password>@10.0.152.122:3128
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.<cluster-name>.qe.devcluster.openshift.com,localhost,test.no-proxy.com
HTTPS_PROXY=https://<user>:<password>@10.0.152.122:3130
sh-4.4# curl -kv https://oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com/healthz
* Uses proxy env variable NO_PROXY == '.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.<cluster-name>.qe.devcluster.openshift.com,localhost,test.no-proxy.com'
* Uses proxy env variable HTTPS_PROXY == 'https://<user>:<password>@10.0.152.122:3130'
*   Trying 10.0.152.122...
...
* Connected to 10.0.152.122 (10.0.152.122) port 3130 (#0)
...
* Proxy certificate:
*  subject: C=CN; ST=Beijing; L=Beijing; O=OCP; OU=Installer-QE; CN=10.0.152.122
...
* Establish HTTP proxy tunnel to oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com:443
...
> CONNECT oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com:443 HTTP/1.1
...
> Proxy-Connection: Keep-Alive
...
* Proxy replied 200 to CONNECT request
...
* Connection #0 to host 10.0.152.122 left intact
ok
Second, check the "without proxy" case; it also succeeds:
sh-4.4# unset HTTP_PROXY NO_PROXY HTTPS_PROXY
sh-4.4# curl -kv https://oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com/healthz
*   Trying 10.0.176.195...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com (10.0.176.195) port 443 (#0)
...
* Connection #0 to host oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com left intact
ok
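If a terser pass/fail version of the two checks above is preferred, curl's standard write-out option can report just the HTTP status (a convenience sketch, not part of the original reproduction; it assumes /healthz returns 200 when the body is "ok" as seen above):
sh-4.4# curl -ks -o /dev/null -w '%{http_code}\n' https://oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com/healthz
Run it once with the proxy variables set and once after unsetting them; both runs should print 200.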
5. Check how frequently the authentication-operator process connects to the proxy:
Because I cannot ssh to the proxy server to check its logs, I use the `nsenter` and `netstat` method below to check the connection frequency (an alternative cross-check with `ss` is sketched after the netstat output):
5.1 Get the node that the authentication-operator pod is running on:
$ oc get po -n openshift-authentication-operator -o wide
NAME                                      READY   STATUS    RESTARTS      AGE   IP           NODE                                   ...
authentication-operator-55dcfd854-rmpqm   1/1     Running   1 (28h ago)   28h   10.129.0.6   <cluster-name>-jbtql-control-plane-1   ...
5.2 Log in to the master node hosting the authentication-operator pod:
$ oc debug -q no/<cluster-name>-jbtql-control-plane-1
sh-4.4# chroot /host
sh-5.1# crictl ps --name=authentication-operator$   # on the master, find the container ID of the authentication-operator pod
CONTAINER       IMAGE                                                              CREATED       STATE     NAME                      ATTEMPT   POD ID          POD
5f9684c7ab1d6   5010307ef85648e2387e9e13ebc7c040e832c7bd262529f51c0015b0a50a85f5   6 hours ago   Running   authentication-operator   1         ad842631a17c3   authentication-operator-55dcfd854-rmpqm
sh-5.1# crictl inspect 5f9684c7ab1d6 | grep pid   # find the PID of the authentication-operator container
    "pid": 10066,
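If jq happens to be available on the node (an assumption, it is not shown in the original output), the container ID lookup and PID extraction can be collapsed into one line; .info.pid is where CRI-O's crictl inspect JSON carries the PID that the grep above found:
sh-5.1# crictl inspect $(crictl ps -q --name=authentication-operator$) | jq '.info.pid'
10066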
5.3 Enter the network namespace of the authentication-operator process:
sh-5.1# nsenter -t 10066 -n /bin/bash
# Now we have entered the network namespace of the authentication-operator process. Run the monitor below:
# while true; do date; netstat --tcp --numeric --program | grep -e State -e "10.0.152.122"; echo; sleep 10; done
The `while` loop shows very frequent connections to the proxy server, not the once-per-hour frequency that AUTH-363 intends. As shown above, 10.129.0.6 is the authentication-operator pod IP and 10.0.152.122 is the proxy server:
Fri Jul 28 09:40:12 UTC 2023
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.129.0.6:36254        10.0.152.122:3130       TIME_WAIT   -

Fri Jul 28 09:40:22 UTC 2023
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.129.0.6:35970        10.0.152.122:3130       TIME_WAIT   -

...

Fri Jul 28 09:41:12 UTC 2023
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.129.0.6:35970        10.0.152.122:3130       TIME_WAIT   -

Fri Jul 28 09:41:32 UTC 2023
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.129.0.6:36342        10.0.152.122:3130       TIME_WAIT   -
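As a cross-check of the same observation (a sketch, assuming the iproute2 `ss` tool is usable in that network namespace; it reuses PID 10066 and the proxy IP 10.0.152.122 from above):
sh-5.1# nsenter -t 10066 -n /bin/bash
# while true; do date; ss -tan dst 10.0.152.122; echo; sleep 10; done
With the AUTH-363 behavior in effect, new TIME_WAIT entries toward 10.0.152.122:3130 should appear roughly once per hour, not every few tens of seconds as observed above.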
Actual results:
First, `oc get proxy cluster -o yaml` shows that noProxy includes 10.0.0.0/16, and nslookup of the oauth route returns 10.0.176.195, which falls within 10.0.0.0/16. Yet netstat still shows frequent connections to the proxy server 10.0.152.122:3130.
Second, the frequency is very high, not the expected once-per-hour interval implemented in AUTH-363.
Expected results:
Given that the oauth route's resolved IP 10.0.176.195 falls within the 10.0.0.0/16 range included in noProxy, `netstat` should show no connections to the proxy server 10.0.152.122:3130 at all, let alone such frequent ones.
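For completeness, the CIDR membership claim can be double-checked with a one-liner (a sketch; python3 and its standard ipaddress module are an assumption, they are not part of the original report):
$ python3 -c 'import ipaddress; print(ipaddress.ip_address("10.0.176.195") in ipaddress.ip_network("10.0.0.0/16"))'
True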
Additional info:
My cluster: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/221575/
The admin kubeconfig: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/221575/artifact/workdir/install-dir/auth/kubeconfig
I will preserve this cluster for 30 hours (the maximum allowed) so the developer can debug.