OpenShift Bugs / OCPBUGS-17022

[AUTH-391] Authentication Operator leaking the route into proxy logs is not less obtrusive and does not honour the CIDR entries in noProxy


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • 4.14
    • apiserver-auth
    • None
    • Important
    • No
    • Auth - Sprint 240, Auth - Sprint 241, Auth - Sprint 242, Auth - Sprint 243, Auth - Sprint 245, Auth - Sprint 249, Auth - Sprint 250
    • 7
    • Rejected
    • False

      Description of problem:

      The Authentication Operator leaking the route into proxy logs is still not less obtrusive, and the route checks do not honour the CIDR entries in noProxy.

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-07-27-223709

      How reproducible:

      Always

      Steps to Reproduce:
      1. Successfully launch a 4.14 cluster with the QE profile upi-on-baremetal/versioned-installer-openstack-https_proxy using Installer QE's Jenkins installer job.
      2. Check the proxy configuration. Installer QE's Jenkins installer job sets the proxy as below, using trustedCA:

      $ oc get proxy cluster -o yaml
      spec:
        httpProxy: http://<user>:<password>@10.0.152.122:3128
        httpsProxy: https://<user>:<password>@10.0.152.122:3130
        noProxy: test.no-proxy.com
        trustedCA:
          name: user-ca-bundle
      status:
        httpProxy: http://<user>:<password>@10.0.152.122:3128
        httpsProxy: https://<user>:<password>@10.0.152.122:3130
        noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.<cluster-name>.qe.devcluster.openshift.com,localhost,test.no-proxy.com
      

      3. Check oauth route:

      $ oc get route -n openshift-authentication
      NAME              HOST/PORT                                                         PATH   SERVICES          PORT   TERMINATION            WILDCARD
      oauth-openshift   oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None
      $ oc rsh -n openshift-authentication-operator authentication-operator-55dcfd854-rmpqm
      sh-4.4# nslookup oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com
      ...
      Non-authoritative answer:
      Name:   oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com
      Address: 10.0.176.195
      

      4. Inside the authentication-operator pod, check whether connections to the oauth route succeed both with and without the proxy:
      First, check the "with proxy" case; it succeeds:

      $ oc rsh -n openshift-authentication-operator authentication-operator-55dcfd854-rmpqm
      sh-4.4# env | grep -i proxy
      HTTP_PROXY=http://<user>:<password>@10.0.152.122:3128
      NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.<cluster-name>.qe.devcluster.openshift.com,localhost,test.no-proxy.com
      HTTPS_PROXY=https://<user>:<password>@10.0.152.122:3130
      sh-4.4# curl -kv https://oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com/healthz
      * Uses proxy env variable NO_PROXY == '.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.<cluster-name>.qe.devcluster.openshift.com,localhost,test.no-proxy.com'
      * Uses proxy env variable HTTPS_PROXY == 'https://<user>:<password>@10.0.152.122:3130'
      *   Trying 10.0.152.122...
      ...
      * Connected to 10.0.152.122 (10.0.152.122) port 3130 (#0)
      ...
      * Proxy certificate:
      *  subject: C=CN; ST=Beijing; L=Beijing; O=OCP; OU=Installer-QE; CN=10.0.152.122
      ...
      * Establish HTTP proxy tunnel to oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com:443
      ...
      > CONNECT oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com:443 HTTP/1.1
      ...
      > Proxy-Connection: Keep-Alive
      ...
      * Proxy replied 200 to CONNECT request
      ...
      * Connection #0 to host 10.0.152.122 left intact
      ok
      

      Next, check the "without proxy" case; it also succeeds:

      sh-4.4# unset HTTP_PROXY NO_PROXY HTTPS_PROXY
      sh-4.4# curl -kv https://oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com/healthz
      *   Trying 10.0.176.195...
      * TCP_NODELAY set
      * Connected to oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com (10.0.176.195) port 443 (#0)
      ...
      * Connection #0 to host oauth-openshift.apps.<cluster-name>.qe.devcluster.openshift.com left intact
      ok
      

      5. Check how frequently the authentication-operator process connects to the proxy:
      Because I cannot SSH to the proxy server to inspect its logs, I use the `nsenter` and `netstat` approach below to observe the connection frequency:
      5.1 Get the node which the authentication-operator is on:

      $ oc get po -n openshift-authentication-operator -o wide
      NAME                                      READY   STATUS    RESTARTS      AGE   IP           NODE ...
      authentication-operator-55dcfd854-rmpqm   1/1     Running   1 (28h ago)   28h   10.129.0.6   <cluster-name>-jbtql-control-plane-1 ...
      

      5.2 Log in to the master node hosting the authentication-operator pod:

      $ oc debug -q no/<cluster-name>-jbtql-control-plane-1
      sh-4.4# chroot /host
      sh-5.1#
      sh-5.1# crictl ps --name=authentication-operator$ # on the master, find the container ID of the authentication-operator pod
      CONTAINER           IMAGE                                                              CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
      5f9684c7ab1d6       5010307ef85648e2387e9e13ebc7c040e832c7bd262529f51c0015b0a50a85f5   6 hours ago         Running             authentication-operator   1                   ad842631a17c3       authentication-operator-55dcfd854-rmpqm
      sh-5.1# crictl inspect 5f9684c7ab1d6 | grep pid # find the PID of the authentication-operator container
          "pid": 10066,
      

      5.3 Enter the network namespace of the authentication-operator process:

      sh-5.1# nsenter -t 10066 -n /bin/bash
      #
      
      We have now entered the network namespace of the authentication-operator process. Run the monitoring loop below:
      # while true; do date; netstat --tcp --numeric --program | grep -e State -e "10.0.152.122"; echo; sleep 10; done
      

      The `while` loop reveals very frequent connections to the proxy server, not the once-per-hour cadence AUTH-363 intended. In the output below, 10.129.0.6 is the authentication-operator pod IP and 10.0.152.122 is the proxy server:

      Fri Jul 28 09:40:12 UTC 2023
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 10.129.0.6:36254        10.0.152.122:3130       TIME_WAIT   -
      
      Fri Jul 28 09:40:22 UTC 2023
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 10.129.0.6:35970        10.0.152.122:3130       TIME_WAIT   -
      ...
      Fri Jul 28 09:41:12 UTC 2023
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
      tcp        0      0 10.129.0.6:35970        10.0.152.122:3130       TIME_WAIT   -                   
      
      Fri Jul 28 09:41:32 UTC 2023
      Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
      tcp        0      0 10.129.0.6:36342        10.0.152.122:3130       TIME_WAIT   -
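
      For reference, the containment claim from steps 2 and 3 can be checked with a short Go sketch. The values are copied from the reproduction above; the api-int hostname entry is trimmed from the list for brevity:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// noProxy holds the entries from status.noProxy in step 2 (api-int entry trimmed).
const noProxy = ".cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,localhost,test.no-proxy.com"

// ipCoveredByNoProxy reports whether ip falls inside any CIDR entry of noProxy.
// Non-CIDR entries (domains, plain IPs) fail net.ParseCIDR and are skipped.
func ipCoveredByNoProxy(ipStr, noProxy string) bool {
	ip := net.ParseIP(ipStr)
	for _, entry := range strings.Split(noProxy, ",") {
		if _, cidr, err := net.ParseCIDR(strings.TrimSpace(entry)); err == nil && cidr.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	// 10.0.176.195 is the oauth route's resolved address from step 3.
	fmt.Println(ipCoveredByNoProxy("10.0.176.195", noProxy)) // prints true: 10.0.0.0/16 covers it
}
```

      So the route's resolved address is unambiguously inside a noProxy CIDR, yet the connections observed above still go to the proxy.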
      

       

      Actual results:
      First, `oc get proxy cluster -o yaml` shows that noProxy includes 10.0.0.0/16, and an nslookup of the oauth route returns 10.0.176.195, which falls within 10.0.0.0/16. Yet `netstat` still shows frequent connections to the proxy server 10.0.152.122:3130.
      Second, the connection frequency is far higher than the once-per-hour cadence implemented in AUTH-363.

       

      Expected results:
      Given that the oauth route's resolved IP 10.0.176.195 falls within the 10.0.0.0/16 CIDR included in noProxy, `netstat` should show no connections to the proxy server 10.0.152.122:3130 at all, let alone such frequent ones.

       

      Additional info:

      My cluster is: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/221575/ . The admin kubeconfig is: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/221575/artifact/workdir/install-dir/auth/kubeconfig . I will preserve this cluster for 30 hours (the maximum allowed) so a developer can debug.

              rh-ee-irinis Ilias Rinis
              xxia-1 Xingxing Xia