  1. Red Hat OpenShift Data Science
  2. RHODS-6447

RHODS Dashboard oauth-proxy crashing with 1000 users

Details

    • Bug
    • Resolution: Won't Do
    • Major
    • None
    • None
    • UI
    • RHODS 1.23

    Description

      Description of problem:

      UPDATE: The complete fix will be tracked in RHODS-7505.

      When running the scale test on bare metal with 1000 users, I observe instability in the oauth-proxy container:

      The logs from the previous instance of the oauth-proxy container are full of these messages:

      $ oc logs rhods-dashboard-659c6bfddf-mdwzg --previous -c oauth-proxy
      2023/01/23 09:32:42 reverseproxy.go:485: http: proxy error: context canceled
      2023/01/23 09:32:44 reverseproxy.go:485: http: proxy error: context canceled
      2023/01/23 09:32:44 server.go:3120: http: TLS handshake error from 10.130.0.1:43350: write tcp 10.130.14.234:8443->10.130.0.1:43350: write: broken pipe
      2023/01/23 09:32:44 provider.go:587: Performing OAuth discovery against https://172.30.0.1/.well-known/oauth-authorization-server
      2023/01/23 09:32:44 server.go:3120: http: TLS handshake error from 10.130.0.1:43372: write tcp 10.130.14.234:8443->10.130.0.1:43372: write: broken pipe
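
To see which of these messages dominates a long log, the lines can be tallied by kind. The following is a minimal sketch, assuming the log keeps the `date time file.go:line: message` layout shown in the excerpt. The category names and the script name used below are hypothetical, chosen only for this illustration.

```python
import re
import sys
from collections import Counter

# Prefix layout seen in the excerpt: "YYYY/MM/DD HH:MM:SS file.go:LINE: message"
LOG_LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+\.go:\d+): (.*)$")


def classify(message):
    """Bucket a message into a coarse category (hypothetical names)."""
    if "proxy error: context canceled" in message:
        return "proxy-context-canceled"
    if "TLS handshake error" in message:
        return "tls-handshake-error"
    if "Performing OAuth discovery" in message:
        return "oauth-discovery"
    return "other"


def summarize(lines):
    """Count log lines per category; lines that do not match the prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[classify(match.group(3))] += 1
    return counts


if __name__ == "__main__":
    # Reads the log from standard input and prints the most frequent categories first.
    for category, count in summarize(sys.stdin).most_common():
        print(f"{count:8d}  {category}")
```

If saved as `summarize_logs.py` (a hypothetical name), it could be fed from the command above: `oc logs rhods-dashboard-659c6bfddf-mdwzg --previous -c oauth-proxy | python summarize_logs.py`.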
      

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      1. Run the RHODS-notebook scale test with 1000 users (3s delay) trying to access the Dashboard
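
The report does not say how the 3s delay is applied. Assuming it is a stagger between successive user start times (an assumption, not something confirmed above), the ramp-up can be sketched as follows. The function name is hypothetical.

```python
def start_offsets(n_users, delay_s):
    """Start time, in seconds from the beginning of the run, for each user.

    Assumes user i starts i * delay_s seconds after the first user
    (an assumed interpretation of the "3s delay" in the report).
    """
    return [i * delay_s for i in range(n_users)]


offsets = start_offsets(1000, 3.0)
ramp_up_s = offsets[-1]  # the last user starts 999 * 3 s after the first
print(f"last user starts after {ramp_up_s:.0f} s (~{ramp_up_s / 60:.0f} min)")
```

Under that assumption the Dashboard sees new logins for roughly 50 minutes, so the load builds up gradually and does not arrive as a single burst.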

      Actual results:

      Many user tests fail with this error:

      and messages like this one in the browser logs:

       {'level': 'SEVERE', 'message': 'https://rhods-dashboard-redhat-ods-applications.apps.bm.example.com/api/segment-key - Failed to load resource: the server responded with a status of 503 (Service Unavailable)', 'source': 'network', 'timestamp': 1674466086292}]
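
Entries like the one above can be filtered to list which requests failed and with which status. This is a minimal sketch, assuming the browser log has already been loaded as a list of dictionaries with the `level`, `message` and `source` keys seen in the excerpt. The function name is hypothetical.

```python
import re

# Message layout seen in the excerpt:
# "<url> - Failed to load resource: the server responded with a status of <code> (...)"
STATUS = re.compile(
    r"^(\S+) - Failed to load resource: the server responded with a status of (\d{3})"
)


def failed_requests(entries):
    """Return (url, status) pairs for SEVERE network entries that report an HTTP status."""
    failures = []
    for entry in entries:
        if entry.get("level") != "SEVERE" or entry.get("source") != "network":
            continue
        match = STATUS.match(entry.get("message", ""))
        if match:
            failures.append((match.group(1), int(match.group(2))))
    return failures
```

Counting the returned pairs per status code would show how many of the failures are 503 responses like the one reported here.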
      

      or this one:

      Expected results:

      Reproducibility (Always/Intermittent/Only Once):

      Build Details:

      rhods-operator.1.21.0-20
      

      Workaround:

      Additional info:

      Scale test results

      Attachments

        1. image-2023-01-23-11-20-46-107.png (25 kB)
        2. image-2023-01-23-11-25-45-334.png (104 kB)
        3. image-2023-01-23-11-27-59-255.png (83 kB)
        4. screenshot-1.png (105 kB)
        5. screenshot-10.png (56 kB)
        6. screenshot-2.png (141 kB)
        7. screenshot-3.png (140 kB)
        8. screenshot-4.png (111 kB)
        9. screenshot-5.png (132 kB)
        10. screenshot-6.png (134 kB)
        11. screenshot-7.png (39 kB)
        12. screenshot-8.png (46 kB)
        13. screenshot-9.png (42 kB)

            People

              vhire Vaishnavi Hire
              kpouget2 Kevin Pouget
              Kevin Pouget Kevin Pouget
              Votes: 0
              Watchers: 13
