Red Hat OpenShift AI Engineering / RHOAIENG-6392

Code Server workbench doesn't start in a disconnected cluster


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • RHOAI_2.8.1_GA, RHOAI_2.9.0_GA
    • IDE, Workbenches
    • RHOAI IDE - Boston
    • Testable

      I tried the RHOAI 2.9 RC1 build in a disconnected environment and was unable to start the code-server workbench (image: code-server 2024.1) there.

      Workbench creation begins, but the workbench never switches from the Starting state to the Running state in the RHOAI dashboard.

      In the OpenShift console, I can see that the workbench pod is eventually created, but it runs only for a very short time. It fails quickly because the readiness probe fails, and this happens repeatedly. Here is the readiness failure event:

      Generated from kubelet on worker-rhcos-1.dis-29-rc1.osp.rh-ods.com
      56 times in the last 14 minutes
      Readiness probe failed: HTTP probe failed with statuscode: 502
      

      The readiness failures are then followed by container restarts:

      Generated from kubelet on worker-rhcos-1.dis-29-rc1.osp.rh-ods.com
      40 times in the last 9 minutes
      Back-off restarting failed container js in pod js-0_jstourac(93a7ab0b-a909-4953-bcff-32bd099cb327)
      

      These are the logs from the containers of the created pod:

      js container log
      # For more information on configuration, see:
      #   * Official English Documentation: http://nginx.org/en/docs/
      #   * Official Russian Documentation: http://nginx.org/ru/docs/
      
      
      worker_processes 1;
      error_log /var/log/nginx/error.log;
      pid /run/nginx.pid;
      
      # Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
      include /usr/share/nginx/modules/*.conf;
      
      events {
          worker_connections 1024;
      }
      
      http {
          log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                            '$status $body_bytes_sent "$http_referer" '
                            '"$http_user_agent" "$http_x_forwarded_for"';
      
          access_log  /var/log/nginx/access.log  main;
      
          sendfile            on;
          tcp_nopush          on;
          tcp_nodelay         on;
          keepalive_timeout   65;
          types_hash_max_size 4096;
      
          include             /etc/nginx/mime.types;
          default_type        application/octet-stream;
      
          # Load modular configuration files from the /etc/nginx/conf.d directory.
          # See http://nginx.org/en/docs/ngx_core_module.html#include
          # for more information.
          include /opt/app-root/etc/nginx.d/*.conf;
      
          server {
              listen       8888 default_server;
              server_name  js-jstourac.apps.dis-29-rc1.osp.rh-ods.com;
              root         /opt/app-root/src;
      
              # Load configuration files for the default server block.
              include /opt/app-root/etc/nginx.default.d/*.conf;
      
              location = /404.html {
              }
      
          }
      
      # Settings for a TLS enabled server.
      #
      #    server {
      #        listen       443 ssl http2;
      #        server_name  js-jstourac.apps.dis-29-rc1.osp.rh-ods.com;
      #        root         /opt/app-root/src;
      #
      #        ssl_certificate "/etc/pki/nginx/server.crt";
      #        ssl_certificate_key "/etc/pki/nginx/private/server.key";
      #        ssl_session_cache shared:SSL:1m;
      #        ssl_session_timeout  10m;
      #        ssl_ciphers PROFILE=SYSTEM;
      #        ssl_prefer_server_ciphers on;
      #
      #        # Load configuration files for the default server block.
      #        include /opt/app-root/etc/nginx.default.d/*.conf;
      #
      #
      #    }
      
      }
      
      spawn-fcgi: child spawned successfully: PID: 10
      Debug: User directory already exists.
      Debug: '/opt/app-root/src/.local/share/code-server/User/settings.json' file already exists.
      2024/04/22 19:22:57 [error] 29#29: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:22:57 [error] 29#29: *5 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:02 [error] 29#29: *9 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:02 [error] 29#29: *11 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:07 [error] 29#29: *15 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:07 [error] 29#29: *16 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:07 [error] 29#29: *20 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:12 [error] 29#29: *23 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
      2024/04/22 19:23:17 [error] 29#29: *26 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
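
Every nginx error above has the same shape: the health request to /codeserver/healthz/ is proxied to 127.0.0.1:8787 and the connection is refused. A minimal sketch for summarizing such lines from a saved log is shown here. The function name `summarize_upstream_failures` and the regular expression are illustrative assumptions and are not part of RHOAI or nginx; the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption, not an nginx-provided format spec):
# it picks the timestamp, errno, reason, request and upstream out of an
# nginx "connect() failed" error line like the ones in the js container log.
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] .*?'
    r'connect\(\) failed \((?P<errno>\d+): (?P<reason>[^)]+)\)'
    r'.*?request: "(?P<request>[^"]+)"'
    r'.*?upstream: "(?P<upstream>[^"]+)"'
)


def summarize_upstream_failures(log_text):
    """Count connect() failures per (upstream URL, failure reason) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            counts[(match["upstream"], match["reason"])] += 1
    return counts


# Two lines copied verbatim from the js container log in this report.
SAMPLE = '''
2024/04/22 19:22:57 [error] 29#29: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
2024/04/22 19:23:02 [error] 29#29: *9 connect() failed (111: Connection refused) while connecting to upstream, client: 10.129.2.2, server: js-jstourac.apps.dis-29-rc1.osp.rh-ods.com, request: "GET /codeserver/healthz/ HTTP/1.1", upstream: "http://127.0.0.1:8787/healthz/", host: "10.129.2.123:8888", referrer: "http://10.129.2.123:8888/notebook/jstourac/js/api"
'''

if __name__ == "__main__":
    for (upstream, reason), count in summarize_upstream_failures(SAMPLE).items():
        print(f"{count}x {reason} -> {upstream}")
```

If all failures collapse into a single (upstream, reason) pair, as they do for the sample, that points at one process (here whatever should listen on port 8787) rather than at intermittent network trouble.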
      
      oauth-proxy container log
      2024/04/22 19:20:00 provider.go:120: Defaulting client-id to system:serviceaccount:jstourac:js
      2024/04/22 19:20:00 provider.go:125: Defaulting client-secret to service account token /var/run/secrets/kubernetes.io/serviceaccount/token
      2024/04/22 19:20:00 oauthproxy.go:203: mapping path "/" => upstream "http://localhost:8888/"
      2024/04/22 19:20:00 oauthproxy.go:230: OAuthProxy configured for Client ID: system:serviceaccount:jstourac:js
      2024/04/22 19:20:00 oauthproxy.go:240: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:24h0m0s domain:<default> samesite: refresh:disabled
      I0422 19:20:00.131824 1 dynamic_serving_content.go:130] Starting serving::/etc/tls/private/tls.crt::/etc/tls/private/tls.key
      2024/04/22 19:20:00 http.go:107: HTTPS: listening on [::]:8443
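
To narrow down where the 502 comes from, the two hops seen in the nginx log can be probed separately from inside the js container: nginx on port 8888 and its upstream on port 8787. The following is only a sketch under assumptions: that a Python interpreter is available in the container, that the URLs taken from the log lines above are the right ones to test, and that a 200 response means healthy. The helper names `probe`, `diagnose` and `check_workbench` are hypothetical.

```python
import urllib.error
import urllib.request

# URLs taken from the nginx error lines in this report; treating them as the
# right endpoints to test by hand is an assumption.
PROXY_URL = "http://127.0.0.1:8888/codeserver/healthz/"
UPSTREAM_URL = "http://127.0.0.1:8787/healthz/"

# Ignore any proxy settings from the environment so the request really goes
# to the local address.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return the HTTP status code for url, or None if no connection was made."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None


def diagnose(proxy_status, upstream_status):
    """Turn the two probe results into a short human-readable verdict."""
    if upstream_status is None:
        if proxy_status == 502:
            return "upstream is not listening; nginx answers 502 on its behalf"
        return "upstream is not listening"
    if proxy_status == 200 and upstream_status == 200:
        return "healthy"
    return f"unexpected: proxy={proxy_status}, upstream={upstream_status}"


def check_workbench():
    """Probe both hops and return the verdict string."""
    return diagnose(probe(PROXY_URL), probe(UPSTREAM_URL))
```

Calling `print(check_workbench())` in the container would report the first verdict if the situation matches the logs in this report, which would mean the code-server process never started listening rather than nginx or the oauth-proxy being misconfigured.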
      

      I'm not sure whether this is caused by a misconfiguration in our disconnected infrastructure.

      The logs are taken from the RHOAI 2.9.0 RC1 build, but similar behavior can also be seen with the already released RHOAI 2.8.1.

            Assignee: rh-ee-atheodor Adriana Theodorakopoulou
            Reporter: jstourac@redhat.com Jan Stourac
            RHOAI IDE