WildFly / WFLY-21574

KUBE_PING fails to authenticate and discover after Kubernetes service account token rotation


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Fix Version: 40.0.0.Beta1
    • Affects Versions: 39.0.0.Final, 39.0.1.Final
    • Component: Clustering
    • None
    • Need to add automation
    • Missing test

      Since Kubernetes 1.21, service account tokens are bound and time-limited by default (KEP-1205).

      See https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md

      The kubelet automatically rotates the token file at /var/run/secrets/kubernetes.io/serviceaccount/token when the token reaches 80% of its TTL (default TTL is 1 hour, so rotation happens after ~48 minutes).

      The current jgroups-kubernetes implementation reads the token file once at startup and caches it for the lifetime of the process. After the kubelet rotates the token, the process keeps using the stale cached copy; once that copy is rejected, all subsequent KUBE_PING discovery requests fail with HTTP 401 Unauthorized, breaking cluster discovery. As a result, new nodes cannot join the cluster and existing members cannot detect failures after token expiry.

      Steps to reproduce:
      1. Deploy a WildFly cluster on Kubernetes with KUBE_PING configured
      2. Keep the cluster running past the token rotation point (~48 minutes with default 1-hour TTL)
      3. Observe KUBE_PING discovery failures as the stale token is rejected
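
      When reproducing, it can help to confirm the lifetime of the mounted token before waiting for rotation. The sketch below is a hypothetical helper (TokenClaims is not part of WildFly or jgroups-kubernetes) that decodes the payload of a JWT without verifying its signature and reads numeric claims from it. It assumes the mounted token is a JWT carrying the standard "iat" and "exp" claims, and it uses a regular expression rather than a JSON parser to stay dependency-free.

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper: inspects JWT claims, performs no signature verification. */
final class TokenClaims {

    private TokenClaims() {
    }

    /** Returns the value of a numeric claim (for example "exp") from the JWT payload. */
    static long claim(String jwt, String name) {
        String[] parts = jwt.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a JWT: expected header.payload.signature");
        }
        // The payload is base64url-encoded JSON; Java's decoder accepts missing padding.
        String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(\\d+)").matcher(payload);
        if (!m.find()) {
            throw new IllegalArgumentException("numeric claim not found: " + name);
        }
        return Long.parseLong(m.group(1));
    }

    /** Expiry time taken from the "exp" claim (seconds since the epoch). */
    static Instant expiresAt(String jwt) {
        return Instant.ofEpochSecond(claim(jwt, "exp"));
    }

    /** Token lifetime, computed as "exp" minus "iat". */
    static Duration lifetime(String jwt) {
        return Duration.ofSeconds(claim(jwt, "exp") - claim(jwt, "iat"));
    }
}
```

      Calling TokenClaims.expiresAt on the contents of /var/run/secrets/kubernetes.io/serviceaccount/token before and after the rotation point should show a later expiry once the kubelet has rewritten the file, while a process that cached the token at startup still holds the earlier one.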

      The fix is to upgrade the jgroups-kubernetes dependency to a version containing the fix. In that version, TokenStreamProvider caches the token and re-reads it from the token file at a 1-minute refresh interval; it additionally forces a refresh on an HTTP 401 response, so that retries within the same interval also succeed. This issue is filed for visibility within the WildFly project.
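
      The caching behaviour described above can be illustrated with a minimal sketch. This is not the actual jgroups-kubernetes code: the class RefreshingToken, its methods, and the injected clock are hypothetical names chosen for illustration. It shows only the two ideas from the description, a time-bounded cache that re-reads the token file and a forced refresh when a request returns HTTP 401.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of a token cache with periodic and forced refresh. */
final class RefreshingToken {
    private final Path tokenFile;
    private final long refreshNanos;
    private final LongSupplier nanoClock; // injectable so the interval can be tested
    private String cached;
    private long loadedAt;

    RefreshingToken(Path tokenFile, Duration refreshInterval, LongSupplier nanoClock) {
        this.tokenFile = tokenFile;
        this.refreshNanos = refreshInterval.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Uses the 1-minute interval mentioned in the issue and the system monotonic clock. */
    RefreshingToken(Path tokenFile) {
        this(tokenFile, Duration.ofMinutes(1), System::nanoTime);
    }

    /** Returns the cached token, re-reading the file once the interval has elapsed. */
    synchronized String get() {
        long now = nanoClock.getAsLong();
        if (cached == null || now - loadedAt >= refreshNanos) {
            cached = read();
            loadedAt = now;
        }
        return cached;
    }

    /** Drops the cached token so that the next get() re-reads the file. */
    synchronized void invalidate() {
        cached = null;
    }

    /**
     * Runs a request that takes the token and returns an HTTP status code.
     * On 401 the cache is dropped and the request is retried once with a fresh token.
     */
    int call(ToIntFunction<String> request) {
        int status = request.applyAsInt(get());
        if (status == 401) {
            invalidate();
            status = request.applyAsInt(get());
        }
        return status;
    }

    private String read() {
        try {
            return Files.readString(tokenFile).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With only the interval, a request issued shortly after rotation could still carry the old token for up to a minute; the 401-triggered invalidation closes that window, which matches the retry behaviour the description mentions.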

              Assignee: Radoslav Husar (rhn-engineering-rhusar)
              Reporter: Radoslav Husar (rhn-engineering-rhusar)