OpenShift Bugs / OCPBUGS-66064

monitoring-plugin pods fail TLS handshake after applying "modern" tlsSecurityProfile


    • Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • 4.20.z
    • Monitoring
    • Important
    • contract-priority

      [ENV]

      OCP 4.20.3 compact cluster

       

      [The problem]
      The customer enabled the "modern" [0] TLS security profile (which requires a minimum TLS version of 1.3) for the control plane (APIServer) on OCP 4.20.3, as described in our docs [1]:

       

        tlsSecurityProfile:
          modern: {}
          type: Modern

      Now the monitoring-plugin pods fail the TLS handshake and are no longer able to come up:

       

       

      $ oc logs monitoring-plugin-df7fb4f7f-66f6q
      
      time="2025-11-24T15:21:16Z" level=info msg="enabled features: []\n" module=main
      time="2025-11-24T15:21:16Z" level=warning msg="cannot read config file, serving plugin with default configuration, tried /etc/plugin/config.yaml" error="open /etc/plugin/config.yaml: no such file or directory" module=server
      time="2025-11-24T15:21:16Z" level=info msg="listening for https on :9443" module=server
      I1124 15:21:16.074372       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
      I1124 15:21:16.074815       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/var/cert/tls.crt::/var/cert/tls.key"
      time="2025-11-24T15:21:16Z" level=info msg="Event(v1.ObjectReference{Kind:\"\", Namespace:\"\", Name:\"serving-cert::/var/cert/tls.crt::/var/cert/tls.key\", UID:\"\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Warning' reason: 'TLSConfigChanged' loaded serving cert [\"serving-cert::/var/cert/tls.crt::/var/cert/tls.key\"]: \"monitoring-plugin.openshift-monitoring.svc\" [serving] validServingFor=[monitoring-plugin.openshift-monitoring.svc,monitoring-plugin.openshift-monitoring.svc.cluster.local] issuer=\"openshift-service-serving-signer@1763976566\" (2025-11-24 09:41:58 +0000 UTC to 2027-11-24 09:41:59 +0000 UTC (now=2025-11-24 15:21:16.074907605 +0000 UTC))" module=server
      2025/11/24 15:21:16 http: TLS handshake error from 10.132.0.2:58758: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:21:17 http: TLS handshake error from 10.132.0.2:58774: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:21:27 http: TLS handshake error from 10.132.0.2:57884: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:21:37 http: TLS handshake error from 10.132.0.2:36458: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:21:47 http: TLS handshake error from 10.132.0.2:39778: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:21:57 http: TLS handshake error from 10.132.0.2:47528: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:22:07 http: TLS handshake error from 10.132.0.2:47892: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:22:17 http: TLS handshake error from 10.132.0.2:52802: tls: client offered only unsupported versions: [304 303]
      2025/11/24 15:22:27 http: TLS handshake error from 10.132.0.2:55796: tls: client offered only unsupported versions: [304 303]
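
      To illustrate the failure mode, here is a self-contained Go reproduction (my own sketch, not monitoring-plugin code). "client offered only unsupported versions: [304 303]" means the listener accepted neither TLS 1.3 nor TLS 1.2; one hypothetical way to get exactly that is a config whose MinVersion (1.3, from the profile) exceeds a stale MaxVersion cap (1.2), leaving no supported versions at all. Whether that is the plugin's actual bug is an assumption; the resulting error strings, however, match both the pod logs and the kubelet probe error quoted below:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSignedCert builds a throwaway serving certificate for the demo.
func selfSignedCert() tls.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}
}

// brokenHandshake runs one handshake against a server whose MinVersion
// (TLS 1.3, as the Modern profile demands) exceeds its MaxVersion
// (TLS 1.2, a hypothetical stale cap), so the server supports no protocol
// version at all. It returns the server-side and client-side errors.
func brokenHandshake() (serverErr, clientErr error) {
	cliConn, srvConn := net.Pipe()
	srv := tls.Server(srvConn, &tls.Config{
		Certificates: []tls.Certificate{selfSignedCert()},
		MinVersion:   tls.VersionTLS13, // applied from the profile
		MaxVersion:   tls.VersionTLS12, // hypothetical stale cap
	})
	// A Go client with defaults offers TLS 1.3 and TLS 1.2, i.e. [304 303].
	cli := tls.Client(cliConn, &tls.Config{InsecureSkipVerify: true})
	done := make(chan error, 1)
	go func() { done <- cli.Handshake() }()
	serverErr = srv.Handshake()
	clientErr = <-done
	return serverErr, clientErr
}

func main() {
	serverErr, clientErr := brokenHandshake()
	fmt.Println("server:", serverErr) // same error class as the plugin log
	fmt.Println("client:", clientErr) // same error class as the kubelet probe
}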
      

      Based on OCP monitoring docs [2]:

      > The monitoring stack component uses the TLS security profile settings that already exist in the tlsSecurityProfile field in the global OpenShift Container Platform apiservers.config.openshift.io/cluster resource.

      So my understanding is that no other actions or configuration changes should be required for all monitoring components to align with the tlsSecurityProfile defined at the APIServer level and keep working.

       

      In any case, the component reporting the probe error seems to be the kubelet:

       "kind": "Event",
        "lastTimestamp": "2025-11-21T09:19:35Z",
        "message": "Readiness probe error: Get \"https://[fd02:0:0:1::de]:9443/health\": remote error: tls: protocol version not supported\nbody: \n",
      ...
        "reason": "ProbeError",
        "reportingComponent": "kubelet",
        "reportingInstance": "master-2",
        "source": {
          "component": "kubelet",
          "host": "master-2"

      so we also changed the TLS profile for the kubelet [3], but unfortunately the problem is still there.

       

      [Additional Info]

      I was able to reproduce exactly the same behavior on a 4.20.3 lab cluster, 100% of the time.

      [0] https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/security_and_compliance/tls-security-profiles#tls-profiles-understanding_tls-security-profiles

      [1] https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/security_and_compliance/tls-security-profiles#tls-profiles-kubernetes-configuring_tls-security-profiles

      [2] https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/monitoring/about-openshift-container-platform-monitoring#tls-security-and-rotation_monitoring-stack-architecture

      [3] https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/security_and_compliance/tls-security-profiles#tls-profiles-kubelet-configuring_tls-security-profiles

              Simon Pasquier (spasquie@redhat.com)
              Flavio Piccioni (rh-ee-fpiccion)
              Junqi Zhao