OCPBUGS-65856 (project: OpenShift Bugs)

hybrid-overlay-node certificate rotation fails due to invalid APIServer when the --bootstrap-kubeconfig option is provided


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Version: 4.20
    • Severity: Critical

      Description of problem:

      The hybrid-overlay-node service on Windows nodes fails certificate rotation because it attempts to connect to localhost:8443 instead of the actual API server endpoint.

      Investigation shows that the API server host is available in the provided kubeconfig, but hybrid-overlay-node does not replace the default value (localhost:8443) with it when the --bootstrap-kubeconfig flag is used.

      ovn-kubernetes/go-controller/pkg/config/config.go falls back to const DefaultAPIServer = "http://localhost:8443" when no explicit API server is provided. The bootstrap kubeconfig contains the correct API server URL, but the certificate manager initialization does not extract it from the bootstrap kubeconfig.
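A minimal Go sketch of the fallback the description implies. It assumes the bootstrap kubeconfig's server URL has already been read (for example with client-go's clientcmd.BuildConfigFromFlags("", path), whose returned rest.Config carries the URL in Host). resolveAPIServer and the precedence it encodes (explicit --k8s-apiserver, then the bootstrap kubeconfig server, then DefaultAPIServer) are hypothetical and are not the ovn-kubernetes code. Treating a value equal to the default as "unset" is also an assumption, based on the logged config showing the default already filled in.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultAPIServer mirrors the DefaultAPIServer constant quoted from
// ovn-kubernetes/go-controller/pkg/config/config.go.
const defaultAPIServer = "http://localhost:8443"

// resolveAPIServer is a HYPOTHETICAL helper, not part of ovn-kubernetes.
// flagValue is the configured APIServer (possibly already defaulted) and
// bootstrapServer is the server URL read from the bootstrap kubeconfig.
// Assumed precedence:
//  1. an explicit, non-default --k8s-apiserver value,
//  2. the server URL from the bootstrap kubeconfig,
//  3. the compiled-in default.
func resolveAPIServer(flagValue, bootstrapServer string) string {
	if v := strings.TrimSpace(flagValue); v != "" && v != defaultAPIServer {
		return v
	}
	if v := strings.TrimSpace(bootstrapServer); v != "" {
		return v
	}
	return defaultAPIServer
}

func main() {
	// Hypothetical endpoint standing in for the real cluster API server URL.
	const bootstrap = "https://api-int.example.com:6443"

	// Situation from the bug report: only --bootstrap-kubeconfig is given,
	// so APIServer still holds the default and the kubeconfig server wins.
	fmt.Println(resolveAPIServer(defaultAPIServer, bootstrap))

	// No bootstrap kubeconfig server and no flag: the default remains.
	fmt.Println(resolveAPIServer("", ""))
}
```

Comparing against the default is a simplification. It cannot distinguish a user who explicitly passes http://localhost:8443 from one who passes nothing, so an actual fix might instead track whether the flag was set.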

      hybrid-overlay-node command:

      C:\k\hybrid-overlay-node.exe --node winworker-lcjzq --bootstrap-kubeconfig=C:\k\kubeconfig  --cert-dir=C:\k\tls\certs --cert-duration=10m --windows-service --logfile C:\var\log\hybrid-overlay\hybrid-overlay.log --hybrid-overlay-vxlan-port 4789 --loglevel 5
      

      Note: the --k8s-apiserver flag is ignored if --bootstrap-kubeconfig is provided.

      Effective Kubernetes config (APIServer still holds the default value):

      I1120 21:20:47.329011    3776 config.go:2474] Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig CertDir:C:\k\cni\config CertDuration:15m0s Kubeconfig: CACert: CAData:[] APIServer:http://localhost:8443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:172.16.1.0/24 ServiceCIDRs:[172.16.1.0/24] OVNConfigNamespace:ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:<nil> HostNetworkNamespace: DisableRequestedChassis:false PlatformType: HealthzBindAddress: CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:kube-system DNSServiceName:kube-dns}
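To check a node for this symptom without scanning the whole line by eye, a small Go sketch can pull the APIServer value out of the "Kubernetes config" log line. apiServerFromConfigLog is a hypothetical helper. It assumes the line keeps the space-separated Field:value layout shown above and that the URL contains no spaces. The sample line in main is an abridged copy of the one above.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultAPIServer = "http://localhost:8443"

// apiServerFromConfigLog is a HYPOTHETICAL diagnostic helper. It scans a
// "Kubernetes config: {...}" log line for the APIServer field and returns
// its value. It relies on fields being whitespace-separated "Name:value"
// pairs, as in the hybrid-overlay-node startup log.
func apiServerFromConfigLog(line string) (string, bool) {
	for _, field := range strings.Fields(line) {
		// The first field of the struct dump is prefixed with "{".
		field = strings.TrimPrefix(field, "{")
		if strings.HasPrefix(field, "APIServer:") {
			value := strings.TrimPrefix(field, "APIServer:")
			// The last field of the struct dump is suffixed with "}".
			return strings.TrimSuffix(value, "}"), true
		}
	}
	return "", false
}

func main() {
	// Abridged copy of the log line from the bug report.
	line := `I1120 21:20:47.329011    3776 config.go:2474] Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig CertDir:C:\k\cni\config CAData:[] APIServer:http://localhost:8443 Token: TokenFile:}`

	server, ok := apiServerFromConfigLog(line)
	switch {
	case !ok:
		fmt.Println("no APIServer field found")
	case server == defaultAPIServer:
		fmt.Println("APIServer still has the default value:", server)
	default:
		fmt.Println("APIServer:", server)
	}
}
```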
      

      Version-Release number of selected component (if applicable):
      OpenShift: 4.20.2 with OVN-Kubernetes hybrid-overlay
      WMCO: 10.20.0-e3c3dfe
      Windows: Windows Server 2022

      How reproducible:
      Always. Occurs on every Windows node when the hybrid-overlay-node service triggers certificate rotation.

      Error logged when rotation fails:

      E1120 21:33:38.171751    3776 certificate_manager.go:596] "Failed while requesting a signed certificate from the control plane" err="cannot create certificate signing request: Post \"http://localhost:8443/apis/certificates.k8s.io/v1/certificatesigningrequests\": dial tcp 127.0.0.1:8443: connectex: No connection could be made because the target machine actively refused it." logger="kubernetes.io/kube-apiserver-client.UnhandledError"
      

       

      Steps to Reproduce:

      1. Deploy a Windows node to an OpenShift 4.20+ cluster using WMCO 10.20.0, which configures the
        hybrid-overlay-node service with --bootstrap-kubeconfig and --cert-dir (see the command above)
      2. Check hybrid-overlay logs in C:\var\log\hybrid-overlay\hybrid-overlay.log
      3. Observe APIServer:http://localhost:8443 in the "Kubernetes config" log line
      4. Wait for certificate rotation (24h or configured duration)
      5. Observe rotation failure with error: dial tcp 127.0.0.1:8443: connectex: No connection could be made because the target machine actively refused it.

      People: Thomas Pantelis (tpanteli), Jose Valdes (jvaldes@redhat.com), Anurag Saxena