-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.20
-
None
Description of problem:
The hybrid-overlay-node service on Windows nodes fails certificate rotation because it attempts to connect to localhost:8443 instead of the actual API server endpoint.
Investigation shows that the API server host is available in the provided kubeconfig, but hybrid-overlay-node fails to replace the default value (localhost:8443) when the --bootstrap-kubeconfig flag is used.
The ovn-kubernetes/go-controller/pkg/config/config.go defaults to const DefaultAPIServer = "http://localhost:8443" when no explicit API server is provided. The bootstrap kubeconfig contains the correct API server URL, but the certificate manager initialization doesn't properly extract it from the bootstrap kubeconfig.
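To illustrate the expected behaviour, here is a minimal Go sketch of how the API server could be resolved. It is not the ovn-kubernetes implementation: `resolveAPIServer` and `serverFromKubeconfig` are hypothetical helpers, the precedence (explicit flag, then bootstrap kubeconfig, then default) is an assumption, and `https://api.example.test:6443` is a placeholder URL. The line scan stands in for a proper kubeconfig parser; real code would more likely load the file through client-go's clientcmd package.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultAPIServer mirrors the default quoted in this report.
const DefaultAPIServer = "http://localhost:8443"

// serverFromKubeconfig is a hypothetical helper that returns the first
// "server:" value found in kubeconfig text, or "" if none is present.
// It is a naive line scan used only to keep the sketch stdlib-only.
func serverFromKubeconfig(text string) string {
	for _, line := range strings.Split(text, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "server:") {
			value := strings.TrimSpace(strings.TrimPrefix(trimmed, "server:"))
			return strings.Trim(value, `"'`)
		}
	}
	return ""
}

// resolveAPIServer is a hypothetical helper. Assumed precedence:
// explicit flag value, then bootstrap kubeconfig server, then the default.
func resolveAPIServer(flagValue, bootstrapServer string) string {
	if flagValue != "" {
		return flagValue
	}
	if bootstrapServer != "" {
		return bootstrapServer
	}
	return DefaultAPIServer
}

func main() {
	// Placeholder kubeconfig content, not taken from the affected node.
	kubeconfig := "clusters:\n- cluster:\n    server: https://api.example.test:6443\n  name: demo\n"
	fmt.Println(resolveAPIServer("", serverFromKubeconfig(kubeconfig)))
	// With no flag and no bootstrap server, the default is all that is left.
	fmt.Println(resolveAPIServer("", ""))
}
```

With this ordering, the localhost default would only be used when neither the flag nor the bootstrap kubeconfig supplies a server, which is the opposite of what the log below shows.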
hybrid-overlay-node command:
C:\k\hybrid-overlay-node.exe --node winworker-lcjzq --bootstrap-kubeconfig=C:\k\kubeconfig --cert-dir=C:\k\tls\certs --cert-duration=10m --windows-service --logfile C:\var\log\hybrid-overlay\hybrid-overlay.log --hybrid-overlay-vxlan-port 4789 --loglevel 5
Note: --k8s-apiserver is ignored when --bootstrap-kubeconfig is provided.
Effective Kubernetes config (note that APIServer holds the default value):
I1120 21:20:47.329011 3776 config.go:2474] Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig CertDir:C:\k\cni\config CertDuration:15m0s Kubeconfig: CACert: CAData:[] APIServer:http://localhost:8443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:172.16.1.0/24 ServiceCIDRs:[172.16.1.0/24] OVNConfigNamespace:ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:<nil> HostNetworkNamespace: DisableRequestedChassis:false PlatformType: HealthzBindAddress: CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:kube-system DNSServiceName:kube-dns}
Version-Release number of selected component (if applicable):
OpenShift: 4.20.2 with OVN-Kubernetes hybrid-overlay
WMCO: 10.20.0-e3c3dfe
Windows: Windows Server 2022
How reproducible:
Always. The failure occurs on every Windows node when the hybrid-overlay-node service triggers certificate rotation:
E1120 21:33:38.171751 3776 certificate_manager.go:596] "Failed while requesting a signed certificate from the control plane" err="cannot create certificate signing request: Post \"http://localhost:8443/apis/certificates.k8s.io/v1/certificatesigningrequests\": dial tcp 127.0.0.1:8443: connectex: No connection could be made because the target machine actively refused it." logger="kubernetes.io/kube-apiserver-client.UnhandledError"
Steps to Reproduce
- Deploy a Windows node to an OpenShift 4.20+ cluster using WMCO 10.20.0, which configures the hybrid-overlay-node service with --bootstrap-kubeconfig and --cert-dir (see above)
- Check hybrid-overlay logs in C:\var\log\hybrid-overlay\hybrid-overlay.log
- Observe in Kubernetes config log line: APIServer:http://localhost:8443
- Wait for certificate rotation (24h or configured duration)
- Observe rotation failure with error: dial tcp 127.0.0.1:8443: connectex: No connection could be made because the target machine actively refused it.
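
The log check in the steps above can be automated instead of waiting for rotation to fail. The following is a rough Go sketch, assuming the "Kubernetes config" log line keeps the `APIServer:<value>` layout shown in this report; `apiServerFromLog` and `usesDefaultAPIServer` are hypothetical helper names, not part of hybrid-overlay-node.

```go
package main

import (
	"fmt"
	"regexp"
)

// apiServerField matches the APIServer field as it appears in the
// "Kubernetes config" log line quoted in this report.
var apiServerField = regexp.MustCompile(`\bAPIServer:(\S+)`)

// apiServerFromLog returns the APIServer value from a log line, if present.
func apiServerFromLog(line string) (string, bool) {
	m := apiServerField.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// usesDefaultAPIServer reports whether the line shows the localhost default.
func usesDefaultAPIServer(line string) bool {
	server, ok := apiServerFromLog(line)
	return ok && server == "http://localhost:8443"
}

func main() {
	// Shortened excerpt of the log line from this report.
	line := `config.go:2474] Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig CAData:[] APIServer:http://localhost:8443 Token: TokenFile:}`
	fmt.Println(usesDefaultAPIServer(line))
}
```

A node whose log line reports the default here can be expected to hit the rotation failure once the certificate duration elapses.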