-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.12
After Cert expires, ovnkube-master starts to log the below error as per our grpc loglevel this is actually a x509, I also confirmed a successful tcp conection was established and torndown in <100ms, https://github.com/grpc/grpc-go/issues/2561 egressip_healthcheck.go:162] Could not connect to $hostname ($ip:9107):context deadline exceeded
Version-Release number of selected component (if applicable):
How reproducible:
Rotate the openshift-ovn-kubernetes/ovn-cert and Wait for the old cert to expire
Steps to Reproduce:
1. wait %10 days after ovn-cert rotation, and with no pod restarts. 2. After cert rotation, 18days (%10 of validity) egressIPs will be removed from all nodes. 3. all nodes will start to fail egress health probes with context deadline exceeded
Actual results:
Silent failure of egressIP healthchecks
Expected results:
No noticable impact, automatic loading of new cert/restart of pod
Additional info:
whom ever rotated the cert should be restarting the daemonsets. and we should also log x509 issues in the grpc library.
- blocks
-
OCPBUGS-33619 EgressIP Healthcheck silently breaks 18 days after ovn-cert rotation
- Closed
- is cloned by
-
OCPBUGS-33619 EgressIP Healthcheck silently breaks 18 days after ovn-cert rotation
- Closed
- links to
-
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update