Bug
Resolution: Done-Errata
Major
4.14.z, 4.15.0, 4.16.0
Description of problem:
ovnkube-node doesn't issue a CSR to obtain new certificates after the node has been suspended for 30 days, so its existing certificates expire while the node is suspended.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Set up a libvirt cluster on a machine
2. Disable chronyd on all nodes and on the host machine
3. Suspend the nodes
4. Change the time on the host 30 days forward
5. Resume the nodes
6. Wait for the API server to come up
7. Wait for all operators to become ready
Actual results:
ovnkube-node would attempt to use expired certs:
2024-01-21T01:24:41.576365431+00:00 stderr F I0121 01:24:41.573615 8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0"
2024-04-20T01:25:08.519622252+00:00 stderr F I0420 01:25:08.516550 8852 services_controller.go:567] Deleting service openshift-operator-lifecycle-manager/packageserver-service
2024-04-20T01:25:08.900228370+00:00 stderr F I0420 01:25:08.898580 8852 services_controller.go:567] Deleting service openshift-operator-lifecycle-manager/packageserver-service
2024-04-20T01:25:17.137956433+00:00 stderr F I0420 01:25:17.137891 8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:17.137956433+00:00 stderr F I0420 01:25:17.137933 8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:17.137997952+00:00 stderr F I0420 01:25:17.137979 8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found
2024-04-20T01:25:19.099635059+00:00 stderr F I0420 01:25:19.099057 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-1
2024-04-20T01:25:19.099635059+00:00 stderr F I0420 01:25:19.099080 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-1: 35.077µs
2024-04-20T01:25:22.245550966+00:00 stderr F W0420 01:25:22.242774 8852 base_network_controller_namespace.go:458] Unable to remove remote zone pod's openshift-controller-manager/controller-manager-5485d88c84-xztxq IP address from the namespace address-set, err: pod openshift-controller-manager/controller-manager-5485d88c84-xztxq: no pod IPs found
2024-04-20T01:25:22.262446336+00:00 stderr F W0420 01:25:22.261351 8852 base_network_controller_namespace.go:458] Unable to remove remote zone pod's openshift-route-controller-manager/route-controller-manager-6b5868f887-n6jj9 IP address from the namespace address-set, err: pod openshift-route-controller-manager/route-controller-manager-6b5868f887-n6jj9: no pod IPs found
2024-04-20T01:25:27.154790226+00:00 stderr F I0420 01:25:27.154744 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-worker-0
2024-04-20T01:25:27.154790226+00:00 stderr F I0420 01:25:27.154770 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-worker-0: 31.72µs
2024-04-20T01:25:27.172301639+00:00 stderr F I0420 01:25:27.168666 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-2
2024-04-20T01:25:27.172301639+00:00 stderr F I0420 01:25:27.168692 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-2: 34.346µs
2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194311 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-0
2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194339 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-0: 40.027µs
2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194582 8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0"
2024-04-20T01:25:27.215435944+00:00 stderr F I0420 01:25:27.215387 8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0"
2024-04-20T01:25:35.789830706+00:00 stderr F I0420 01:25:35.789782 8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-worker-1
2024-04-20T01:25:35.790044794+00:00 stderr F I0420 01:25:35.790025 8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-worker-1: 250.227µs
2024-04-20T01:25:37.596875642+00:00 stderr F I0420 01:25:37.596834 8852 iptables.go:358] "Running" command="iptables-save" arguments=["-t","nat"]
2024-04-20T01:25:47.138312366+00:00 stderr F I0420 01:25:47.138266 8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:47.138382299+00:00 stderr F I0420 01:25:47.138370 8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:47.138453866+00:00 stderr F I0420 01:25:47.138440 8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found
2024-04-20T01:26:17.138583468+00:00 stderr F I0420 01:26:17.138544 8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:26:17.138640587+00:00 stderr F I0420 01:26:17.138629 8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:26:17.138708817+00:00 stderr F I0420 01:26:17.138696 8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found
2024-04-20T01:26:39.474787436+00:00 stderr F I0420 01:26:39.474744 8852 reflector.go:790] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.EndpointSlice total 130 items received
2024-04-20T01:26:39.475670148+00:00 stderr F E0420 01:26:39.475653 8852 reflector.go:147] k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.EndpointSlice: the server has asked for the client to provide credentials (get endpointslices.discovery.k8s.io)
2024-04-20T01:26:40.786339334+00:00 stderr F I0420 01:26:40.786255 8852 reflector.go:325] Listing and watching *v1.EndpointSlice from k8s.io/client-go/informers/factory.go:159
2024-04-20T01:26:40.806238387+00:00 stderr F W0420 01:26:40.804542 8852 reflector.go:535] k8s.io/client-go/informers/factory.go:159: failed to list *v1.EndpointSlice: Unauthorized
2024-04-20T01:26:40.806238387+00:00 stderr F E0420 01:26:40.804571 8852 reflector.go:147] k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Unauthorized
Expected results:
ovnkube-node detects that its certificate has expired, requests new certificates via the CSR flow, and reloads them.
Additional info:
CI periodic to check this flow: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ovn-sno-cert-rotation-suspend-30d
The job artifacts contain a sosreport.
Applies to both SNO and HA clusters. Works as expected when nodes are properly shut down instead of suspended.
- blocks: OCPBUGS-31081 ovnkube-node doesn't refresh certificates after node was suspended for 30 days (Closed)
- clones: OCPBUGS-28735 Multus doesn't refresh certificates after node was suspended for 30 days (Closed)
- is cloned by: OCPBUGS-31081 ovnkube-node doesn't refresh certificates after node was suspended for 30 days (Closed)
- is related to: SDN-4460 Investigate CA rotation for ovnkube-node certificates (To Do)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update