-
Bug
-
Resolution: Won't Do
-
Normal
-
None
-
4.10.z
-
None
-
Moderate
-
No
-
False
-
Description of problem:
When a customer attempts a single-node OpenShift (SNO) installation, the API server fails to reach etcd because it tries to connect to the wrong address. The installation eventually fails as well.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Try to install SNO on a node with two IP addresses.
Actual results:
API server excerpt:
$ oc logs -n openshift-apiserver apiserver-559d986669-ps2sx openshift-apiserver
2023-03-31T12:32:56.028893956Z Copying system trust bundle
2023-03-31T12:32:56.087920219Z I0331 12:32:56.087824 1 dynamic_serving_content.go:112] "Loaded a new cert/key pair" name="serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
2023-03-31T12:32:56.451517004Z I0331 12:32:56.451485 1 requestheader_controller.go:244] Loaded a new request header values for RequestHeaderAuthRequestController
2023-03-31T12:32:56.452644985Z I0331 12:32:56.452624 1 audit.go:353] Using audit backend: ignoreErrors<log>
2023-03-31T12:32:56.453520829Z I0331 12:32:56.453491 1 plugins.go:84] "Registered admission plugin" plugin="NamespaceLifecycle"
2023-03-31T12:32:56.453529906Z I0331 12:32:56.453523 1 plugins.go:84] "Registered admission plugin" plugin="ValidatingAdmissionWebhook"
2023-03-31T12:32:56.453536253Z I0331 12:32:56.453530 1 plugins.go:84] "Registered admission plugin" plugin="MutatingAdmissionWebhook"
2023-03-31T12:32:56.453902013Z I0331 12:32:56.453890 1 admission.go:48] Admission plugin "project.openshift.io/ProjectRequestLimit" is not configured so it will be disabled.
2023-03-31T12:32:56.454432959Z I0331 12:32:56.454421 1 plugins.go:158] Loaded 5 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,build.openshift.io/BuildConfigSecretInjector,image.openshift.io/ImageLimitRange,image.openshift.io/ImagePolicy,MutatingAdmissionWebhook.
2023-03-31T12:32:56.454439216Z I0331 12:32:56.454431 1 plugins.go:161] Loaded 9 validating admission controller(s) successfully in the following order: OwnerReferencesPermissionEnforcement,build.openshift.io/BuildConfigSecretInjector,build.openshift.io/BuildByStrategy,image.openshift.io/ImageLimitRange,image.openshift.io/ImagePolicy,quota.openshift.io/ClusterResourceQuota,route.openshift.io/RequiredRouteAnnotations,ValidatingAdmissionWebhook,ResourceQuota.
2023-03-31T12:32:56.460960855Z W0331 12:32:56.460948 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:32:57.465246823Z W0331 12:32:57.465139 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:32:58.783470542Z W0331 12:32:58.783435 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:01.325191617Z W0331 12:33:01.325091 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:05.609634050Z W0331 12:33:05.609512 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:12.663589220Z W0331 12:33:12.663484 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:16.457605396Z F0331 12:33:16.457507 1 openshift_apiserver.go:379] context deadline exceeded

Node IPs:
$ oc get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vdu Ready master,worker 14h v1.23.12+8a6bfe4 10.240.92.13 <none> Red Hat Enterprise Linux CoreOS 410.84.202212061416-0 (Ootpa) 4.18.0-305.72.1.el8_4.x86_64 cri-o://1.23.3-20.rhaos4.10.git89344de.el8
$ oc get node -o wide -o yaml | grep host-addresses
k8s.ovn.org/host-addresses: '["10.240.92.13","16.1.15.2"]'

KubeAPIServer:
$ oc get KubeAPIServer cluster -o yaml | grep -i etcd-servers -A2
etcd-servers:
- https://16.1.15.2:2379
- https://localhost:2379

ETCD:
$ oc get po -n openshift-etcd -o wide
NAME READY STATUS RESTARTS AGE IP NODE
etcd-vdu 4/5 Running 166 14h 10.240.92.13 vdu
installer-2-vdu 0/1 Completed 0 14h 10.128.0.39 vdu
installer-3-vdu 0/1 Completed 0 14h 10.128.0.51 vdu
$ oc get ep -n openshift-etcd etcd -o yaml | yq .subsets
[
  {
    "notReadyAddresses": [
      {
        "ip": "10.240.92.13",
        "nodeName": "vdu",
        "targetRef": {
          "kind": "Pod",
          "name": "etcd-vdu",
          "namespace": "openshift-etcd",
          "resourceVersion": "581549",
          "uid": "a63c5723-6b94-4b15-90fd-bbcfcdf88e81"
        }
      }
    ],
    "ports": [
      { "name": "etcd", "port": 2379, "protocol": "TCP" },
      { "name": "etcd-metrics", "port": 9979, "protocol": "TCP" }
    ]
  }
]
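The mismatch in the output above can be checked mechanically. The following is a minimal Python sketch, not part of any OpenShift tooling: the function name `uncovered_endpoints` is hypothetical, and the inputs are copied by hand from the `etcd-servers` list and from the SAN list in the x509 error message. It only compares IP literals; hostnames such as `localhost` are skipped because DNS SANs are not modeled here.

```python
import ipaddress
from urllib.parse import urlparse


def uncovered_endpoints(etcd_servers, cert_sans):
    """Return etcd URLs whose host is an IP literal missing from the cert's IP SANs.

    Hypothetical helper for illustration only. Hostname endpoints are skipped,
    since this sketch does not model DNS SAN matching.
    """
    san_ips = {ipaddress.ip_address(s) for s in cert_sans}
    missing = []
    for url in etcd_servers:
        host = urlparse(url).hostname
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal (for example "localhost"); out of scope here.
            continue
        if ip not in san_ips:
            missing.append(url)
    return missing


# Values copied from the KubeAPIServer config and the x509 error in this report.
etcd_servers = ["https://16.1.15.2:2379", "https://localhost:2379"]
cert_sans = ["::1", "10.240.92.13", "127.0.0.1", "::1"]

print(uncovered_endpoints(etcd_servers, cert_sans))
# prints ['https://16.1.15.2:2379']
```

With the values from this report, the only flagged endpoint is the one on the node's second address, 16.1.15.2, which matches the handshake failures in the API server log.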
Expected results:
The API server should contact etcd on its pod IP address (10.240.92.13 in this cluster), and the installation should succeed.
Additional info: