Bug
Resolution: Won't Do
Normal
4.10.z
Quality / Stability / Reliability
False
Moderate
No
Description of problem:
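When a customer attempts a single-node OpenShift (SNO) installation on a node with two IP addresses, the OpenShift API server fails to reach etcd because it tries to connect to the wrong address (16.1.15.2 instead of the node's internal IP, 10.240.92.13). The TLS handshake is rejected because the etcd certificate is not valid for 16.1.15.2, and the installation eventually fails.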
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Install SNO on a node that has two IP addresses.
Actual results:
API server excerpt:
$ oc logs -n openshift-apiserver apiserver-559d986669-ps2sx openshift-apiserver
2023-03-31T12:32:56.028893956Z Copying system trust bundle
2023-03-31T12:32:56.087920219Z I0331 12:32:56.087824 1 dynamic_serving_content.go:112] "Loaded a new cert/key pair" name="serving-cert::/var/run/secrets/serving-cert/tls.crt::/var/run/secrets/serving-cert/tls.key"
2023-03-31T12:32:56.451517004Z I0331 12:32:56.451485 1 requestheader_controller.go:244] Loaded a new request header values for RequestHeaderAuthRequestController
2023-03-31T12:32:56.452644985Z I0331 12:32:56.452624 1 audit.go:353] Using audit backend: ignoreErrors<log>
2023-03-31T12:32:56.453520829Z I0331 12:32:56.453491 1 plugins.go:84] "Registered admission plugin" plugin="NamespaceLifecycle"
2023-03-31T12:32:56.453529906Z I0331 12:32:56.453523 1 plugins.go:84] "Registered admission plugin" plugin="ValidatingAdmissionWebhook"
2023-03-31T12:32:56.453536253Z I0331 12:32:56.453530 1 plugins.go:84] "Registered admission plugin" plugin="MutatingAdmissionWebhook"
2023-03-31T12:32:56.453902013Z I0331 12:32:56.453890 1 admission.go:48] Admission plugin "project.openshift.io/ProjectRequestLimit" is not configured so it will be disabled.
2023-03-31T12:32:56.454432959Z I0331 12:32:56.454421 1 plugins.go:158] Loaded 5 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,build.openshift.io/BuildConfigSecretInjector,image.openshift.io/ImageLimitRange,image.openshift.io/ImagePolicy,MutatingAdmissionWebhook.
2023-03-31T12:32:56.454439216Z I0331 12:32:56.454431 1 plugins.go:161] Loaded 9 validating admission controller(s) successfully in the following order: OwnerReferencesPermissionEnforcement,build.openshift.io/BuildConfigSecretInjector,build.openshift.io/BuildByStrategy,image.openshift.io/ImageLimitRange,image.openshift.io/ImagePolicy,quota.openshift.io/ClusterResourceQuota,route.openshift.io/RequiredRouteAnnotations,ValidatingAdmissionWebhook,ResourceQuota.
2023-03-31T12:32:56.460960855Z W0331 12:32:56.460948 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:32:57.465246823Z W0331 12:32:57.465139 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:32:58.783470542Z W0331 12:32:58.783435 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:01.325191617Z W0331 12:33:01.325091 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:05.609634050Z W0331 12:33:05.609512 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:12.663589220Z W0331 12:33:12.663484 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...
2023-03-31T12:33:16.457605396Z F0331 12:33:16.457507 1 openshift_apiserver.go:379] context deadline exceeded
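Every handshake warning above carries the same two facts: the address that was dialed and the IP SANs listed in the etcd certificate. As a minimal sketch for triaging similar logs, the Python below extracts both from a log line and reports whether the dialed address is covered. The regular expression is assumed from the log format shown in this report, and `find_san_mismatch` is a hypothetical helper name, not part of any OpenShift tooling.

```python
import ipaddress
import re

# Pattern assumed from the warning format in this report; other gRPC
# versions may word the message differently.
HANDSHAKE_RE = re.compile(
    r"failed to connect to \{(?P<dialed>[^\s}]+):\d+ .*?"
    r"x509: certificate is valid for (?P<sans>.+?), not (?P<rejected>[^\"\s]+)"
)


def find_san_mismatch(line):
    """Return (dialed_ip, sorted_san_ips) for an x509 SAN mismatch line, else None."""
    match = HANDSHAKE_RE.search(line)
    if match is None:
        return None
    # Strip brackets so a bracketed IPv6 target such as [::1] also parses.
    dialed = ipaddress.ip_address(match.group("dialed").strip("[]"))
    sans = {ipaddress.ip_address(s.strip()) for s in match.group("sans").split(",")}
    if dialed in sans:
        return None
    return str(dialed), sorted(str(s) for s in sans)


if __name__ == "__main__":
    # Sample taken from the excerpt above (timestamp prefix omitted).
    sample = (
        'W0331 12:32:56.460948 1 clientconn.go:1331] [core] grpc: '
        'addrConn.createTransport failed to connect to '
        '{16.1.15.2:2379 16.1.15.2 <nil> 0 <nil>}. Err: connection error: '
        'desc = "transport: authentication handshake failed: x509: certificate '
        'is valid for ::1, 10.240.92.13, 127.0.0.1, ::1, not 16.1.15.2". Reconnecting...'
    )
    print(find_san_mismatch(sample))
```

For the sample line, the helper reports 16.1.15.2 as dialed but not covered by the certificate, whose IP SANs include the node's internal address 10.240.92.13.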
Node IPs:
$ oc get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vdu Ready master,worker 14h v1.23.12+8a6bfe4 10.240.92.13 <none> Red Hat Enterprise Linux CoreOS 410.84.202212061416-0 (Ootpa) 4.18.0-305.72.1.el8_4.x86_64 cri-o://1.23.3-20.rhaos4.10.git89344de.el8
$ oc get node -o wide -o yaml | grep host-addresses
k8s.ovn.org/host-addresses: '["10.240.92.13","16.1.15.2"]'
KubeAPIServer:
$ oc get KubeAPIServer cluster -o yaml | grep -i etcd-servers -A2
etcd-servers:
- https://16.1.15.2:2379
- https://localhost:2379
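The `etcd-servers` list above can be checked mechanically against the node's internal IP. The sketch below is illustrative only: `unexpected_etcd_servers` is a hypothetical helper, and treating loopback plus the node's INTERNAL-IP as the only acceptable hosts is an assumption drawn from this report's expected result, not a rule taken from the operator's code.

```python
import ipaddress
from urllib.parse import urlsplit


def unexpected_etcd_servers(etcd_servers, node_internal_ip):
    """Return the etcd server URLs whose host is neither loopback nor the node's internal IP."""
    allowed = ipaddress.ip_address(node_internal_ip)
    flagged = []
    for url in etcd_servers:
        host = urlsplit(url).hostname or ""
        if host == "localhost":
            continue
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            # Anything that is not an IP literal (other than localhost)
            # is flagged for manual review rather than resolved here.
            flagged.append(url)
            continue
        if addr.is_loopback or addr == allowed:
            continue
        flagged.append(url)
    return flagged


if __name__ == "__main__":
    # Values copied from the KubeAPIServer and node output in this report.
    servers = ["https://16.1.15.2:2379", "https://localhost:2379"]
    print(unexpected_etcd_servers(servers, "10.240.92.13"))
```

With the values from this report, only the `https://16.1.15.2:2379` entry is flagged, which matches the address rejected in the handshake warnings.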
ETCD:
$ oc get po -n openshift-etcd -o wide
NAME              READY   STATUS      RESTARTS   AGE   IP             NODE
etcd-vdu          4/5     Running     166        14h   10.240.92.13   vdu
installer-2-vdu   0/1     Completed   0          14h   10.128.0.39    vdu
installer-3-vdu   0/1     Completed   0          14h   10.128.0.51    vdu
$ oc get ep -n openshift-etcd etcd -o yaml | yq .subsets
[
  {
    "notReadyAddresses": [
      {
        "ip": "10.240.92.13",
        "nodeName": "vdu",
        "targetRef": {
          "kind": "Pod",
          "name": "etcd-vdu",
          "namespace": "openshift-etcd",
          "resourceVersion": "581549",
          "uid": "a63c5723-6b94-4b15-90fd-bbcfcdf88e81"
        }
      }
    ],
    "ports": [
      {
        "name": "etcd",
        "port": 2379,
        "protocol": "TCP"
      },
      {
        "name": "etcd-metrics",
        "port": 9979,
        "protocol": "TCP"
      }
    ]
  }
]
Expected results:
The API server should contact etcd on its pod IP address (10.240.92.13), and the installation should succeed.
Additional info: