-
Bug
-
Resolution: Done-Errata
-
Undefined
-
4.14
-
None
-
Moderate
-
No
-
False
-
Description of problem:
The advertise address configured for our HCP etcd clusters is not resolvable via DNS (e.g., etcd-0.etcd-client.namespace.svc:2379). This breaks etcd tooling that expects to reach each member at its advertise address.
Version-Release number of selected component (if applicable):
4.14 (and earlier)
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster and wait for it to come up.
2. Exec into an etcd pod and query cluster endpoint health:
   $ oc rsh etcd-0
   $ etcdctl --cacert /etc/etcd/tls/etcd-ca/ca.crt \
       --cert /etc/etcd/tls/server/server.crt \
       --key /etc/etcd/tls/server/server.key \
       --endpoints https://localhost:2379 \
       endpoint health --cluster -w table
Actual results:
An error similar to the following is returned:
{"level":"warn","ts":"2023-08-07T20:40:49.890254Z","logger":"client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000378fc0/etcd-0.etcd-client.clusters-test-cluster.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.etcd-client.clusters-test-cluster.svc on 172.30.0.10:53: no such host\""}
Expected results:
Actual cluster health is returned:
+--------------------------------------------------------------+--------+-------------+-------+
|                           ENDPOINT                           | HEALTH |    TOOK     | ERROR |
+--------------------------------------------------------------+--------+-------------+-------+
| https://etcd-0.etcd-discovery.clusters-cewong-guest.svc:2379 |   true |  9.372168ms |       |
| https://etcd-2.etcd-discovery.clusters-cewong-guest.svc:2379 |   true | 12.269226ms |       |
| https://etcd-1.etcd-discovery.clusters-cewong-guest.svc:2379 |   true | 12.291392ms |       |
+--------------------------------------------------------------+--------+-------------+-------+
Additional info:
The etcd StatefulSet is created with spec.serviceName set to `etcd-discovery`. As a result, pods in the StatefulSet get their subdomain set to `etcd-discovery`, and names like etcd-0.etcd-discovery.[ns].svc are resolvable. The same is not true for the etcd-client service: etcd-0.etcd-client.[ns].svc is not resolvable. The fix would be to change the advertise address of each member to a resolvable name (e.g., etcd-0.etcd-discovery.[ns].svc) and adjust the server certificate to allow those names as well.
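The naming scheme proposed above can be sketched as a few shell helpers. This is only an illustration of how the resolvable names are composed, not the actual HyperShift change: the function names (`member_host`, `advertise_client_url`, `server_cert_sans`) are hypothetical, and listing one certificate SAN per member rather than a wildcard is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not taken from the HyperShift code base.

# Per-member DNS name under the StatefulSet's governing service (etcd-discovery).
# $1 = pod name (e.g. etcd-0), $2 = namespace
member_host() {
  printf '%s.etcd-discovery.%s.svc' "$1" "$2"
}

# Client URL a member would advertise, i.e. a value suitable for
# etcd's --advertise-client-urls flag.
# $1 = pod name, $2 = namespace
advertise_client_url() {
  printf 'https://%s:2379' "$(member_host "$1" "$2")"
}

# DNS names the server certificate would additionally need to allow,
# one per member (assumption: per-member SANs rather than a wildcard).
# $1 = namespace, $2 = number of replicas
server_cert_sans() {
  ns=$1
  replicas=$2
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s\n' "$(member_host "etcd-$i" "$ns")"
    i=$((i + 1))
  done
}
```

For example, `advertise_client_url etcd-0 clusters-test-cluster` yields a name under etcd-discovery instead of etcd-client, and `server_cert_sans clusters-test-cluster 3` lists the three member names the certificate would have to cover.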