1. Proposed title of this feature request.
Enhance etcd error logs for the case where, during a UPI install, static pods are unable to connect to the bootstrap node.
2. What is the nature and description of the request?
When a customer deploys a UPI cluster, they normally need to manually create the access rules between nodes that are listed in our documentation. If the network requirements for etcd are not in place (the master nodes cannot reach port tcp/2379 on the bootstrap node), the cluster installation fails and the static etcd pods stay in the CrashLoopBackOff state. The error that is logged does not give enough information to tell why the pods are crashing:
{"level":"warn","ts":"2024-06-28T12:16:50.859825Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_ENDPOINTS="}
{"level":"warn","ts":"2024-06-28T12:16:55.88266Z","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ba000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
Error: context deadline exceeded failed to create etcd client: context deadline exceeded
It would greatly help customers if the log listed the IP address that etcd cannot reach, so they could double-check that all firewall rules and security group settings allow the required traffic.
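To illustrate the kind of message being requested, here is a minimal sketch of a reachability check that names each address it could not connect to. It is not etcd code and does not reflect how etcd or the installer are implemented. The function name `probe_endpoints` and the message format are hypothetical, and the addresses in the usage comment are placeholders.

```python
import socket


def probe_endpoints(endpoints, timeout=3.0):
    """Try a TCP connection to each (host, port) pair.

    Returns a list of human-readable failure messages, one per endpoint
    that could not be reached. An empty list means every endpoint accepted
    the connection. This is a hypothetical helper, not part of etcd.
    """
    failures = []
    for host, port in endpoints:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or name resolution failure.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:
            # Name the exact address so firewall rules can be checked.
            failures.append(f"cannot reach {host}:{port}: {exc}")
    return failures


# Example usage (placeholder addresses, assumed for illustration only):
#   for msg in probe_endpoints([("192.0.2.10", 2379), ("192.0.2.11", 2379)]):
#       print(msg)
```

A message of this form, emitted alongside the existing "context deadline exceeded" error, would point the customer directly at the node and port whose access rule is missing.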
3. Why does the customer need this? (List the business requirements here)
To correctly identify the source of the problem when they are unable to deploy a UPI cluster.
4. List any affected packages or components.
- etcd