Story
Resolution: Won't Do
Minor
See thread and thread for background.
Our MachinePool controller, like all our controllers, builds a kube client to talk to the spoke cluster being reconciled. We do not cache these clients today. The main reasons:
1. It's Hard™. Two aspects of that, off the top of my head:
- If the client goes stale, how do we distinguish that from other errors so we can rebuild it?
- We need to not leak cache entries. One possible solution is adding a CD finalizer so we can delete that CD's cache entry before letting the CD be garbage collected (a rough shape for such a cache is sketched below).
2. It's expensive. Each client is biggish, and we support hundreds to thousands of spokes per hive.
(See HIVE-2399 where we're trying to do something like this. We haven't managed to get it working properly yet.)
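For concreteness, here's a rough Go sketch of what a per-ClusterDeployment client cache could look like, with an explicit Invalidate hook that could be called from both the stale-client path and the CD finalizer before deletion. The names here (Cache, buildFn, etc.) are illustrative assumptions, not existing Hive code.

```go
// A minimal sketch of a spoke-client cache keyed by ClusterDeployment, assuming
// entries are removed explicitly (finalizer / stale-client detection) rather
// than garbage-collected. Hypothetical types, not existing Hive code.
package clientcache

import (
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

type key struct{ namespace, name string }

type Cache struct {
	mu      sync.Mutex
	clients map[key]client.Client
}

func New() *Cache {
	return &Cache{clients: map[key]client.Client{}}
}

// Get returns the cached client for a CD, building one via buildFn on a miss.
// buildFn stands in for however the spoke client is built today (e.g. from the
// admin kubeconfig secret).
func (c *Cache) Get(namespace, name string, buildFn func() (client.Client, error)) (client.Client, error) {
	k := key{namespace, name}
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[k]; ok {
		return cl, nil
	}
	cl, err := buildFn()
	if err != nil {
		return nil, err
	}
	c.clients[k] = cl
	return cl, nil
}

// Invalidate drops a cached client, e.g. when a call fails in a way that looks
// like a stale kubeconfig, or from the CD finalizer before deletion.
func (c *Cache) Invalidate(namespace, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.clients, key{namespace, name})
}
```

The hard parts this glosses over are exactly the ones above: deciding when a call failure means "stale client" rather than a transient error, and making sure every CD deletion path actually hits Invalidate.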
Assuming we did manage to cache clients locally, the next step would be to replace some of the places where we hard-poll spokes with Watch()es through those clients. I don't know how expensive those Watch()es are when they're spread across hundreds to thousands of clients versus just the one (to the local KAS) we use today. I also don't know whether we could rely on them for all the use cases where we're currently polling. For example, the point of the unreachable controller is to update status/labels when the remote cluster is... unreachable. Does a Watch() pop when the remote KAS breaks? Dunno.
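To make the Watch() question concrete, here's a minimal client-go sketch of watching a resource on a spoke and what we'd observe when the connection dies: the watch either fails to establish, emits an Error event, or its result channel simply closes. Whether a closed channel means "the remote KAS is down" or just a routine watch timeout is exactly the ambiguity above. This is an illustrative assumption of how it might be wired, not existing Hive code (the in-cluster config in main stands in for a real spoke kubeconfig).

```go
// Sketch: watching Nodes on a spoke cluster and reacting when the watch dies.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func watchSpokeNodes(ctx context.Context, cfg *rest.Config) error {
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	w, err := cs.CoreV1().Nodes().Watch(ctx, metav1.ListOptions{})
	if err != nil {
		// Could not even establish the watch -- the remote KAS may be unreachable.
		return err
	}
	defer w.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case ev, ok := <-w.ResultChan():
			if !ok {
				// The watch channel closed. This is what we'd see if the remote
				// KAS went away, but also on routine watch expiry; the caller
				// has to decide between "unreachable" and "just re-watch".
				log.Print("watch closed; re-establish or mark unreachable")
				return nil
			}
			if ev.Type == watch.Error {
				log.Printf("watch error event: %v", ev.Object)
				continue
			}
			log.Printf("node event: %s", ev.Type)
		}
	}
}

func main() {
	// Illustrative only: in-cluster config stands in for a real spoke kubeconfig.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	if err := watchSpokeNodes(ctx, cfg); err != nil {
		log.Fatal(err)
	}
}
```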
Compare and contrast with HIVE-2539, which seeks to reduce network traffic by reducing the number of objects retrieved per reconcile, as opposed to reducing the number of times we need to request the same objects.
relates to: HIVE-2539 Rework the machinepool controller for network (To Do)