Feature
Resolution: Unresolved
Normal
subs-cost
When distributing costs from the cost model, it should be possible to understand what is actually being distributed. We want to keep the cost distribution as simple as possible, and coherent throughout the entire application.
Today, we consider that the cost distribution is done proportionally to the capacity:
- The costs applicable to a pod (and thus a project) are a percentage of the cost of the node/cluster capacity.
- Cost of pod = Pod CPU / Node CPU capacity * Node cost = Pod CPU / Cluster CPU capacity * Cluster cost.
- Node cost = Node CPU capacity / Cluster CPU capacity * Cluster cost. The formulas are equivalent for pod costs as node and cluster costs are proportional to their capacities.
The same can be done considering memory instead of CPU.
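The proportional formulas above can be sketched in a few lines of Python. All figures (cluster cost, capacities, pod usage) are hypothetical examples, chosen only to show that the node-based and cluster-based pod formulas agree:

```python
# Sketch of the proportional (capacity-based) distribution described above.
# All numbers are hypothetical, not real cluster figures.

def node_cost(node_cpu_capacity: float, cluster_cpu_capacity: float,
              cluster_cost: float) -> float:
    """Node cost = node CPU capacity / cluster CPU capacity * cluster cost."""
    return node_cpu_capacity / cluster_cpu_capacity * cluster_cost

def pod_cost_via_node(pod_cpu: float, node_cpu_capacity: float,
                      cost_of_node: float) -> float:
    """Pod cost = pod CPU / node CPU capacity * node cost."""
    return pod_cpu / node_cpu_capacity * cost_of_node

def pod_cost_via_cluster(pod_cpu: float, cluster_cpu_capacity: float,
                         cluster_cost: float) -> float:
    """Pod cost = pod CPU / cluster CPU capacity * cluster cost."""
    return pod_cpu / cluster_cpu_capacity * cluster_cost

cluster_cost = 100.0   # hypothetical hourly cluster cost
cluster_cpu = 32.0     # total cluster CPU capacity
node_cpu = 8.0         # one node's CPU capacity
pod_cpu = 2.0          # CPU used by one pod

n_cost = node_cost(node_cpu, cluster_cpu, cluster_cost)                 # 25.0
via_node = pod_cost_via_node(pod_cpu, node_cpu, n_cost)                 # 6.25
via_cluster = pod_cost_via_cluster(pod_cpu, cluster_cpu, cluster_cost)  # 6.25
```

The two pod results match because the node cost is itself proportional to node capacity, as stated above.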
When distributing the costs from the cost model, the distribution should follow the same pattern used for cloud: the cost of the cluster is distributed either to the nodes or to the projects based on cluster capacity (for nodes, node capacity / cluster capacity; for projects, used CPU / cluster capacity or used CPU / node capacity). Doing so means that not all costs are distributed, since cluster utilization can never reach 100% in CPU or memory (a typical objective is to fill clusters to 60%-80% of their capacity, depending on the type of workloads and the type of demand).
There should be a single distribution mechanism per cluster (see notes for alternatives).
It is valuable for customers to be able to identify resources grouped in the following elements:
- User costs. Costs directly associated with customers, for instance project costs. These costs are dynamic in nature and reflect the real use of the cluster to provide services. This is the most valuable usage and cost.
- OpenShift usage and cost. The projects and usage required to run and administer OpenShift itself (projects starting with openshift- or kube-). This usage is required and reduces the resources available for user projects, but it is not directly useful (it does not let customers deliver any end-user service). Reporting on it can help customers optimize their deployment (e.g. choose between bigger or smaller machines, since some services need to run on every node while others are shared). This must take into account that some projects run only on master nodes while customer projects run only on worker nodes: capacity should be evaluated over worker nodes only, and master nodes should be considered fully used by OpenShift (with their costs).
- Unused / non-reserved / available resources. CPU and memory that is neither used nor reserved. Normally the objective (depending on the customer and the type of workload) is to keep this between 20% and 40% of the cluster. It cannot be 0 (workloads are elastic and can require more resources to cope with increased demand or number of customers), but if it is too high the cluster is more expensive than necessary.
The sum of the user, OpenShift, and unused costs must equal 100% of the total cluster cost at each moment. This scheme is used consistently across the full application.
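The three-bucket split and its sum-to-100% invariant can be sketched as follows. The project names, CPU figures, and cluster cost are invented for illustration; the openshift-/kube- prefix rule comes from the description above:

```python
# Hypothetical sketch of the user / OpenShift / unused split.
cluster_cost = 100.0
cluster_cpu_capacity = 32.0

# Project -> CPU used. Projects prefixed openshift-/kube- are platform usage.
project_cpu = {
    "openshift-monitoring": 4.0,
    "kube-system": 2.0,
    "shop-frontend": 8.0,
    "shop-backend": 6.0,
}

def is_platform(project: str) -> bool:
    return project.startswith(("openshift-", "kube-"))

user_cpu = sum(cpu for p, cpu in project_cpu.items() if not is_platform(p))
openshift_cpu = sum(cpu for p, cpu in project_cpu.items() if is_platform(p))
unused_cpu = cluster_cpu_capacity - user_cpu - openshift_cpu

user_cost = user_cpu / cluster_cpu_capacity * cluster_cost            # 43.75
openshift_cost = openshift_cpu / cluster_cpu_capacity * cluster_cost  # 18.75
unused_cost = unused_cpu / cluster_cpu_capacity * cluster_cost        # 37.5

# Invariant: the three buckets always sum to the full cluster cost.
assert abs(user_cost + openshift_cost + unused_cost - cluster_cost) < 1e-9
```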
Note: this optional distribution of costs can be done account/source-wide.
There are alternative ways of distributing costs. For instance, some customers require that 100% of the costs be assigned to customers. In that case, the distribution needs to change slightly so that, at each moment, the total cost is distributed across the projects.
In other issues, we have taken reserved vs. used into account. For distribution, we need to be able to do the following:
- Instead of using capacity to distribute cost (which will never account for the full costs, since unused and OpenShift usage are always present), distribute the full costs to customer usage. In this case, the total cost of the node or cluster is distributed to user projects only (not counting OpenShift or unused usage). Each hour, the total cost is distributed to the user projects.
- There are two ways of doing this automatically: proportionally or evenly (a percentage of the costs could also be assigned manually to a set of projects). In both cases, the costs are distributed to the existing projects, either based on each project's percentage of CPU/memory usage, or evenly between the identified projects.
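The two automatic schemes can be sketched side by side. Project names and figures are hypothetical; the point is that both schemes allocate 100% of the cost:

```python
# Hypothetical sketch: distribute 100% of the cluster cost to user projects,
# either proportionally to CPU usage or evenly.

def distribute_proportionally(cost: float, usage: dict) -> dict:
    """Each project gets cost * (its usage / total user usage)."""
    total = sum(usage.values())
    return {p: cost * u / total for p, u in usage.items()}

def distribute_evenly(cost: float, usage: dict) -> dict:
    """Each project gets cost / n, regardless of its usage."""
    n = len(usage)
    return {p: cost / n for p in usage}

cluster_cost = 100.0
user_usage = {"shop-frontend": 6.0, "shop-backend": 2.0}

prop = distribute_proportionally(cluster_cost, user_usage)
# {'shop-frontend': 75.0, 'shop-backend': 25.0}
even = distribute_evenly(cluster_cost, user_usage)
# {'shop-frontend': 50.0, 'shop-backend': 50.0}

# Either way, 100% of the cost is allocated; nothing is left unassigned.
assert abs(sum(prop.values()) - cluster_cost) < 1e-9
assert abs(sum(even.values()) - cluster_cost) < 1e-9
```

Note how the even split also illustrates the drawback discussed below: a project's charge changes when other projects join or leave, with no action on its part.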
Why is this good:
- Costs per hour/month assigned to customers are always 100% of the costs. There are no unallocated costs.
- It considers actual usage rather than capacity.
Why is this bad:
- Cost per hour/month depends on factors outside the user's control. A customer using 50 mCPU and 2 GB of memory, without any change, will be charged differently depending on how other projects use the cluster. For instance, if you are the first to deploy an application, you are charged 100% of the cluster costs until a second project is created; then your charge can drop to 50% (for even distribution) without you modifying or doing anything.
- There is no accountability. Those responsible for maintaining the cluster recover 100% of the cluster costs even if only one project is using it. Developers can reduce their costs only if the cluster admin onboards additional projects.
- It is hard to explain and understand why, as a business owner, I get different charges per hour. The cluster always behaves like the spot instance market.
As some customers have expressed that they need this, it should be considered an enhancement.
Full feature (goal)
There is a better alternative to associating cost distribution with cost models: identify shared costs and select how they will be distributed.
In this case, the user will be able to mark cluster/project/node/storage costs as shareable and select a distribution pattern:
- Not distributed: the cost is shown as not distributed. For instance, a cluster cost that is shown only at the cluster level (and appears as no-project at the project level and no-node at the node level).
- Proportionally: select CPU / Memory / storage / networking
- Fixed: select which elements receive the cost and what percentage each gets (the percentages must always sum to 100%).
- Evenly: each element receives 1/n of the cost, where n is the number of elements.
For proportional and even distribution, the customer should be able to choose whether unused and OpenShift usage is taken into account in the distribution.
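The four patterns above could be sketched with a single dispatch function. The function name, signature, and example values are hypothetical, not an existing API:

```python
# Hypothetical sketch of the shareable-cost patterns listed above:
# not distributed, proportional, fixed percentages, and even.

def distribute(cost: float, pattern: str, targets: list,
               usage: dict = None, percentages: dict = None) -> dict:
    """Return {target: share} for a shareable cost under one pattern."""
    if pattern == "not_distributed":
        # Shown only at its own level (e.g. no-project at the project level).
        return {}
    if pattern == "proportional":
        # Proportional to the selected metric (CPU, memory, storage, ...).
        total = sum(usage[t] for t in targets)
        return {t: cost * usage[t] / total for t in targets}
    if pattern == "fixed":
        # The configured percentages must sum to 100%.
        assert abs(sum(percentages.values()) - 100.0) < 1e-9
        return {t: cost * percentages[t] / 100.0 for t in targets}
    if pattern == "even":
        return {t: cost / len(targets) for t in targets}
    raise ValueError(f"unknown pattern: {pattern}")

fixed = distribute(100.0, "fixed", ["a", "b"],
                   percentages={"a": 30.0, "b": 70.0})
# {'a': 30.0, 'b': 70.0}
prop = distribute(100.0, "proportional", ["x", "y"],
                  usage={"x": 1.0, "y": 3.0})
# {'x': 25.0, 'y': 75.0}
```

Validating the fixed percentages up front (rather than silently normalizing them) matches the requirement that they always sum to 100%.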