In the current design, all data points for a time series (i.e., metric) are stored within a single partition. Depending on the retention settings and the frequency of writes, this might be fine. For example, if raw data is kept for 7 days and we write only one data point every 5 minutes, then the partition holds only 2016 live cells. If we write more frequently, though, partitions grow much larger and we could wind up with hot spots in the cluster, resulting in degraded performance. This would also cause an uneven distribution of load across the cluster. For instance, if the hot spot node starts running low on disk space because of high-throughput time series, adding more nodes won't help, because a single partition cannot be split across nodes. Time bucketing, however, can help.
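The 2016 figure is simply 7 days of 288 five-minute intervals per day. The hypothetical helper below makes that arithmetic explicit and shows how quickly a single partition grows as the write interval shrinks. It counts only live data cells and ignores tombstones and other per-partition overhead.

```python
from datetime import timedelta

def live_cells(retention: timedelta, write_interval: timedelta) -> int:
    """Cells one series keeps when all of its raw data lives in a single partition."""
    # timedelta // timedelta yields an int: the number of write intervals in the window.
    return retention // write_interval

raw_retention = timedelta(days=7)
print(live_cells(raw_retention, timedelta(minutes=5)))   # 2016
print(live_cells(raw_retention, timedelta(seconds=10)))  # 60480
```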
With time bucketing, we split each time series into partitions by a fixed duration, such as an hour, 6 hours, 12 hours, a day, or a week. There is no one-size-fits-all solution; the right size depends largely on the frequency of writes and on the query patterns. Suppose we typically query for data within the past day and new data points are stored every 5 minutes. An hourly bucket would probably be a poor fit: at that write rate each partition would hold only 12 data points, and a query for the past day would have to read 24 partitions. On the other hand, buckets that are too large bring back the oversized partitions described above. I think a day is probably a reasonable default size.
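A minimal sketch of the bucketing arithmetic, assuming each partition is identified by the series plus the start time of its bucket, and that buckets are aligned to the Unix epoch in UTC. Both assumptions, and the names `bucket_start` and `buckets_for_range`, are illustrative rather than part of the design. The second function lists the partitions a query over a half-open time range has to read.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts: datetime, bucket_size: timedelta) -> datetime:
    """Align a timestamp down to the start of its (epoch-aligned) bucket."""
    offset = (ts - EPOCH) // bucket_size  # whole buckets since the epoch
    return EPOCH + offset * bucket_size

def buckets_for_range(start: datetime, end: datetime, bucket_size: timedelta) -> list[datetime]:
    """Bucket start times, one per partition, covering the range [start, end)."""
    buckets = []
    current = bucket_start(start, bucket_size)
    while current < end:
        buckets.append(current)
        current += bucket_size
    return buckets

utc = timezone.utc
end = datetime(2024, 1, 2, 13, 30, tzinfo=utc)
start = end - timedelta(days=1)
print(len(buckets_for_range(start, end, timedelta(hours=1))))  # 25
print(len(buckets_for_range(start, end, timedelta(days=1))))   # 2
```

Because the query window here is not aligned to bucket boundaries, the past day touches 25 hourly partitions but only 2 daily ones.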
We should have a system-wide default, which can be a day. The bucket size should be configurable in the following ways:
- By tenant
  - This would change the bucket size for all of the tenant's metrics
- By tag(s)
  - This would change the bucket size for all metrics matching the tag query filter
- By individual metric
  - This would change the bucket size for the specified metric
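When a metric is covered by more than one of these settings, some precedence rule is needed. The sketch below assumes the most specific setting wins (individual metric, then tag filter, then tenant, then the system-wide default of one day); the document does not fix this order, and `BucketConfig`, `resolve`, and the simple "all tags must match" filter are hypothetical stand-ins for the real tag query filter.

```python
from dataclasses import dataclass, field
from datetime import timedelta

SYSTEM_DEFAULT = timedelta(days=1)

@dataclass
class BucketConfig:
    """Hypothetical in-memory view of the bucket-size settings."""
    tenant_sizes: dict[str, timedelta] = field(default_factory=dict)
    # (tenant, tag filter, size); a filter matches when every key/value pair is present.
    tag_rules: list[tuple[str, dict[str, str], timedelta]] = field(default_factory=list)
    metric_sizes: dict[tuple[str, str], timedelta] = field(default_factory=dict)

    def resolve(self, tenant: str, metric: str, tags: dict[str, str]) -> timedelta:
        # Assumed precedence: metric, then first matching tag rule, then tenant, then default.
        if (tenant, metric) in self.metric_sizes:
            return self.metric_sizes[(tenant, metric)]
        for rule_tenant, tag_filter, size in self.tag_rules:
            if rule_tenant == tenant and all(tags.get(k) == v for k, v in tag_filter.items()):
                return size
        return self.tenant_sizes.get(tenant, SYSTEM_DEFAULT)

cfg = BucketConfig(
    tenant_sizes={"acme": timedelta(hours=12)},
    tag_rules=[("acme", {"env": "prod"}, timedelta(hours=6))],
    metric_sizes={("acme", "cpu.load"): timedelta(hours=1)},
)
print(cfg.resolve("acme", "cpu.load", {"env": "prod"}))  # 1:00:00
print(cfg.resolve("acme", "mem.used", {"env": "dev"}))   # 12:00:00
```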
If the bucket size is changed, the change should apply only going forward. It should not be applied to past data; otherwise, changing the bucket size would be an expensive operation in terms of the I/O involved.
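One way to honor forward-only changes is to keep, per series, a history of bucket sizes with the time each took effect, and to bucket every data point with the size in effect at its own timestamp. The sketch below is hypothetical (`BucketSchedule` and its methods are illustrative names); it aligns each period's buckets to the period's start so that partition keys from the old and new sizes never collide.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

class BucketSchedule:
    """Hypothetical per-series history of bucket sizes; changes only apply going forward."""

    def __init__(self, initial_size: timedelta, start: datetime):
        self._starts = [start]        # when each bucket size took effect, ascending
        self._sizes = [initial_size]  # bucket size in effect from the matching start

    def change_size(self, new_size: timedelta, effective_from: datetime, now: datetime) -> None:
        # Forward-only: never re-bucket data that has already been written.
        if effective_from < now:
            raise ValueError("a bucket size change cannot take effect in the past")
        if effective_from <= self._starts[-1]:
            raise ValueError("changes must be later than the previous change")
        self._starts.append(effective_from)
        self._sizes.append(new_size)

    def bucket_for(self, ts: datetime) -> datetime:
        # Find the period containing ts, then align ts to that period's own grid.
        # Old-period bucket starts are < effective_from and new ones are >= it.
        i = bisect_right(self._starts, ts) - 1
        if i < 0:
            raise ValueError("timestamp precedes the schedule")
        period_start, size = self._starts[i], self._sizes[i]
        return period_start + ((ts - period_start) // size) * size

utc = timezone.utc
s = BucketSchedule(timedelta(days=1), datetime(2024, 1, 1, tzinfo=utc))
s.change_size(timedelta(hours=6), datetime(2024, 1, 5, tzinfo=utc),
              now=datetime(2024, 1, 4, 12, tzinfo=utc))
print(s.bucket_for(datetime(2024, 1, 3, 15, tzinfo=utc)))  # 2024-01-03 00:00:00+00:00
print(s.bucket_for(datetime(2024, 1, 5, 13, tzinfo=utc)))  # 2024-01-05 12:00:00+00:00
```

A query spanning a change has to enumerate buckets from both periods, so the schedule would need to be readable wherever queries are planned.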
There is an open question of where this configuration metadata should be stored. One option is to store it in the metrics_idx table, although we might want to explore other options.