Our logging, particularly around the APIs for reading and writing metric data, is really lacking. When we get customer support issues that include server log files, there is usually insufficient logging to properly diagnose the situation. This includes debug/trace logging. There is no point in telling users to enable debug and/or trace logging because we do not really provide any useful output at those levels.
Based on recent customer issues, I am proposing a couple of things to start with. First, we should log the HTTP request for each REST endpoint. This should include the query parameters or request body, the headers, etc.
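As a rough sketch of what the request logging could look like, the helper below builds a single log line from the parts of a request and only does the work when debug (FINE) logging is enabled. The class name RequestLogging, its method names, and the example path and header are all made up for illustration; in the server this would presumably be called from whatever request filter or interceptor sits in front of the REST endpoints, which is an assumption and not something decided here.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not existing code.
final class RequestLogging {
    private static final Logger log = Logger.getLogger(RequestLogging.class.getName());

    private RequestLogging() {}

    // Builds a one-line description of an HTTP request.
    static String describe(String method, String path, Map<String, String> queryParams,
                           Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path);
        sb.append(" query=").append(queryParams);
        sb.append(" headers=").append(headers);
        sb.append(" body=").append(body == null ? "<none>" : body);
        return sb.toString();
    }

    // Logs the request at FINE; the guard avoids building the string when debug is off.
    static void logRequest(String method, String path, Map<String, String> queryParams,
                           Map<String, String> headers, String body) {
        if (log.isLoggable(Level.FINE)) {
            log.fine(describe(method, path, queryParams, headers, body));
        }
    }
}
```

For a GET on a hypothetical /metrics/stats path with one query parameter and one header, describe would produce a line of the form "GET /metrics/stats query={...} headers={...} body=<none>".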
Second, in MetricsServiceImpl we need logging in the findXXXStats methods. For the methods that use tag filters, it would be useful to see the total number of metrics that match the tags query.
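To illustrate the kind of log statement I have in mind for the tag filter case, here is a minimal sketch. It is not the real MetricsServiceImpl code: the class TagQueryLogging and the method matchMetrics are hypothetical, and an in-memory map stands in for the actual tags query. The only point is that the match count gets logged right after the tags are resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the tag lookup done by the findXXXStats methods.
final class TagQueryLogging {
    private static final Logger log = Logger.getLogger(TagQueryLogging.class.getName());

    private TagQueryLogging() {}

    // Returns the ids of metrics whose tags contain every entry of the filter,
    // and logs how many metrics matched.
    static List<String> matchMetrics(Map<String, Map<String, String>> tagsByMetric,
                                     Map<String, String> filter) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : tagsByMetric.entrySet()) {
            if (e.getValue().entrySet().containsAll(filter.entrySet())) {
                matches.add(e.getKey());
            }
        }
        // Parameterized message: only formatted if FINE is enabled.
        log.log(Level.FINE, "Tag filter {0} matched {1} metrics",
                new Object[] {filter, matches.size()});
        return matches;
    }
}
```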
It might also be useful to log the total number of data points that are returned from Cassandra. Even though we aggregate the raw data points into buckets, we still wind up holding a lot in memory. The driver uses paging, and the default page size is 5,000. Suppose we have a stats query with tags that ends up matching 500 metrics with 5,000 data points each. We could wind up with 2.5 million CQL rows in memory at once.
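The arithmetic behind that figure, plus a possible log line for it, can be sketched as follows. ResultSizeEstimate and its methods are hypothetical names, and the estimate assumes the worst case where all data points for all matched metrics are resident at the same time.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper for reporting how much data a stats query pulls in.
final class ResultSizeEstimate {
    private static final Logger log = Logger.getLogger(ResultSizeEstimate.class.getName());

    private ResultSizeEstimate() {}

    // Worst-case number of rows in memory: every matched metric contributes
    // dataPointsPerMetric rows at once. Uses long to avoid int overflow.
    static long worstCaseRows(int metricCount, int dataPointsPerMetric) {
        return (long) metricCount * dataPointsPerMetric;
    }

    // Logs the estimate at FINE and returns it.
    static long logEstimate(int metricCount, int dataPointsPerMetric) {
        long rows = worstCaseRows(metricCount, dataPointsPerMetric);
        log.log(Level.FINE, "Stats query: {0} metrics x {1} data points = up to {2} rows in memory",
                new Object[] {metricCount, dataPointsPerMetric, rows});
        return rows;
    }
}
```

With the numbers from the example, worstCaseRows(500, 5000) returns 2,500,000.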