-
Bug
-
Resolution: Unresolved
-
Undefined
-
4.7
-
Quality / Stability / Reliability
-
Important
-
None
-
All
-
Rejected
Description of problem:
Upgrading OCP from 4.6 to 4.7.9 breaks the metering operator, and the following errors are seen in the logs.
~~~
time="2021-06-18T04:00:04Z" level=error msg="error syncing ReportDataSource \"openshift-metering/pod-limit-cpu-cores\", adding back to queue" ReportDataSource=openshift-metering/pod-limit-cpu-cores app=metering component=reportDataSourceWorker error="ImportFromLastTimestamp errored: failed to store Prometheus metrics into table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores for the range 2021-06-14 06:06:00 +0000 UTC to 2021-06-14 06:11:00 +0000 UTC: failed to store metrics into presto: presto SQL error: presto: query failed (200 OK): \"io.prestosql.spi.PrestoException: Error moving data files from file:/tmp/presto-reporting-operator/6b9c7de4-9173-484e-a773-5b06ac984b6e/dt=2021-06-14/20210618_040000_00106_59kig_0fad00d3-9a2f-4168-a308-5e9c0890cf2b to final location file:/user/hive/warehouse/metering.db/datasource_openshift_metering_pod_limit_cpu_cores/dt=2021-06-14/20210618_040000_00106_59kig_0fad00d3-9a2f-4168-a308-5e9c0890cf2b\"" logID=ArDObTmQql
time="2021-06-18T04:00:04Z" level=info msg="syncing ReportDataSource openshift-metering/pod-persistentvolumeclaim-request-info" app=metering component=reportDataSourceWorker logID=7XZEGZsiv6
time="2021-06-18T04:00:04Z" level=info msg="existing Prometheus ReportDataSource discovered, tableName: hive.metering.datasource_openshift_metering_pod_persistentvolumeclaim_request_info" app=metering component=reportDataSourceWorker logID=7XZEGZsiv6 namespace=openshift-metering reportDataSource=pod-persistentvolumeclaim-request-info
time="2021-06-18T04:00:04Z" level=warning msg="time range 2021-06-17 06:27:00 +0000 UTC to 2021-06-18 04:00:04.821440675 +0000 UTC exceeds PrometheusImporter MaxQueryRangeDuration 10m0s, newEndTime: 2021-06-17 06:37:00 +0000 UTC" app=metering chunkSize=5m0s component=PrometheusImporter logID=riVZjeU3EG namespace=openshift-metering reportDataSource=pod-persistentvolumeclaim-request-info stepSize=1m0s tableName=hive.metering.datasource_openshift_metering_pod_persistentvolumeclaim_request_info
time="2021-06-18T04:00:04Z" level=info msg="Event(v1.ObjectReference{Kind:\"ReportDataSource\", Namespace:\"openshift-metering\", Name:\"pod-limit-cpu-cores\", UID:\"480328ef-71c2-493d-8544-48d414a6a04b\", APIVersion:\"metering.openshift.io/v1\", ResourceVersion:\"642114040\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedPrometheusQuery' Unable to import metrics after Prometheus query failure. Check the reporting-operator container logs for more information." app=metering
time="2021-06-18T04:00:06Z" level=info msg="stored a total of 138 metrics for data between 2021-06-17 06:27:00 +0000 UTC and 2021-06-17 06:32:00 +0000 UTC into hive.metering.datasource_openshift_metering_pod_persistentvolumeclaim_request_info" app=metering chunkSize=5m0s component=PrometheusImporter endTime="2021-06-17 06:32:00 +0000 UTC" logID=riVZjeU3EG namespace=openshift-metering reportDataSource=pod-persistentvolumeclaim-request-info startTime="2021-06-17 06:27:00 +0000 UTC" stepSize=1m0s tableName=hive.metering.datasource_openshift_metering_pod_persistentvolumeclaim_request_info
~~~
~~~
reporting-operator-6f88d997c8-q5n54.log:time="2021-06-18T03:58:14Z" level=warning msg="Prometheus metrics import backlog detected: imported data for Prometheus ReportDataSource pod-persistentvolumeclaim-request-info newest imported metric timestamp 2021-06-17 06:26:00 +0000 UTC is 21h32m14.244430636s away, queuing to reprocess in 5.761902623s" app=metering component=reportDataSourceWorker logID=lVEHrglafs namespace=openshift-metering reportDataSource=pod-persistentvolumeclaim-request-info
reporting-operator-6f88d997c8-q5n54.log:time="2021-06-18T03:58:20Z" level=warning msg="Prometheus metrics import backlog detected: imported data for Prometheus ReportDataSource persistentvolumeclaim-capacity-bytes newest imported metric timestamp 2021-06-17 05:10:00 +0000 UTC is 22h48m20.269181077s away, queuing to reprocess in 8.152080853s" app=metering component=reportDataSourceWorker logID=b9a1ELsB3f namespace=openshift-metering reportDataSource=persistentvolumeclaim-capacity-bytes
reporting-operator-6f88d997c8-q5n54.log:time="2021-06-18T04:00:06Z" level=warning msg="Prometheus metrics import backlog detected: imported data for Prometheus ReportDataSource pod-persistentvolumeclaim-request-info newest imported metric timestamp 2021-06-17 06:32:00 +0000 UTC is 21h28m6.105762965s away, queuing to reprocess in 12.163683749s" app=metering component=reportDataSourceWorker logID=7XZEGZsiv6 namespace=openshift-metering reportDataSource=pod-persistentvolumeclaim-request-info
reporting-operator-6f88d997c8-q5n54.log:time="2021-06-18T04:00:07Z" level=warning msg="Prometheus metrics import backlog detected: imported data for Prometheus ReportDataSource persistentvolumeclaim-phase newest imported metric timestamp 2021-06-17 08:57:00 +0000 UTC is 19h3m7.570262894s away, queuing to reprocess in 5.128259091s" app=metering component=reportDataSourceWorker logID=Z2rNCzQNrX namespace=openshift-metering reportDataSource=persistentvolumeclaim-phase
~~~
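To gauge how far behind each datasource is, the backlog warnings above can be summarized with a short script. This is only a triage sketch, not part of the metering operator: the function names `parse_backlog` and `print_summary` are hypothetical, and the regular expression assumes the exact message wording shown in the excerpt above.

```python
import re
from datetime import datetime, timezone

# Matches the "import backlog detected" warnings as worded in the excerpt above.
# Assumption: timestamps are always UTC and formatted exactly as in these lines.
BACKLOG_RE = re.compile(
    r'time="(?P<logged>[^"]+)".*?'
    r'ReportDataSource (?P<name>\S+) newest imported metric timestamp '
    r'(?P<newest>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000 UTC'
)


def parse_backlog(lines):
    """Return {datasource: (logged_at, newest_imported)} from the latest warning per datasource."""
    latest = {}
    for line in lines:
        m = BACKLOG_RE.search(line)
        if m is None:
            continue  # not a backlog warning
        logged = datetime.strptime(m["logged"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        newest = datetime.strptime(m["newest"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        previous = latest.get(m["name"])
        if previous is None or logged > previous[0]:
            latest[m["name"]] = (logged, newest)
    return latest


def print_summary(lines):
    """Print one line per datasource: newest imported metric and the lag at log time."""
    for name, (logged, newest) in sorted(parse_backlog(lines).items()):
        print(f"{name}: newest imported {newest.isoformat()}, lag {logged - newest}")
```

For example, calling `print_summary(open(path))` on a saved reporting-operator log (where `path` is wherever the log was saved) lists each ReportDataSource with the timestamp of its newest imported metric, which can then be compared against the upgrade start time.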
Since the OCP upgrade started, reports that rely on the metering operator do not work and state "data is unavailable for the specific period". A one-off report might work when the date is changed to the current date, but not for all the datasources.
The same issue has been observed in 3 different clusters immediately after the upgrade.
The only way to fix the issue is to reinstall the operator with a clean PV/PVC, because reinstalling with the existing PV/PVC would complain that the metering database already exists.
All the reports fail with "ReportingPeriodUnmetDependencies". This is seen only after the OCP upgrade is initiated, and the point at which data stops being available also matches the OCP upgrade start time.
Version-Release number of selected component (if applicable):
metering-operator.4.7.0-202104250659.p0
OCP 4.7.9