Bug
Resolution: Unresolved
Major
4.7
Quality / Stability / Reliability
Important
Unspecified
Rejected
Description of problem:
After a cluster upgrade from OCP 4.6 to 4.7.9, openshift-metering stopped working. The issue was observed on about 3 clusters.
metering-operator.4.7.0-202104250659.p0 Metering 4.7.0-202104250659.p0 metering-operator.4.6.0-202103010126.p0 Succeeded
The upgrade is reported as successful and the pods start up fine. However, the reports fail, and the logs indicate that the metering schema is missing. The backend PVC is on NFS, in case that makes a difference.
Version-Release number of selected component (if applicable):
Hive server log:
~~~
21/06/08 15:23:03 [206f8b7f-70c7-498a-bd99-a70661096d14 HiveServer2-Handler-Pool: Thread-39]: INFO parse.CalcitePlanner: Creating table metering.report_openshift_metering_namespace_cpu_utilization_2021_06_07 position=0
FAILED: SemanticException [Error 10072]: Database does not exist: metering
21/06/08 15:23:03 [206f8b7f-70c7-498a-bd99-a70661096d14 HiveServer2-Handler-Pool: Thread-39]: ERROR ql.Driver: FAILED: SemanticException [Error 10072]: Database does not exist: metering
org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: metering
~~~
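For triage across several clusters, a small script can confirm whether a collected Hive server log contains this specific error and which database it names. The following is only a sketch: the function name `missing_databases` is hypothetical, and the pattern assumes the message format shown in the excerpt above.

```python
import re

# Matches Hive's "database does not exist" semantic error (code 10072),
# using the message format seen in the log excerpt above (an assumption
# about how other logs are formatted).
MISSING_DB = re.compile(
    r"SemanticException \[Error 10072\]: Database does not exist: (\w+)"
)


def missing_databases(log_text: str) -> set[str]:
    """Return the names of all databases the log reports as missing."""
    return set(MISSING_DB.findall(log_text))


# Two lines copied from the Hive server log above.
sample = """\
FAILED: SemanticException [Error 10072]: Database does not exist: metering
org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: metering
"""

print(missing_databases(sample))
```

Only the first sample line carries the `[Error 10072]` code, so the stack-trace line is deliberately not counted twice.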
Metric collection also stopped at the time the OCP cluster upgrade was initiated.
~~~
$ cat 0030-metering-reportdatasources-raw.log.raw
NAME                                     EARLIEST METRIC        NEWEST METRIC          IMPORT START           IMPORT END             LAST IMPORT TIME       AGE
node-allocatable-cpu-cores               2021-03-12T03:01:00Z   2021-05-24T03:22:00Z   2021-03-12T03:01:00Z   2021-05-24T03:22:00Z   2021-05-24T03:22:15Z   89d
node-allocatable-memory-bytes            2021-03-12T03:01:00Z   2021-05-24T03:22:00Z   2021-03-12T03:01:00Z   2021-05-24T03:22:00Z   2021-05-24T03:22:23Z   89d
node-capacity-cpu-cores                  2021-03-12T03:02:00Z   2021-05-24T03:25:00Z   2021-03-12T03:02:00Z   2021-05-24T03:25:00Z   2021-05-24T03:25:43Z   89d
node-capacity-memory-bytes               2021-03-12T03:02:00Z   2021-05-24T03:25:00Z   2021-03-12T03:02:00Z   2021-05-24T03:25:00Z   2021-05-24T03:26:16Z   89d
persistentvolumeclaim-capacity-bytes     2021-03-12T03:02:00Z   2021-05-24T03:21:00Z   2021-03-12T03:02:00Z   2021-05-24T03:21:00Z   2021-05-24T03:21:27Z   89d
persistentvolumeclaim-phase              2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-05-24T03:28:47Z   89d
persistentvolumeclaim-request-bytes      2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-05-24T03:30:44Z   89d
persistentvolumeclaim-usage-bytes        2021-03-12T03:02:00Z   2021-05-24T03:21:00Z   2021-03-12T03:02:00Z   2021-05-24T03:21:00Z   2021-05-24T03:21:28Z   89d
pod-limit-cpu-cores                      2021-03-12T03:02:00Z   2021-05-24T03:22:00Z   2021-03-12T03:02:00Z   2021-05-24T03:22:00Z   2021-05-24T03:22:05Z   89d
pod-limit-memory-bytes                   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-05-24T03:29:43Z   89d
pod-persistentvolumeclaim-request-info   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-05-24T03:30:45Z   89d
pod-request-cpu-cores                    2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-05-24T03:29:21Z   89d
pod-request-memory-bytes                 2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-03-12T03:02:00Z   2021-05-24T03:28:00Z   2021-05-24T03:28:04Z   89d
pod-usage-cpu-cores                      2021-03-12T03:01:00Z   2021-05-24T03:24:00Z   2021-03-12T03:01:00Z   2021-05-24T03:24:00Z   2021-05-24T03:28:11Z   89d
pod-usage-memory-bytes                   2021-03-12T03:01:00Z   2021-05-24T03:24:00Z   2021-03-12T03:01:00Z   2021-05-24T03:24:00Z   2021-05-24T03:28:15Z   89d
~~~
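To pin down when imports stopped, the ReportDataSource listing above can be parsed for its most recent LAST IMPORT TIME. This is a rough sketch rather than part of any metering tooling: the helper names `parse_listing` and `latest_import` are hypothetical, and it assumes each data row has exactly seven whitespace-separated fields with UTC timestamps, as in the listing.

```python
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_listing(text):
    """Parse data rows into (name, last_import_time) pairs.

    A data row has exactly 7 whitespace-separated fields. The header row
    splits into more fields (its column titles contain spaces), so it is
    skipped by the length check.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue
        # Field 0 is NAME, field 5 is LAST IMPORT TIME.
        last_import = datetime.strptime(fields[5], TS_FORMAT)
        rows.append((fields[0], last_import.replace(tzinfo=timezone.utc)))
    return rows


def latest_import(rows):
    """Return the (name, time) pair with the most recent import time."""
    return max(rows, key=lambda row: row[1])


# Header plus three rows copied from the listing above.
listing = """\
NAME EARLIEST METRIC NEWEST METRIC IMPORT START IMPORT END LAST IMPORT TIME AGE
node-allocatable-cpu-cores 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:15Z 89d
pod-persistentvolumeclaim-request-info 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:30:45Z 89d
pod-usage-memory-bytes 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-05-24T03:28:15Z 89d
"""

name, when = latest_import(parse_listing(listing))
print(f"last import: {name} at {when.isoformat()}")
```

Run against the full listing, the latest import is the same 2021-05-24T03:30:45Z entry, which is the point after which no data source imported anything.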
On the Hive server:
hive> show databases;
<skip>
.
default
.
<skip>
hive> show tables from metering;
This resulted in an exception (ERROR ql.Driver: FAILED: SemanticException [Error 10072]: Database does not exist: metering).
The exception is expected because the metering schema itself is missing. We tried recreating an empty database and restarting the pods. That created some tables, but most of them are still missing.
time="2021-06-10T12:04:21Z" level=info msg="existing Prometheus ReportDataSource discovered, tableName: hive.metering.datasource_openshift_metering_pod_limit_memory_bytes" app=metering component=reportDataSourceWorker logID=nkhzrbQOJ2 namespace=openshift-metering reportDataSource=pod-limit-memory-bytes
time="2021-06-10T12:04:21Z" level=error msg="unable to get last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores" app=metering chunkSize=5m0s component=PrometheusImporter error="error getting last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores, maybe table doesn't exist yet? presto: query failed (200 OK): \"io.prestosql.spi.PrestoException: line 3:10: Table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores does not exist\"" logID=jz1XKkQCT2 namespace=openshift-metering reportDataSource=pod-limit-cpu-cores stepSize=1m0s tableName=hive.metering.datasource_openshift_metering_pod_limit_cpu_cores
-> The entire database was also recreated to rule out data corruption.
-> Metering config status:
~~~
status:
  conditions:
  - lastTransitionTime: "2021-06-16T07:49:07.603830Z"
    message: Awaiting the next reconciliation
    status: "False"
    type: Running
~~~
Expected results:
-> Recreating the operator should not be needed during an upgrade, and the reports are expected to work with any reinstallation of the operator.