Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.7
Component/s: Metering Operator
Labels:
- migrated_from_bz
- needs_manual_sfdc

Activity Type:
Quality / Stability / Reliability
Blocked:
None
Blocked Reason:
None
Story Points:
None
Severity:
Important
Regression:
None
Architecture:
Unspecified
Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
If docs needed, set a value
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

After a cluster upgrade from OCP 4.6 to 4.7.9, openshift-metering stopped working. This issue was observed on about 3 clusters.

metering-operator.4.7.0-202104250659.p0 Metering 4.7.0-202104250659.p0 metering-operator.4.6.0-202103010126.p0 Succeeded

The upgrade reports success and the pods start up normally. However, the reports fail, and the logs indicate that the metering schema is missing. The backend PVC is on NFS, in case that is relevant.
Version-Release number of selected component (if applicable):

Hive server log:

~~~
21/06/08 15:23:03 [206f8b7f-70c7-498a-bd99-a70661096d14 HiveServer2-Handler-Pool: Thread-39]: INFO parse.CalcitePlanner: Creating table metering.report_openshift_metering_namespace_cpu_utilization_2021_06_07 position=0
FAILED: SemanticException [Error 10072]: Database does not exist: metering
21/06/08 15:23:03 [206f8b7f-70c7-498a-bd99-a70661096d14 HiveServer2-Handler-Pool: Thread-39]: ERROR ql.Driver: FAILED: SemanticException [Error 10072]: Database does not exist: metering
org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: metering
~~~

Metric collection also stopped at the same time the OCP cluster upgrade was initiated.

~~~
$ cat 0030-metering-reportdatasources-raw.log.raw
NAME                                    EARLIEST METRIC       NEWEST METRIC         IMPORT START          IMPORT END            LAST IMPORT TIME      AGE
node-allocatable-cpu-cores              2021-03-12T03:01:00Z  2021-05-24T03:22:00Z  2021-03-12T03:01:00Z  2021-05-24T03:22:00Z  2021-05-24T03:22:15Z  89d
node-allocatable-memory-bytes           2021-03-12T03:01:00Z  2021-05-24T03:22:00Z  2021-03-12T03:01:00Z  2021-05-24T03:22:00Z  2021-05-24T03:22:23Z  89d
node-capacity-cpu-cores                 2021-03-12T03:02:00Z  2021-05-24T03:25:00Z  2021-03-12T03:02:00Z  2021-05-24T03:25:00Z  2021-05-24T03:25:43Z  89d
node-capacity-memory-bytes              2021-03-12T03:02:00Z  2021-05-24T03:25:00Z  2021-03-12T03:02:00Z  2021-05-24T03:25:00Z  2021-05-24T03:26:16Z  89d
persistentvolumeclaim-capacity-bytes    2021-03-12T03:02:00Z  2021-05-24T03:21:00Z  2021-03-12T03:02:00Z  2021-05-24T03:21:00Z  2021-05-24T03:21:27Z  89d
persistentvolumeclaim-phase             2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-05-24T03:28:47Z  89d
persistentvolumeclaim-request-bytes     2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-05-24T03:30:44Z  89d
persistentvolumeclaim-usage-bytes       2021-03-12T03:02:00Z  2021-05-24T03:21:00Z  2021-03-12T03:02:00Z  2021-05-24T03:21:00Z  2021-05-24T03:21:28Z  89d
pod-limit-cpu-cores                     2021-03-12T03:02:00Z  2021-05-24T03:22:00Z  2021-03-12T03:02:00Z  2021-05-24T03:22:00Z  2021-05-24T03:22:05Z  89d
pod-limit-memory-bytes                  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-05-24T03:29:43Z  89d
pod-persistentvolumeclaim-request-info  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-05-24T03:30:45Z  89d
pod-request-cpu-cores                   2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-05-24T03:29:21Z  89d
pod-request-memory-bytes                2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-03-12T03:02:00Z  2021-05-24T03:28:00Z  2021-05-24T03:28:04Z  89d
pod-usage-cpu-cores                     2021-03-12T03:01:00Z  2021-05-24T03:24:00Z  2021-03-12T03:01:00Z  2021-05-24T03:24:00Z  2021-05-24T03:28:11Z  89d
pod-usage-memory-bytes                  2021-03-12T03:01:00Z  2021-05-24T03:24:00Z  2021-03-12T03:01:00Z  2021-05-24T03:24:00Z  2021-05-24T03:28:15Z  89d
~~~
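To check the observation that every ReportDataSource stopped importing at roughly the same time, the saved output can be summarized with a short script. This is a minimal sketch, assuming the file keeps the seven whitespace-separated columns shown above (NAME, five timestamps, AGE); last_import_times is a hypothetical helper, not part of the metering tooling.

```python
from datetime import datetime

# Rows copied from 0030-metering-reportdatasources-raw.log.raw (whitespace-separated).
RAW = """\
NAME EARLIEST METRIC NEWEST METRIC IMPORT START IMPORT END LAST IMPORT TIME AGE
node-allocatable-cpu-cores 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:15Z 89d
node-allocatable-memory-bytes 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:23Z 89d
node-capacity-cpu-cores 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-05-24T03:25:43Z 89d
node-capacity-memory-bytes 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-05-24T03:26:16Z 89d
persistentvolumeclaim-capacity-bytes 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-05-24T03:21:27Z 89d
persistentvolumeclaim-phase 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:28:47Z 89d
persistentvolumeclaim-request-bytes 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:30:44Z 89d
persistentvolumeclaim-usage-bytes 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-05-24T03:21:28Z 89d
pod-limit-cpu-cores 2021-03-12T03:02:00Z 2021-05-24T03:22:00Z 2021-03-12T03:02:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:05Z 89d
pod-limit-memory-bytes 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:29:43Z 89d
pod-persistentvolumeclaim-request-info 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:30:45Z 89d
pod-request-cpu-cores 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:29:21Z 89d
pod-request-memory-bytes 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:28:04Z 89d
pod-usage-cpu-cores 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-05-24T03:28:11Z 89d
pod-usage-memory-bytes 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-05-24T03:28:15Z 89d
"""

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamps in UTC ("Z" suffix)


def last_import_times(text):
    """Map each ReportDataSource name to its LAST IMPORT TIME (hypothetical helper)."""
    result = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 7:  # expect NAME + 5 timestamps + AGE
            continue
        result[fields[0]] = datetime.strptime(fields[5], TS_FORMAT)
    return result


times = last_import_times(RAW)
first_stop, last_stop = min(times.values()), max(times.values())
print(f"{len(times)} datasources, last imports between {first_stop} and {last_stop}")
print(f"spread: {last_stop - first_stop}")
```

With these rows, all fifteen LAST IMPORT TIME values fall within about nine minutes on 2021-05-24 (03:21:27Z to 03:30:45Z). The data alone does not show when the upgrade started, so tying the stop to the upgrade still needs the upgrade start time from the cluster.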

On hive server database:

hive> show databases;

<skip>
.
default
.
<skip>

hive> show tables from metering;

This resulted in an exception (ERROR ql.Driver: FAILED: SemanticException [Error 10072]: Database does not exist: metering).

The exception is expected because the metering database (schema) itself is missing. We tried recreating an empty database and restarting the pods; this created some tables, but most of them are still missing.

time="2021-06-10T12:04:21Z" level=info msg="existing Prometheus ReportDataSource discovered, tableName: hive.metering.datasource_openshift_metering_pod_limit_memory_bytes" app=metering component=reportDataSourceWorker logID=nkhzrbQOJ2 namespace=openshift-metering reportDataSource=pod-limit-memory-bytes
time="2021-06-10T12:04:21Z" level=error msg="unable to get last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores" app=metering chunkSize=5m0s component=PrometheusImporter error="error getting last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores, maybe table doesn't exist yet? presto: query failed (200 OK): \"io.prestosql.spi.PrestoException: line 3:10: Table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores does not exist\"" logID=jz1XKkQCT2 namespace=openshift-metering reportDataSource=pod-limit-cpu-cores stepSize=1m0s tableName=hive.metering.datasource_openshift_metering_pod_limit_cpu_cores
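When many ReportDataSources fail at once, it can help to list exactly which Hive tables the importer reports as missing. This is a minimal sketch, assuming every operator log line uses the key=value layout shown above, with double-quoted values and backslash-escaped inner quotes; parse_logfmt and missing_tables are hypothetical helpers, not part of the metering operator.

```python
"""List Presto tables that the metering operator reports as missing.

Sketch under assumptions: each log line uses the key=value layout shown in
this report, and quoted values escape inner double quotes with a backslash.
parse_logfmt and missing_tables are hypothetical helpers.
"""
import shlex

# The two operator log lines from the report, verbatim.
LOG = r'''time="2021-06-10T12:04:21Z" level=info msg="existing Prometheus ReportDataSource discovered, tableName: hive.metering.datasource_openshift_metering_pod_limit_memory_bytes" app=metering component=reportDataSourceWorker logID=nkhzrbQOJ2 namespace=openshift-metering reportDataSource=pod-limit-memory-bytes
time="2021-06-10T12:04:21Z" level=error msg="unable to get last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores" app=metering chunkSize=5m0s component=PrometheusImporter error="error getting last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores, maybe table doesn't exist yet? presto: query failed (200 OK): \"io.prestosql.spi.PrestoException: line 3:10: Table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores does not exist\"" logID=jz1XKkQCT2 namespace=openshift-metering reportDataSource=pod-limit-cpu-cores stepSize=1m0s tableName=hive.metering.datasource_openshift_metering_pod_limit_cpu_cores'''


def parse_logfmt(line):
    """Split one key=value log line into a dict of strings."""
    fields = {}
    # shlex.split (POSIX mode) strips the surrounding double quotes and
    # turns backslash-escaped quotes inside them into plain quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip any token that is not key=value
            fields[key] = value
    return fields


def missing_tables(text):
    """Return tableName for every error line whose error says a table does not exist."""
    tables = []
    for line in text.splitlines():
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and "does not exist" in fields.get("error", ""):
            tables.append(fields.get("tableName", "<unknown>"))
    return tables


for table in missing_tables(LOG):
    print("missing:", table)
```

Run against a full operator log, the same table is likely to appear once per failed import attempt, so collecting the results into a set may be more readable.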

--> The entire database was also recreated to rule out data corruption.

-> Metering config status:

~~~
status:
  conditions:
  - lastTransitionTime: "2021-06-16T07:49:07.603830Z"
    message: Awaiting the next reconciliation
    status: "False"
    type: Running
~~~

Expected results:

-> Recreating the operator should not be required as part of an upgrade, and reports are expected to keep working without reinstalling the operator.

Assignee: OpenShift Jira Bot
Reporter: OpenShift Jira Bot
QA Contact: Peter Ruan
Contributing Groups: Red Hat Employee
Need Info From: None
Votes: 0
Watchers: 2
Created: 2021/06/18 12:30 AM
Updated: 2025/07/27 11:27 PM
