OpenShift Bugs / OCPBUGS-8899

Database schemas missing after upgrading to 4.7 causing reports to fail

      Description of problem:

      After a cluster upgrade from OCP 4.6 to 4.7.9, openshift-metering stopped working. This issue was observed on about 3 clusters. The upgrade shows as successful and the pods also start up fine, but the reports fail and the logs indicate that the metering schema is missing. The backend PVC is on NFS, if that makes a difference.

      Version-Release number of selected component (if applicable):

      metering-operator.4.7.0-202104250659.p0 Metering 4.7.0-202104250659.p0 metering-operator.4.6.0-202103010126.p0 Succeeded

      Hive server log:

      ~~~
      21/06/08 15:23:03 [206f8b7f-70c7-498a-bd99-a70661096d14 HiveServer2-Handler-Pool: Thread-39]: INFO parse.CalcitePlanner: Creating table metering.report_openshift_metering_namespace_cpu_utilization_2021_06_07 position=0
      FAILED: SemanticException [Error 10072]: Database does not exist: metering
      21/06/08 15:23:03 [206f8b7f-70c7-498a-bd99-a70661096d14 HiveServer2-Handler-Pool: Thread-39]: ERROR ql.Driver: FAILED: SemanticException [Error 10072]: Database does not exist: metering
      org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: metering
      ~~~
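
      The missing database can be confirmed from the Hive CLI inside the cluster; a minimal sketch, where the hive server pod name (hive-server-0) is an assumption:

      ~~~
      # attach to the hive server pod and open a Hive shell (pod name is an assumption)
      oc -n openshift-metering rsh hive-server-0 hive

      # then, inside the Hive shell:
      #   SHOW DATABASES;    -- "metering" is absent from the list
      #   USE metering;      -- fails with SemanticException [Error 10072]
      ~~~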

      Metric collection also stopped at the time the OCP cluster upgrade was initiated.

      ~~~
      $ cat 0030-metering-reportdatasources-raw.log.raw
      NAME EARLIEST METRIC NEWEST METRIC IMPORT START IMPORT END LAST IMPORT TIME AGE
      node-allocatable-cpu-cores 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:15Z 89d
      node-allocatable-memory-bytes 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-03-12T03:01:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:23Z 89d
      node-capacity-cpu-cores 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-05-24T03:25:43Z 89d
      node-capacity-memory-bytes 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-03-12T03:02:00Z 2021-05-24T03:25:00Z 2021-05-24T03:26:16Z 89d
      persistentvolumeclaim-capacity-bytes 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-05-24T03:21:27Z 89d
      persistentvolumeclaim-phase 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:28:47Z 89d
      persistentvolumeclaim-request-bytes 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:30:44Z 89d
      persistentvolumeclaim-usage-bytes 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-03-12T03:02:00Z 2021-05-24T03:21:00Z 2021-05-24T03:21:28Z 89d
      pod-limit-cpu-cores 2021-03-12T03:02:00Z 2021-05-24T03:22:00Z 2021-03-12T03:02:00Z 2021-05-24T03:22:00Z 2021-05-24T03:22:05Z 89d
      pod-limit-memory-bytes 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:29:43Z 89d
      pod-persistentvolumeclaim-request-info 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:30:45Z 89d
      pod-request-cpu-cores 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:29:21Z 89d
      pod-request-memory-bytes 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-03-12T03:02:00Z 2021-05-24T03:28:00Z 2021-05-24T03:28:04Z 89d
      pod-usage-cpu-cores 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-05-24T03:28:11Z 89d
      pod-usage-memory-bytes 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-03-12T03:01:00Z 2021-05-24T03:24:00Z 2021-05-24T03:28:15Z 89d
      ~~~
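
      The listing above comes from the ReportDataSource resources; a sketch of the command, using the openshift-metering namespace seen in the logs:

      ~~~
      oc -n openshift-metering get reportdatasources
      ~~~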

      On the hive server database:

      ~~~
      hive> show databases;
      <skip>
      default
      <skip>

      hive> show tables from metering;
      ~~~

      The second query resulted in an exception: ERROR ql.Driver: FAILED: SemanticException [Error 10072]: Database does not exist: metering

      The exception is expected because the metering schema itself is missing. We tried recreating an empty database and restarting the pods; that created some tables, but most of them are still missing.

      ~~~
      time="2021-06-10T12:04:21Z" level=info msg="existing Prometheus ReportDataSource discovered, tableName: hive.metering.datasource_openshift_metering_pod_limit_memory_bytes" app=metering component=reportDataSourceWorker logID=nkhzrbQOJ2 namespace=openshift-metering reportDataSource=pod-limit-memory-bytes
      time="2021-06-10T12:04:21Z" level=error msg="unable to get last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores" app=metering chunkSize=5m0s component=PrometheusImporter error="error getting last timestamp for table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores, maybe table doesn't exist yet? presto: query failed (200 OK): \"io.prestosql.spi.PrestoException: line 3:10: Table hive.metering.datasource_openshift_metering_pod_limit_cpu_cores does not exist\"" logID=jz1XKkQCT2 namespace=openshift-metering reportDataSource=pod-limit-cpu-cores stepSize=1m0s tableName=hive.metering.datasource_openshift_metering_pod_limit_cpu_cores
      ~~~
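
      The same absence is visible from the Presto side; a sketch, assuming a coordinator pod named presto-coordinator-0 with presto-cli on its path:

      ~~~
      oc -n openshift-metering rsh presto-coordinator-0 \
        presto-cli --server localhost:8080 --catalog hive \
        --execute 'SHOW TABLES FROM metering'
      ~~~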

      The entire database was also recreated to rule out data corruption.

      MeteringConfig status:

      ~~~
      status:
        conditions:
        - lastTransitionTime: "2021-06-16T07:49:07.603830Z"
          message: Awaiting the next reconciliation
          status: "False"
          type: Running
      ~~~
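
      The condition above can be retrieved with (a sketch; "operator-metering" is the default MeteringConfig name and may differ):

      ~~~
      oc -n openshift-metering get meteringconfig operator-metering -o yaml
      ~~~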

      Expected results:

      Recreating the operator should not be required during an upgrade, and reports are expected to keep working without any reinstallation of the operator.

              OpenShift Jira Bot
              Peter Ruan