  1. OpenShift Bugs
  2. OCPBUGS-4559

Avoid brittle REST-mapper assumptions vs. v1alpha1 InsightsDataGather

Details

    • Bug
    • Resolution: Done
    • Minor
    • None
    • 4.13, 4.12
    • Test Framework
    • Low
    • CCXDEV Sprint 79, CCXDEV Sprint 80
    • 2
    • Rejected
    • False

    Description

      The Insights operator appears to be emitting events repeatedly, only on techpreview clusters in CI. The problem surfaces as a test failure:

      [sig-arch] events should not repeat pathologically
      {  1 events happened too frequently
      
      event happened 133 times, something is wrong: ns/default namespace/default - reason/Unable to find REST mapping for %s/%s: %w InsightsDataGather.config.openshift.io%!(EXTRA string=v1, *meta.NoKindMatchError=no matches for kind "InsightsDataGather" in version "config.openshift.io/v1")}
      

      Example:

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-azure-sdn-techpreview/1599796158643834880

      The problem appears to surface only on techpreview clusters, but it spans many providers:

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-gcp-sdn-techpreview/1598487648693915648

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-aws-sdn-techpreview/1598341406479355904

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-azure-sdn-techpreview-serial/1599796161986695168

      We've identified 138 occurrences in the last two weeks on 4.13, but it is most likely affecting every job run and, in turn, failing it. Virtually all techpreview jobs appear to be at a 0% pass rate, and this is likely a contributing cause.

      https://sippy.dptools.openshift.org/sippy-ng/jobs/4.13?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522never-stable%2522%252C%2522not%2522%253Atrue%257D%252C%257B%2522id%2522%253A99%252C%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522techpreview%2522%257D%255D%257D&sort=asc&sortField=net_improvement

      I believe the problem is impacting 4.12 as well and thus likely needs a backport.

      On the surface, there appear to be two problems here:

      1. the event is emitted repeatedly (perhaps on every reconcile loop?)
      2. the event is built from a bad format string
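
The shape of the reported event (literal %s/%s: %w in the reason, followed by %!(EXTRA ...) in the message) is what Go's fmt package produces when a format string lands in the wrong parameter. The sketch below reproduces that shape under assumptions: renderEvent is a hypothetical stand-in for a recorder method of the form Warningf(reason, messageFmt, args...), the reason name RESTMappingNotFound is invented for illustration, and none of this is taken from the operator's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// renderEvent is a hypothetical stand-in for an event recorder method shaped
// like Warningf(reason, messageFmt, args...). It is not the operator's code.
func renderEvent(reason, messageFmt string, args ...interface{}) string {
	return "reason/" + reason + " " + fmt.Sprintf(messageFmt, args...)
}

// Illustrative inputs mirroring the values visible in the reported event.
var (
	kind     = "InsightsDataGather"
	group    = "config.openshift.io"
	version  = "v1"
	errNoMap = errors.New(`no matches for kind "InsightsDataGather" in version "config.openshift.io/v1"`)
)

// misuse passes the format string where the reason belongs, so the first
// real argument is treated as the format and the rest become EXTRA operands.
func misuse() string {
	return renderEvent("Unable to find REST mapping for %s/%s: %w", kind+"."+group, version, errNoMap)
}

// intended supplies a short reason (hypothetical name) and a format string
// with one verb per argument. %v is used because %w is only valid in fmt.Errorf.
func intended() string {
	return renderEvent("RESTMappingNotFound", "Unable to find REST mapping for %s/%s: %v", kind+"."+group, version, errNoMap)
}

func main() {
	fmt.Println(misuse())
	fmt.Println(intended())
}
```

If this is indeed the cause, moving the format string into the message position and giving the event a short, stable reason would fix the garbled text. It would not by itself stop the event from repeating.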

      Given that this appears to fail 100% of runs for most techpreview jobs, I'm rating this Sev=Important. (Repeated events can put a strain on etcd.)

            People

              tremes1@redhat.com Tomas Remes
              rhn-engineering-dgoodwin Devan Goodwin
              Joao Bastos Fula Joao Bastos Fula