  OpenShift Bugs / OCPBUGS-8843

The deleted crd/servicemonitors.monitoring.coreos.com is not recreated quickly


    • Low
    • None
    • Unspecified
    • Cause: The cluster-version operator waited up to five minutes for ClusterOperator resources to become happy before giving up on them and beginning another sync cycle.

      Consequence: Waiting on an unhappy ClusterOperator resource could delay the cluster-version operator from reconciling manifests which occurred earlier in the manifest graph.

      Fix: The cluster-version operator no longer waits for ClusterOperator resources. It immediately fails that sync cycle on them, and begins a new sync cycle.

      Result: The cluster-version operator will now take less time before reconciling earlier manifests.
    • Bug Fix
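
The release note above refers to "unhappy" ClusterOperator resources. A minimal sketch for listing them follows, assuming "unhappy" roughly means a ClusterOperator that is not Available=True or that is Degraded=True. That reading, and the helper name `unhappy_clusteroperators`, are assumptions for illustration, not the cluster-version operator's exact check.

```shell
# Hypothetical helper: print the names of ClusterOperators that are not
# Available=True, or that are Degraded=True.
# Assumption: this approximates what the release note calls "unhappy".
unhappy_clusteroperators() {
    # Emit one line per ClusterOperator: "<name> <Type>=<Status> ... "
    oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{range .status.conditions[*]}{.type}={.status}{" "}{end}{"\n"}{end}' |
        # Keep the name when Available=True is missing or Degraded=True is present.
        awk 'NF && (!/ Available=True / || / Degraded=True /) { print $1 }'
}
```

Running `unhappy_clusteroperators` while the CRD is missing shows which operators might be failing the current sync cycle.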

      Description of problem:
      On a 4.7.0-0.nightly-2021-02-09-192846 cluster, I removed crd/servicemonitors.monitoring.coreos.com, which should be reconciled by the CVO. It was recreated later, but when I repeated the steps it was not recreated.
      I also tried this on another 4.7 cluster, where the CRD was not recreated at all after it was removed.

      1. Delete the CRD for the first time:

      1. oc get crd/servicemonitors.monitoring.coreos.com
        NAME CREATED AT
        servicemonitors.monitoring.coreos.com 2021-02-09T23:40:33Z
      2. oc delete crd/servicemonitors.monitoring.coreos.com
        customresourcedefinition.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" deleted
      3. while true; do date; oc get crd/servicemonitors.monitoring.coreos.com; sleep 20s; done
        Wed Feb 10 02:22:55 EST 2021
        NAME CREATED AT
        servicemonitors.monitoring.coreos.com 2021-02-10T08:55:22Z
        Wed Feb 10 02:23:16 EST 2021
        NAME CREATED AT
        servicemonitors.monitoring.coreos.com 2021-02-10T08:55:22Z

      2. Repeat the steps:

      1. oc get crd/servicemonitors.monitoring.coreos.com
        NAME CREATED AT
        servicemonitors.monitoring.coreos.com 2021-02-10T08:55:22Z
      2. oc delete crd/servicemonitors.monitoring.coreos.com
        customresourcedefinition.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" deleted
      3. while true; do date; oc get crd/servicemonitors.monitoring.coreos.com; sleep 20s; done
        Wed Feb 10 03:54:53 EST 2021
        Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
        Wed Feb 10 03:55:13 EST 2021
        Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
        Wed Feb 10 03:55:33 EST 2021
        Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
        Wed Feb 10 03:55:53 EST 2021
        Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
        ...
        Wed Feb 10 04:03:18 EST 2021
        Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
        ...
        Wed Feb 10 04:06:20 EST 2021
        Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
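
The polling loops above run until interrupted and leave the elapsed time to be read off the timestamps. A small sketch that polls with a timeout and reports how long recreation took could look like the following. The function name `wait_for_crd` and its default timeout and interval are hypothetical choices for illustration.

```shell
# Hypothetical helper: poll until a CRD exists again and report the wait.
# Usage: wait_for_crd NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# The defaults (900s timeout, 20s interval) are assumptions, not CVO settings.
wait_for_crd() {
    name=$1
    timeout=${2:-900}
    interval=${3:-20}
    start=$(date +%s)
    while :; do
        elapsed=$(( $(date +%s) - start ))
        # "oc get" exits non-zero while the CRD is still missing.
        if oc get "crd/$name" >/dev/null 2>&1; then
            echo "recreated after ${elapsed}s"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "not recreated after ${elapsed}s" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `oc delete crd/servicemonitors.monitoring.coreos.com && wait_for_crd servicemonitors.monitoring.coreos.com` would delete the CRD and then wait for the CVO to restore it.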

      The CVO logs show "Could not update servicemonitor" errors:
      2021-02-10T09:20:27.701991313Z I0210 09:20:27.701944 1 sync_worker.go:937] Update error 635 of 668: UpdatePayloadResourceTypeMissing Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (635 of 668): the server does not recognize this resource, check extension API servers (*errors.withStack: failed to get resource type: no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1")
      2021-02-10T09:20:27.702001158Z I0210 09:20:27.701989 1 sync_worker.go:937] Update error 608 of 668: UpdatePayloadResourceTypeMissing Could not update servicemonitor "openshift-dns-operator/dns-operator" (608 of 668): the server does not recognize this resource, check extension API servers (*errors.withStack: failed to get resource type: no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1")
      2021-02-10T09:20:27.702036790Z E0210 09:20:27.702021 1 sync_worker.go:353] unable to synchronize image (waiting 2m50.956499648s): Multiple errors are preventing progress:
      2021-02-10T09:20:27.702036790Z * Could not update servicemonitor "openshift-dns-operator/dns-operator" (608 of 668): the server does not recognize this resource, check extension API servers
      2021-02-10T09:20:27.702036790Z * Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (635 of 668): the server does not recognize this resource, check extension API servers
      ...
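
To summarize log output like the above, a short filter can list each ServiceMonitor the CVO failed to update. This is a sketch that assumes the error lines keep the `Could not update servicemonitor "<namespace>/<name>"` shape shown in the excerpt. The function name `failed_servicemonitors` is hypothetical.

```shell
# Hypothetical helper: read CVO log lines on stdin and print each distinct
# ServiceMonitor named in a "Could not update servicemonitor" error,
# one namespace/name per line.
failed_servicemonitors() {
    grep -o 'Could not update servicemonitor "[^"]*"' |
        # Strip everything up to the opening quote, then the closing quote.
        sed 's/^[^"]*"//; s/"$//' |
        sort -u
}
```

Assuming the CVO runs as the `cluster-version-operator` deployment in the `openshift-cluster-version` namespace, the filter could be fed with `oc -n openshift-cluster-version logs deployment/cluster-version-operator | failed_servicemonitors`.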
      Version-Release number of the following components:
      4.7.0-0.nightly-2021-02-09-192846

      How reproducible:
      not sure

      Steps to Reproduce:
      1. See the steps in the description of the problem.

      Actual results:
      the removed crd/servicemonitors.monitoring.coreos.com is not recreated

      Expected results:
      should be recreated

      Additional info:

            trking W. Trevor King
            juzhao@redhat.com Junqi Zhao
            Yang Yang Yang Yang
            Red Hat Employee