  1. Product Technical Learning
  2. PTL-8803

DO280-455: Ch9 lab: CockroachDB operator is broken


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • DO280 - OCP4.5 1 20200825
    • DO280 - OCP 4.2 1 20200123
    • DO280
    • en-US (English)

      Section: Managing the Cluster with the Web Console
      Language: en-US (English)
      Workaround: No workaround.

      Description: Step 5 instructs the students to install the Community CockroachDB operator.

      Step 7.4 then creates an instance of the Cockroachdb custom resource (CR), which attempts to deploy a StatefulSet running three instances of cockroachdb/cockroach:v19.1.3 (the version in use at the time of writing).

      Version v19.1.3 of the image repeatedly fails to initialise the cluster because its readiness probe never succeeds, which in turn prevents the database endpoints from being added to the "example-${foo}-public" service.

      This breaks steps 8 and 9 and makes the lab impossible to finish.
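
      The effect on the service can be checked from its Endpoints object: pods that fail their readiness probe are listed under notReadyAddresses rather than addresses. The following is a minimal sketch, not part of the lab. The function names endpoint_readiness and fetch_endpoints are hypothetical, and fetch_endpoints assumes a logged-in oc client and the correct project.

```python
import json
import subprocess
from typing import List, Tuple


def endpoint_readiness(endpoints: dict) -> Tuple[List[str], List[str]]:
    """Split the pod IPs of an Endpoints object into (ready, not_ready).

    Pods that pass their readiness probe appear under "addresses";
    pods that fail it appear under "notReadyAddresses".
    """
    ready: List[str] = []
    not_ready: List[str] = []
    for subset in endpoints.get("subsets") or []:
        ready += [a["ip"] for a in subset.get("addresses") or []]
        not_ready += [a["ip"] for a in subset.get("notReadyAddresses") or []]
    return ready, not_ready


def fetch_endpoints(name: str) -> dict:
    """Hypothetical helper: fetch an Endpoints object with the oc client.

    Assumes oc is on the PATH and already logged in to the cluster.
    """
    out = subprocess.run(
        ["oc", "get", "endpoints", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

      If the symptom described above is present, calling endpoint_readiness(fetch_endpoints(...)) with the name of the public service should return an empty ready list, with all three pod IPs in the not-ready list.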

      It gets worse. Deleting the Cockroachdb CR in order to retry with a newer version hangs, because the Helm finalizer (uninstall-helm-release) never completes. This blocks the operator and prevents any additional Cockroachdb CRs from ever being created anywhere.

      It also prevents any attempts to uninstall the operator, even with all the finalizer/kubeproxy trickery in the world.

      In short, it kills the cluster.

      Please do not use community operators in our courses any more. If you do, make sure you document exactly which version of the operator was tested with which release of OCP.


      Debugging notes:

      $ oc delete cockroachdb example
      cockroachdb.charts.helm.k8s.io "example" deleted
      $ oc get cockroachdb example -o yaml
      apiVersion: charts.helm.k8s.io/v1alpha1
      kind: Cockroachdb
      metadata:
        creationTimestamp: "2020-03-13T14:21:10Z"
        deletionGracePeriodSeconds: 0
        deletionTimestamp: "2020-03-13T14:22:20Z"
        finalizers:
        - uninstall-helm-release
      ...
      status:
        conditions:
      ...
        - lastTransitionTime: "2020-03-13T14:21:11Z"
          message: 'failed to get release history: release: "example-${foo}"
            not found'
          reason: UninstallError
          status: "True"
          type: ReleaseFailed
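
      The output above shows the pattern of a resource stuck in deletion: deletionTimestamp is set, but a finalizer is still listed, and the status carries a ReleaseFailed condition. A minimal sketch for spotting this from the JSON form of the resource follows. The function names are hypothetical, and the condition type default is taken from the output above rather than from any operator documentation.

```python
from typing import List


def stuck_finalizers(obj: dict) -> List[str]:
    """Return the finalizers still pending on an object marked for deletion.

    An empty list means the object is either not being deleted or has
    no finalizers left to wait for.
    """
    meta = obj.get("metadata") or {}
    if not meta.get("deletionTimestamp"):
        return []
    return list(meta.get("finalizers") or [])


def active_conditions(obj: dict, cond_type: str = "ReleaseFailed") -> List[str]:
    """Return 'reason: message' for each condition of cond_type with status "True"."""
    conditions = (obj.get("status") or {}).get("conditions") or []
    return [
        f"{c.get('reason', '')}: {c.get('message', '')}"
        for c in conditions
        if c.get("type") == cond_type and c.get("status") == "True"
    ]
```

      One way to use it is to save the output of "oc get cockroachdb example -o json" to a file, load it with json.load, and pass the resulting dict to both functions.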

              rht-miphilli Michael Phillips
              gregab@p0f.net Grega Bremec