-
Bug
-
Resolution: Done
-
Major
-
DO280 - OCP 4.2 1 20200123
-
en-US (English)
URL:
Reporter RHNID:
Section: - Managing the Cluster with the Web Console
Language: en-US (English)
Workaround: No workaround.
Description: Step 5 instructs the students to install the Community CockroachDB operator.
Step 7.4 then creates an instance of the Cockroachdb CR, which attempts to deploy a StatefulSet running three instances of cockroachdb/cockroach:v19.1.3 (the version at the moment).
Version v19.1.3 of the image repeatedly fails to initialise the cluster because of endless ReadinessProbe failures, which in turn prevent the database endpoints from being added to the "example-${foo}-public" service.
This breaks steps 8 and 9 and makes the lab impossible to finish.
It gets worse. Deleting the Cockroachdb CR in order to retry with a newer version gets stuck because a Helm finalizer (uninstall-helm-release) never completes. This blocks the operator and prevents any additional Cockroachdb CRs from ever being created anywhere.
It also prevents any attempts to uninstall the operator, even with all the finalizer/kubeproxy trickery in the world.
In short, it kills the cluster.
Please do not use community operators in our courses any more. If you do, document exactly which version of the operator was tested with which release of OCP.
Debugging notes:
$ oc delete cockroachdb example
cockroachdb.charts.helm.k8s.io "example" deleted
$ oc get cockroachdb example -o yaml
apiVersion: charts.helm.k8s.io/v1alpha1
kind: Cockroachdb
metadata:
  creationTimestamp: "2020-03-13T14:21:10Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-03-13T14:22:20Z"
  finalizers:
  - uninstall-helm-release
  ...
status:
  conditions:
  ...
  - lastTransitionTime: "2020-03-13T14:21:11Z"
    message: 'failed to get release history: release: "example-${foo}" not found'
    reason: UninstallError
    status: "True"
    type: ReleaseFailed
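The stuck state in the output above can be recognised mechanically: the object carries a deletionTimestamp, still lists a finalizer, and reports a ReleaseFailed condition with status "True". The following is a minimal Python sketch of that check, not part of the course material. The function names stuck_finalizers and failed_conditions are hypothetical, and the sample dictionary is a hand-copied subset of the fields shown above (the elided "..." parts are left out, not guessed).

```python
from typing import Any, Dict, List


def stuck_finalizers(resource: Dict[str, Any]) -> List[str]:
    """Return the finalizers still pending on an object marked for deletion.

    An object that has a deletionTimestamp but still lists finalizers is
    waiting for some controller to finish its cleanup.
    """
    meta = resource.get("metadata", {})
    if "deletionTimestamp" not in meta:
        return []  # not being deleted, so nothing can be stuck
    return list(meta.get("finalizers", []))


def failed_conditions(resource: Dict[str, Any],
                      cond_type: str = "ReleaseFailed") -> List[Dict[str, Any]]:
    """Return status conditions of the given type whose status is "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return [c for c in conditions
            if c.get("type") == cond_type and c.get("status") == "True"]


# Hand-copied subset of the object from the debugging notes (assumption:
# the JSON form has the same field names as the YAML form shown above).
sample = {
    "apiVersion": "charts.helm.k8s.io/v1alpha1",
    "kind": "Cockroachdb",
    "metadata": {
        "creationTimestamp": "2020-03-13T14:21:10Z",
        "deletionGracePeriodSeconds": 0,
        "deletionTimestamp": "2020-03-13T14:22:20Z",
        "finalizers": ["uninstall-helm-release"],
    },
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2020-03-13T14:21:11Z",
                "message": 'failed to get release history: '
                           'release: "example-${foo}" not found',
                "reason": "UninstallError",
                "status": "True",
                "type": "ReleaseFailed",
            }
        ]
    },
}

if __name__ == "__main__":
    for name in stuck_finalizers(sample):
        print("pending finalizer:", name)
    for cond in failed_conditions(sample):
        print("failed condition:", cond["reason"], "-", cond["message"])
```

To run the same check against a live object, the dictionary could presumably be loaded from the output of `oc get cockroachdb example -o json` with json.load instead of the hard-coded sample. This only diagnoses the condition; it does not clear the finalizer or unblock the operator.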