-
Task
-
Resolution: Done
We have two customers, on separate support cases, who hit the same issue when installing the operator (costmanagement-metrics-operator) on different ROSA HCP clusters. They follow the same steps they would use to install it on a ROSA Classic cluster, and the installation fails.
I have looked at the logs and found similar errors and warnings:
Warning:
time="2024-07-19T11:07:53Z" level=warning msg="needs reinstall: waiting for deployment costmanagement-metrics-operator to become ready: deployment \"costmanagement-metrics-operator\" not available: Deployment does not have minimum availability." csv=costmanagement-metrics-operator.3.3.0 id=XaYc2 namespace=costmanagement-metrics-operator phase=Failed strategy=deployment
Error:
machine_controller.go:641] "Drain failed, retry in 20s" err="[ ... error when waiting for pod \"costmanagement-metrics-operator-...\" terminating: global timeout reached: 20s
Other:
event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"costmanagement-metrics-operator", Name:"costmanagement-metrics-operator.3.3.0", UID:"...", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"40971299", FieldPath:""}): type: 'Warning' reason: 'InstallComponentFailed' install strategy failed: Internal error occurred: failed calling webhook "validate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/validate/fail?timeout=10s": no endpoints available for service "kyverno-svc"
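For triage, here is a minimal sketch that pulls the failing webhook and service names out of an event message like the one above. The function name `summarize_webhook_failure` and the two patterns are hypothetical helpers written for this ticket, not part of any tool, and the message is the relevant part of the event above. It only shows which admission webhook rejected the call; it does not establish a root cause.

```python
import re

# Relevant part of the InstallComponentFailed event message quoted above.
EVENT = (
    'install strategy failed: Internal error occurred: '
    'failed calling webhook "validate.kyverno.svc-fail": failed to call webhook: '
    'Post "https://kyverno-svc.kyverno.svc:443/validate/fail?timeout=10s": '
    'no endpoints available for service "kyverno-svc"'
)

# Hypothetical patterns matching the wording seen in this event.
WEBHOOK_RE = re.compile(r'failed calling webhook "([^"]+)"')
SERVICE_RE = re.compile(r'no endpoints available for service "([^"]+)"')


def summarize_webhook_failure(message):
    """Return (webhook, service) named in a webhook failure message, or None."""
    webhook = WEBHOOK_RE.search(message)
    service = SERVICE_RE.search(message)
    if webhook is None or service is None:
        return None
    return webhook.group(1), service.group(1)


if __name__ == "__main__":
    print(summarize_webhook_failure(EVENT))
```

Run against the event above, this reports the webhook `validate.kyverno.svc-fail` and the service `kyverno-svc`, so it may be worth confirming whether the Kyverno pods were available at the time of the install.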
I have discussed this with ROSA HCP SRE and they do not see a platform issue. Would it be possible to check whether this is related to an architectural difference in HCP that the operator has not accounted for?