OpenShift Service Mesh / OSSM-3235

operator can deadlock when istiod deployment fails


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version/s: OSSM 2.4.0
    • Fix Version/s: OSSM 2.4.0
    • Component/s: Maistra
    • Labels: None

      There's a chance that, if the operator creates an istiod deployment that never becomes ready and then goes on to create the ValidatingWebhookConfiguration, it deadlocks while creating the DestinationRule defined in istiod/templates/federation.yaml. We should make sure that our bootstrap resources are never targeted by the ValidatingWebhook, because it can lead to this race: the istiod Service is not ready, the operator tries to create the DestinationRule, the ValidatingWebhook call for it fails, and the operator never continues. Subsequent reconciliations do not seem to recover from this state; I had to manually delete the webhook and respin the operator (a rough sketch of those recovery steps follows the log below).
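      One possible direction, sketched below: label the chart resources that must exist before istiod is serving (such as the federation DestinationRule) and build the webhook configuration with an objectSelector that skips them. This is only an illustration of the mechanism, not the operator's actual code; the bootstrapLabel key, the helper name, and the naming scheme are assumptions.

      package bootstrap

      import (
          admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      )

      // bootstrapLabel is a hypothetical label the operator could stamp on resources
      // it has to create before istiod is serving, such as the DestinationRule
      // rendered from istiod/templates/federation.yaml.
      const bootstrapLabel = "maistra.io/bootstrap-resource"

      // newValidatingWebhookConfig builds a webhook configuration whose objectSelector
      // skips bootstrap-labelled objects, so creating them can never be blocked by an
      // istiod /validate endpoint that is not reachable yet.
      func newValidatingWebhookConfig(revision, namespace string) *admissionregistrationv1.ValidatingWebhookConfiguration {
          failurePolicy := admissionregistrationv1.Fail
          sideEffects := admissionregistrationv1.SideEffectClassNone
          path := "/validate"

          return &admissionregistrationv1.ValidatingWebhookConfiguration{
              ObjectMeta: metav1.ObjectMeta{Name: "istiod-" + revision + "-" + namespace},
              Webhooks: []admissionregistrationv1.ValidatingWebhook{{
                  Name: "rev.validation.istio.io",
                  ClientConfig: admissionregistrationv1.WebhookClientConfig{
                      Service: &admissionregistrationv1.ServiceReference{
                          Name:      "istiod-" + revision,
                          Namespace: namespace,
                          Path:      &path,
                      },
                  },
                  Rules: []admissionregistrationv1.RuleWithOperations{{
                      Operations: []admissionregistrationv1.OperationType{
                          admissionregistrationv1.Create,
                          admissionregistrationv1.Update,
                      },
                      Rule: admissionregistrationv1.Rule{
                          APIGroups:   []string{"networking.istio.io", "security.istio.io"},
                          APIVersions: []string{"*"},
                          Resources:   []string{"*"},
                      },
                  }},
                  FailurePolicy: &failurePolicy,
                  // The key part: objects carrying the bootstrap label are never sent
                  // to the webhook, so the federation DestinationRule can be created
                  // even while the istiod Service is unreachable.
                  ObjectSelector: &metav1.LabelSelector{
                      MatchExpressions: []metav1.LabelSelectorRequirement{{
                          Key:      bootstrapLabel,
                          Operator: metav1.LabelSelectorOpDoesNotExist,
                      }},
                  },
                  SideEffects:             &sideEffects,
                  AdmissionReviewVersions: []string{"v1"},
              }},
          }
      }

      The operator would then have to stamp the same label onto the bootstrap resources it renders (e.g. the DestinationRule from federation.yaml) so the webhook never sees them while istiod is still coming up.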

      2023-01-23T17:21:37.693Z	ERROR	controller	Reconciler error	{"controller": "servicemeshcontrolplane-controller", "name": "gateway-controller", "namespace": "openshift-ingress", "error": "istiod/templates/federation.yaml: Internal error occurred: failed calling webhook \"rev.validation.istio.io\": failed to call webhook: Post \"https://istiod-gateway-controller.openshift-ingress.svc:443/validate?timeout=10s\": dial tcp 10.96.90.167:443: connect: connection refused", "errorCauses": [{"error": "istiod/templates/federation.yaml: Internal error occurred: failed calling webhook \"rev.validation.istio.io\": failed to call webhook: Post \"https://istiod-gateway-controller.openshift-ingress.svc:443/validate?timeout=10s\": dial tcp 10.96.90.167:443: connect: connection refused", "errorVerbose": "Internal error occurred: failed calling webhook \"rev.validation.istio.io\": failed to call webhook: Post \"https://istiod-gateway-controller.openshift-ingress.svc:443/validate?timeout=10s\": dial tcp 10.96.90.167:443: connect: connection refused\nistiod/templates/federation.yaml\ngithub.com/maistra/istio-operator/pkg/controller/common/helm.(*ManifestProcessor).ProcessManifest\n\t/home/dgrimm/dev/istio-operator/pkg/controller/common/helm/manifestprocessing.go:116\ngithub.com/maistra/istio-operator/pkg/controller/common/helm.(*ManifestProcessor).ProcessManifests\n\t/home/dgrimm/dev/istio-operator/pkg/controller/common/helm/manifestprocessing.go:72\ngithub.com/maistra/istio-operator/pkg/controller/servicemesh/controlplane.(*controlPlaneInstanceReconciler).processComponentManifests\n\t/home/dgrimm/dev/istio-operator/pkg/controller/servicemesh/controlplane/manifestprocessing.go:30\ngithub.com/maistra/istio-operator/pkg/controller/servicemesh/controlplane.(*controlPlaneInstanceReconciler).Reconcile\n\t/home/dgrimm/dev/istio-operator/pkg/controller/servicemesh/controlplane/reconciler.go:275\ngithub.com/maistra/istio-operator/pkg/controller/servicemesh/controlplane.(*ControlPlaneReconciler).Reconcile\n\t/home/dgrimm/dev/istio-operator/pkg/controller/servicemesh/controlplane/controller.go:264\ngithub.com/maistra/istio-operator/pkg/controller/common.(*conflictHandlingReconciler).Reconcile\n\t/home/dgrimm/dev/istio-operator/pkg/controller/common/conflicts.go:25\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/dgrimm/dev/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:244\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/dgrimm/dev/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/dgrimm/dev/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594"}]}
      github.com/go-logr/zapr.(*zapLogger).Error
      	/home/dgrimm/dev/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
      	/home/dgrimm/dev/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:246
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      	/home/dgrimm/dev/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:218
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
      	/home/dgrimm/dev/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.3/pkg/internal/controller/controller.go:197
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
      	/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil
      	/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
      k8s.io/apimachinery/pkg/util/wait.JitterUntil
      	/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
      k8s.io/apimachinery/pkg/util/wait.Until
      	/home/dgrimm/dev/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
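      For reference, the manual workaround described above (delete the webhook, respin the operator) could look roughly like this with client-go; the webhook configuration name, the operator namespace, and the pod label selector are placeholders for illustration, not verified values:

      package recovery

      import (
          "context"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
      )

      // unblockReconciliation mirrors the manual recovery: delete the stuck
      // ValidatingWebhookConfiguration that points at the never-ready istiod
      // Service, then delete the operator pod so its Deployment recreates it and
      // reconciliation can proceed past federation.yaml.
      func unblockReconciliation(ctx context.Context, client kubernetes.Interface) error {
          // Name is a placeholder; use the webhook configuration created for the
          // affected control plane revision.
          if err := client.AdmissionregistrationV1().
              ValidatingWebhookConfigurations().
              Delete(ctx, "istiod-gateway-controller-openshift-ingress", metav1.DeleteOptions{}); err != nil {
              return err
          }
          // "Respin" the operator by deleting its pod(s); namespace and label
          // selector are illustrative.
          return client.CoreV1().Pods("openshift-operators").DeleteCollection(
              ctx,
              metav1.DeleteOptions{},
              metav1.ListOptions{LabelSelector: "name=istio-operator"},
          )
      }

      From the CLI this corresponds to oc delete validatingwebhookconfiguration <name>, then deleting the operator pod so its Deployment recreates it.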
      
      

              Assignee: Marko Luksa (mluksa@redhat.com)
              Reporter: Daniel Grimm (dgrimm@redhat.com)
              Votes: 0
              Watchers: 5
