Knative Serving / SRVKS-1301

Transient knative-operator-webhook 'failed calling webhook "webhook.serving.knative.dev"' errors after KnativeServing creation


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Fix Version/s: 1.37.0
    • Affects Version/s: 1.35.0, 1.36.0, 1.35.1
      In some circumstances, some cluster-scoped resources, such as webhook configurations, are not removed when KnativeServing or the Serverless Operator is uninstalled, reinstalled, or upgraded.

      These leftover resources can prevent KnativeServing from reconciling, leaving the KnativeServing installation stuck with an error like the following:
      ```
      failed to apply non rbac manifest: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": failed to call webhook: Post \"https://webhook.knative-serving.svc:443/?timeout=10s\": no endpoints available for service \"webhook\"
      ```

      As a workaround, manually delete the leftover webhook configurations:

      ```
      oc delete mutatingwebhookconfiguration webhook.serving.knative.dev
      oc delete validatingwebhookconfiguration config.webhook.serving.knative.dev validation.webhook.serving.knative.dev
      ```
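A slightly more defensive form of the same workaround is sketched below. It assumes that deleting the configurations is only warranted while the `webhook` Service in `knative-serving` has no ready endpoints, which is the condition the error reports. The function names are hypothetical; `oc get endpoints ... -o jsonpath=...` and `oc delete ... --ignore-not-found` are standard `oc` options.

```shell
# Hypothetical helpers (not part of the Serverless Operator): delete the stale
# Knative Serving webhook configurations only while the "webhook" Service
# in knative-serving has no ready endpoints.

# Prints the IPs of ready endpoints behind the "webhook" Service; prints
# nothing if the Service or its endpoints do not exist.
webhook_endpoint_ips() {
  oc get endpoints webhook -n knative-serving \
    -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null || true
}

delete_stale_webhook_configs() {
  if [ -n "$(webhook_endpoint_ips)" ]; then
    echo "webhook Service has ready endpoints; leaving configurations in place"
    return 0
  fi
  oc delete mutatingwebhookconfiguration webhook.serving.knative.dev \
    --ignore-not-found
  oc delete validatingwebhookconfiguration \
    config.webhook.serving.knative.dev \
    validation.webhook.serving.knative.dev \
    --ignore-not-found
}
```

Run `delete_stale_webhook_configs` only while the reconciliation is stuck with the error above; if the webhook already has endpoints, the error is not caused by a missing backend and the configurations are left alone.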
    • Known Issue

      We see transient errors like the following in the knative-operator-webhook logs while a newly created KnativeServing is being reconciled:

      failed to apply non rbac manifest: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": failed to call webhook: Post \"https://webhook.knative-serving.svc:443/?timeout=10s\": no endpoints available for service \"webhook\"

      {
        "level": "error",
        "ts": "2024-12-11T16:52:36.258Z",
        "logger": "knative-operator",
        "caller": "controller/controller.go:564",
        "msg": "Reconcile error",
        "commit": "8750a8b",
        "knative.dev/pod": "knative-operator-webhook-785b4bc7bf-dknvg",
        "knative.dev/controller": "knative.dev.operator.pkg.reconciler.knativeserving.Reconciler",
        "knative.dev/kind": "operator.knative.dev.KnativeServing",
        "knative.dev/traceid": "11967dfd-ad6d-46f4-b495-7a7830a57933",
        "knative.dev/key": "knative-serving/knative-serving",
        "duration": 4.711107858,
        "error": "failed to apply non rbac manifest: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": failed to call webhook: Post \"https://webhook.knative-serving.svc:443/?timeout=10s\": no endpoints available for service \"webhook\"",
        "stacktrace": "knative.dev/pkg/controller.(*Impl).handleErr\n\t/workspace/vendor/knative.dev/pkg/controller/controller.go:564\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/workspace/vendor/knative.dev/pkg/controller/controller.go:541\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/workspace/vendor/knative.dev/pkg/controller/controller.go:489"
      } 
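To tell the transient case from a stuck reconciliation, one option is to count these errors in the operator log and check whether the count stops growing once the webhook pod is Ready. A minimal sketch, assuming log lines that carry the error text shown in the entry above; the function name is hypothetical, and it relies only on `grep -F` and `grep -c`.

```shell
# Hypothetical helper: reads a knative-operator log on stdin and prints how many
# lines report the Serving webhook as unreachable because its Service has no
# endpoints.
count_webhook_endpoint_errors() {
  # Keep lines about the failing webhook call, then count those that name the
  # missing endpoints; "|| true" keeps the exit status 0 when the count is 0.
  grep -F 'failed calling webhook' \
    | grep -F -c 'no endpoints available for service' || true
}
```

For example, `oc logs deployment/knative-operator-webhook -c knative-operator -n openshift-serverless | count_webhook_endpoint_errors`; the `openshift-serverless` namespace and the `knative-operator` container name are assumptions based on the attached log file name.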

      The problem seems to be that, as part of the KnativeServing reconciliation, we apply the routing-serving-certs Certificate (https://github.com/openshift-knative/serverless-operator/blob/release-1.35/openshift-knative-operator/cmd/openshift-knative-operator/kodata/knative-serving/latest/2-serving-core.yaml#L558-L571). That Certificate is a certificate.networking.internal.knative.dev resource, which is intercepted by the webhook.serving.knative.dev webhook, yet the webhook Deployment is reconciled together with that resource by the same KnativeServing reconciler. Until the webhook pod is running, the API server cannot call the webhook, so applying the Certificate fails.

      On a clean install, the webhook eventually starts, the reconciliation is retried, and the routing-serving-certs Certificate is created, so KnativeServing eventually becomes Ready.
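The clean-install behavior amounts to retrying until the webhook is reachable. As a rough illustration only (the operator's actual workqueue rate limiter is not reproduced here), a retry loop with exponential backoff can be written as follows; `retry_with_backoff` is a hypothetical helper.

```shell
# Hypothetical helper: run a command until it succeeds, doubling the delay
# between attempts. This is a loose stand-in for requeue-with-backoff, not the
# operator's real retry logic.
# Usage: retry_with_backoff <max-attempts> <initial-delay-seconds> <command...>
retry_with_backoff() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
    i=$((i + 1))
  done
  return 1
}
```

For instance, `retry_with_backoff 5 10 oc wait --for=condition=Ready knativeserving/knative-serving -n knative-serving --timeout=60s` waits from the outside for the same eventual Ready state.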

      On upgrade or reinstall, however, a MutatingWebhookConfiguration webhook.serving.knative.dev left over from the previous KnativeServing version may block reconciliation of the new one indefinitely.
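For the reinstall case, one possible heuristic for spotting the leftover configuration is to compare creation timestamps: a `webhook.serving.knative.dev` MutatingWebhookConfiguration created before the current KnativeServing object likely survived from the previous installation. This is an assumption, not something the operator checks, and it says nothing about in-place upgrades, where the KnativeServing object is kept and therefore predates the configuration. Both helpers are hypothetical.

```shell
# Hypothetical heuristic for the reinstall case: a webhook configuration that
# is older than the KnativeServing object was probably left behind by the
# previous installation. creationTimestamp values are fixed-width RFC 3339 UTC
# strings, so a byte-wise sort orders them chronologically.

# Succeeds if timestamp $1 is strictly earlier than timestamp $2.
is_older() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)" = "$1" ]
}

webhook_config_is_leftover() {
  local ks_created wh_created
  ks_created=$(oc get knativeserving knative-serving -n knative-serving \
    -o jsonpath='{.metadata.creationTimestamp}' 2>/dev/null)
  wh_created=$(oc get mutatingwebhookconfiguration webhook.serving.knative.dev \
    -o jsonpath='{.metadata.creationTimestamp}' 2>/dev/null)
  # Both objects must exist for the comparison to mean anything.
  [ -n "$ks_created" ] && [ -n "$wh_created" ] &&
    is_older "$wh_created" "$ks_created"
}
```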

      Attachments:
        1. knative-operator-webhook-7458bff575-l6gm9-knative-operator (1).log (244 kB, Stavros Kontopoulos)
        2. must-gather.local.5764065268316088423.tar.bz2 (2.01 MB, Marek Schmidt)
        3. reproducer-original.sh (4 kB, Marek Schmidt)
        4. SRVKS-1301-SO-1.36.0-must-gathers.tar.bz2 (49.72 MB, Marek Schmidt)

      People: David Simansky (dsimansk@redhat.com), Marek Schmidt (maschmid@redhat.com)