OpenShift Bugs / OCPBUGS-2535

Race condition deploying certs for managed webhooks

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • None
    • 4.10
    • OLM
    • None
    • Refinement Backlog
    • 1
    • Rejected
    • False

      Description of problem:

      The deployment fails with the following error when creating the kubevirt-hyperconverged object, after the Namespace, OperatorGroup, and Subscription objects have been created.

      ~~~
      Error from server (InternalError): error when creating "hyper.yaml": Internal error occurred: failed calling webhook "validate-hco.kubevirt.io": Post "https://hco-webhook-service.openshift-cnv.svc:4343/validate-hco-kubevirt-io-v1beta1-hyperconverged?timeout=30s": service "hco-webhook-service" not found
      ~~~

      The service hco-webhook-service is not created, and the OLM logs show the error "could not create service hco-webhook-service: object is being deleted: services "hco-webhook-service" already exists".

      ~~~
      2022-01-28T06:21:39.901251731Z I0128 06:21:39.901171 1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-cnv", Name:"kubevirt-hyperconverged-operator.v4.8.4", UID:"835e6eb1-5e7e-40de-b922-5a22b3361778", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"105456", FieldPath:""}): type: 'Warning' reason: 'InstallComponentFailed' install strategy failed: could not create service hco-webhook-service: object is being deleted: services "hco-webhook-service" already exists
      ~~~
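
The "object is being deleted ... already exists" message is what a create request gets when an object of the same name still exists but is marked for deletion. The following is a minimal sketch of that race, using a hypothetical in-memory stand-in (`FakeServiceAPI`, `recreate_naive`, and `recreate_after_gone` are illustrative names, not OLM or Kubernetes client APIs). It is not OLM's code and not a proposed fix; it only shows why a create issued right after a delete can fail, and why waiting for the old object to disappear avoids the conflict.

```python
class AlreadyExists(Exception):
    """Stand-in for an HTTP 409 AlreadyExists response."""


class FakeServiceAPI:
    """Hypothetical in-memory model of a Service store (assumption, not a real client).

    A deleted object lingers in a "terminating" state for a few polls,
    mimicking an object that still exists while its deletion is pending.
    """

    def __init__(self, terminating_polls=2):
        # name -> None for a live object, or remaining polls while terminating
        self._objects = {}
        self._terminating_polls = terminating_polls

    def create(self, name):
        if name in self._objects:
            terminating = self._objects[name] is not None
            prefix = "object is being deleted: " if terminating else ""
            raise AlreadyExists(f'{prefix}services "{name}" already exists')
        self._objects[name] = None

    def delete(self, name):
        # The delete is accepted immediately but completes later.
        if name in self._objects:
            self._objects[name] = self._terminating_polls

    def exists(self, name):
        # Each poll moves a terminating object one step closer to removal.
        remaining = self._objects.get(name)
        if remaining is None:
            return name in self._objects
        if remaining <= 1:
            del self._objects[name]
            return False
        self._objects[name] = remaining - 1
        return True


def recreate_naive(api, name):
    """Delete and immediately create: races with the pending deletion."""
    api.delete(name)
    api.create(name)


def recreate_after_gone(api, name, max_polls=10):
    """Delete, poll until the object is gone, then create."""
    api.delete(name)
    for _ in range(max_polls):
        if not api.exists(name):
            api.create(name)
            return True
    return False
```

In this model, `recreate_naive` raises `AlreadyExists` with the same message prefix seen in the OLM log, while `recreate_after_gone` succeeds once the terminating object has been removed.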

      According to the audit logs, OLM created the service (06:21:31), deleted it (06:21:38.74), and then immediately tried to create it again (06:21:38.78); the second create failed with 409 AlreadyExists.

      Initial create event (201):

      ~~~
      {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"029c1ae0-21aa-4d3d-80d8-ed0a06ac45bc","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-cnv/services","verb":"create","user":{"username":"system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount","uid":"1e38b88c-ddf2-426f-bf8c-692fce5cf4e9","groups":["system:serviceaccounts","system:serviceaccounts:openshift-operator-lifecycle-manager","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["olm-operator-56f69cbbbf-27t6s"],"authentication.kubernetes.io/pod-uid":["3b66a65b-d54f-487c-ac8c-96f94e21b933"]}},"sourceIPs":["10.30.1.5"],"userAgent":"olm/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"services","namespace":"openshift-cnv","name":"hco-webhook-service","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":201},"requestReceivedTimestamp":"2022-01-28T06:21:31.931882Z","stageTimestamp":"2022-01-28T06:21:31.970348Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"olm-operator-binding-openshift-operator-lifecycle-manager\" of ClusterRole \"system:controller:operator-lifecycle-manager\" to ServiceAccount \"olm-operator-serviceaccount/openshift-operator-lifecycle-manager\""}}
      ~~~

      Delete event, followed by the second create event, which fails with 409 AlreadyExists:

      ~~~
      {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"e3cec41d-18b4-4edf-8801-33bcceec05f7","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-cnv/services/hco-webhook-service","verb":"delete","user":{"username":"system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount","uid":"1e38b88c-ddf2-426f-bf8c-692fce5cf4e9","groups":["system:serviceaccounts","system:serviceaccounts:openshift-operator-lifecycle-manager","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["olm-operator-56f69cbbbf-27t6s"],"authentication.kubernetes.io/pod-uid":["3b66a65b-d54f-487c-ac8c-96f94e21b933"]}},"sourceIPs":["10.30.1.5"],"userAgent":"olm/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"services","namespace":"openshift-cnv","name":"hco-webhook-service","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Success","code":200},"requestReceivedTimestamp":"2022-01-28T06:21:38.743079Z","stageTimestamp":"2022-01-28T06:21:38.775489Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"olm-operator-binding-openshift-operator-lifecycle-manager\" of ClusterRole \"system:controller:operator-lifecycle-manager\" to ServiceAccount \"olm-operator-serviceaccount/openshift-operator-lifecycle-manager\""}}

      {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"c03142c9-900a-4abb-bf4d-42e511d190c0","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-cnv/services","verb":"create","user":{"username":"system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount","uid":"1e38b88c-ddf2-426f-bf8c-692fce5cf4e9","groups":["system:serviceaccounts","system:serviceaccounts:openshift-operator-lifecycle-manager","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["olm-operator-56f69cbbbf-27t6s"],"authentication.kubernetes.io/pod-uid":["3b66a65b-d54f-487c-ac8c-96f94e21b933"]}},"sourceIPs":["10.30.1.5"],"userAgent":"olm/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"services","namespace":"openshift-cnv","name":"hco-webhook-service","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Failure","reason":"AlreadyExists","code":409}, <<<<<

      "requestReceivedTimestamp":"2022-01-28T06:21:38.782905Z","stageTimestamp":"2022-01-28T06:21:39.144566Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"olm-operator-binding-openshift-operator-lifecycle-manager\" of ClusterRole \"system:controller:operator-lifecycle-manager\" to ServiceAccount \"olm-operator-serviceaccount/openshift-operator-lifecycle-manager\""}}
      ~~~
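
For anyone checking the same sequence in their own audit logs, here is a minimal sketch of how the timeline for a single service could be extracted. It assumes one JSON audit event per line; `service_timeline` and `SAMPLE` are hypothetical names, and the sample events are trimmed to only the fields used here, with values taken from the events above.

```python
import json


def service_timeline(lines, namespace, name):
    """Return (timestamp, verb, status code) tuples for one Service, oldest first."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef", {})
        target = (ref.get("resource"), ref.get("namespace"), ref.get("name"))
        if target != ("services", namespace, name):
            continue
        events.append((
            event["requestReceivedTimestamp"],
            event["verb"],
            event["responseStatus"]["code"],
        ))
    # Timestamps share one fixed-width UTC format, so string order is time order.
    return sorted(events)


# Trimmed copies of the three audit events shown above (most fields omitted).
SAMPLE = [
    '{"verb":"create","objectRef":{"resource":"services","namespace":"openshift-cnv","name":"hco-webhook-service"},"responseStatus":{"code":201},"requestReceivedTimestamp":"2022-01-28T06:21:31.931882Z"}',
    '{"verb":"delete","objectRef":{"resource":"services","namespace":"openshift-cnv","name":"hco-webhook-service"},"responseStatus":{"code":200},"requestReceivedTimestamp":"2022-01-28T06:21:38.743079Z"}',
    '{"verb":"create","objectRef":{"resource":"services","namespace":"openshift-cnv","name":"hco-webhook-service"},"responseStatus":{"code":409},"requestReceivedTimestamp":"2022-01-28T06:21:38.782905Z"}',
]

if __name__ == "__main__":
    for timestamp, verb, code in service_timeline(SAMPLE, "openshift-cnv", "hco-webhook-service"):
        print(timestamp, verb, code)
```

A delete followed within milliseconds by a create that returns 409, as in these three events, is the pattern to look for.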

      This is a newly deployed cluster with no previous installation of OpenShift Virtualization. It is also possible to create a service with the same spec after the deployment.

      The other webhook services are also created without any issues.

      Version-Release number of selected component (if applicable):

      kubevirt-hyperconverged-operator.v4.8.4
      omg get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.8.17    True        False         2h2m    Cluster version is 4.8.17

      How reproducible:

      Always reproducible in the customer environment.

      Steps to Reproduce:

      1. Follow the installation procedure at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.8/html-single/openshift_virtualization/index#installing-virt-cli
      2. The deployment fails while creating the kubevirt-hyperconverged object.

      Actual results:

      Installation of OpenShift Virtualization fails with the error: service "hco-webhook-service" not found

      Expected results:

      Installation of OpenShift Virtualization should succeed.

      Additional info:
      The original BZ contains extra information, but it cannot be used because it was closed by ERRATA and the OLM team no longer uses BZ to track issues.

              agreene1991 Alexander Greene (Inactive)
              agreene1991 Alexander Greene (Inactive)
              Xia Zhao Xia Zhao