  OSSM-542 (OpenShift Service Mesh)

Galley is not using the new certificate after rotation

    Sprint: Sprint 14, OSSM 2.2 - 1, OSSM 2.2 - 2

      The Galley pod does not pick up the renewed certificate after rotation; the pod has to be restarted to force it to load the new certificate.

      The issue is 100% reproducible.

      Reproducer:

      • Install the Service Mesh operator.
      • Deploy a ServiceMeshControlPlane with version v1.1 and a short certificate TTL, for example:
      apiVersion: maistra.io/v2
      kind: ServiceMeshControlPlane
      metadata:
        name: basic
        namespace: istio-system-v1
      spec:
        version: v1.1
        techPreview:
          security:
            workloadCertTtl: 15m
        tracing:
          type: Jaeger
          sampling: 10000
        addons:
          jaeger:
            name: jaeger
            install:
              storage:
                type: Memory
          kiali:
            enabled: true
            name: kiali
          grafana:
            enabled: true
      ---
      apiVersion: maistra.io/v1
      kind: ServiceMeshMemberRoll
      metadata:
        name: default
      spec:
        members:
        - bookinfo
       
      • Wait for the control plane to finish deploying and for the certificate TTL to elapse.
      • Deploy the bookinfo example into the mesh. When creating the Gateway and VirtualService, the request is rejected by the validation webhook with a certificate-expired error:
      $ oc apply -n bookinfo -f https://raw.githubusercontent.com/Maistra/istio/maistra-2.0/samples/bookinfo/networking/bookinfo-gateway.yaml
      Error from server (InternalError): error when creating "https://raw.githubusercontent.com/Maistra/istio/maistra-2.0/samples/bookinfo/networking/bookinfo-gateway.yaml": Internal error occurred: failed calling webhook "pilot.validation.istio.io": Post "https://istio-galley.istio-system-v1.svc:443/admitpilot?timeout=30s": x509: certificate has expired or is not yet valid: current time 2021-07-16T10:12:48Z is after 2021-07-16T08:33:02Z
      Error from server (InternalError): error when creating "https://raw.githubusercontent.com/Maistra/istio/maistra-2.0/samples/bookinfo/networking/bookinfo-gateway.yaml": Internal error occurred: failed calling webhook "pilot.validation.istio.io": Post "https://istio-galley.istio-system-v1.svc:443/admitpilot?timeout=30s": x509: certificate has expired or is not yet valid: current time 2021-07-16T10:12:48Z is after 2021-07-16T08:33:02Z
      
      • Restarting the Galley pod fixes the issue.

      Expected behavior: Galley should load the new certificate on its own, without needing a restart.

            To Hung Sze added a comment -

            Setting Daniel as Assignee to indicate he is the one who fixed this.

            Praneeth Bajjuri added a comment (edited) -

            The Gateway and VirtualService were configured successfully with the certificate; tested successfully without any errors.

            Daniel Grimm added a comment -

            Hi pbajjuri0204, did you make sure you're using the right pilot image? I'm not sure how the daily build works with respect to deploying older versions. It might be that the operator bundle still points to the 1.1.17 images.

            Praneeth Bajjuri added a comment -

            Tested with the latest nightly build; it is still not fixed. I will check with the developer which nightly build will include this fix.

            Daniel Grimm added a comment -

            It turns out this is a bug we introduced when we moved management of the ValidatingWebhookConfiguration into the operator. Galley does in fact watch the certificate, but it never reaches that watch loop because it is stuck sending on an unbuffered channel. I have a fix ready and will push it ASAP.

            Olimp Bockowski added a comment -

            dgrimm@redhat.com, what is the plan for working on this? Maybe I shouldn't ping you about it, but I am not aware of anyone on your side who is responsible for sprint planning (if it's not the PM but someone else, as the ARO team has, for example). Still, I would appreciate it if you could shed some light on it.
            P.S. Sorry for the mess: I accidentally clicked "need info" and later changed it to "ACCEPTED".

            Daniel Grimm added a comment -

            This is a valid issue. Galley in 1.1 and earlier does not seem to watch its workload secret, so it never notices when the secret has been rotated and keeps serving the validating webhook with an invalid certificate.

              Assignee: Daniel Grimm (dgrimm@redhat.com)
              Reporter: Christian Passarelli (rhn-support-cpassare)
              Votes: 0
              Watchers: 15
