SKUPPER-803

Skupper fails to reload TLS certificates for inter-router links when secrets are updated


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • Fix Version/s: 1.5.3
    • Affects Version/s: 1.1
    • Component/s: Control plane
Expected behavior: when a certificate/secret is updated in OCP, Skupper should load and use the new certificate.

A user configured all the certificates using the cert-manager operator. For testing purposes, they gave each certificate a maximum duration of 1 hour, renewed every 15 minutes.
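
For context, a minimal sketch of such a cert-manager Certificate, assuming a hypothetical issuer and secret name (cert-manager renews once the remaining lifetime falls below renewBefore, so duration: 1h with renewBefore: 45m triggers a renewal roughly every 15 minutes):

{code:bash}
# Hypothetical Certificate reproducing the reported setup: 1h lifetime,
# renewed every ~15 minutes. All names and the issuer are placeholders,
# not taken from the report.
cat <<'EOF' | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: skupper-site-server        # placeholder name
spec:
  secretName: skupper-site-server  # secret consumed by the router
  duration: 1h
  renewBefore: 45m                 # renew after ~15 min of the 1h lifetime
  dnsNames:
    - skupper-router
  issuerRef:
    name: my-ca-issuer             # placeholder issuer
    kind: Issuer
EOF
{code}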

When all the pods started up, they saw the flow-collector container in the service-controller pod connect to the router pod as expected.

After a couple of hours they noticed a large number of errors in the flow-collector logs:

{code}
2023/08/18 07:45:52 COLLECTOR: Error receiving message  remote error: tls: expired certificate
2023/08/18 07:45:52 COLLECTOR: Error receiving message  remote error: tls: expired certificate
{code}

The router container logs show:
{code:java}
2023-08-18 07:07:12.918718 +0000 SERVER (info) [C2437370] Accepted connection to :5671 from 10.x.x.x:44156
2023-08-18 07:07:12.928722 +0000 SERVER (error) [C2437370] Connection from 10.x.x.x:44156 (to :5671) failed: amqp:connection:framing-error SSL Failure: error:0A000086:SSL routines::certificate verify failed
2023-08-18 07:07:12.930201 +0000 SERVER (info) [C2437371] Accepted connection to :5671 from 10.x.x.x:44172
2023-08-18 07:07:12.937756 +0000 SERVER (error) [C2437371] Connection from 10.x.x.x:44172 (to :5671) failed: amqp:connection:framing-error SSL Failure: error:0A000086:SSL routines::certificate verify failed
{code}

By looking into both containers, they saw that when the certificate secrets were renewed by cert-manager, the new certificates were also present in both pods. So the pods do pick up the renewed secrets once they change; the running TLS configuration, however, is evidently not reloaded.
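
One way to confirm this from outside the pods is to compare the expiry of the certificate stored in the secret with the certificate the router actually presents; a sketch, assuming a secret named skupper-site-server (hypothetical; substitute the cert-manager-managed secret) and the default router deployment name:

{code:bash}
# Expiry of the certificate currently stored in the secret (name is a placeholder).
oc get secret skupper-site-server -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate

# Expiry of the certificate the router actually presents on port 5671:
# forward the port locally, then read the served certificate.
oc port-forward deploy/skupper-router 5671:5671 &
sleep 2
openssl s_client -connect localhost:5671 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -enddate
kill %1
{code}

If the secret shows a fresh notAfter while the served certificate is already expired, the router has the new file on disk but has not reloaded it into its TLS configuration, matching the behavior described above.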

When they restarted the service-controller pod, the errors disappeared and the flow collector reconnected to the router pods fine, until the same errors recurred a few hours later.

The skupper-site ConfigMap:

{code:yaml}
apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  controller-pod-antiaffinity: app.kubernetes.io/name=skupper-router
  router-mode: interior
  routers: '2'
  name: ns-hip-cci-interconnect-interior-sandbox
  console-authentication: openshift
  flow-collector: 'true'
  flow-collector-record-ttl: 5m0s
  service-sync: 'true'
  service-controller: 'true'
  ingress: none
  router-pod-antiaffinity: app.kubernetes.io/name=skupper-router
  console: 'true'
  console-ingress: route
{code}

Workaround: restart the skupper-service-controller, skupper-flow-collector, and skupper-router deployments after the certificate secrets are renewed.
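
A sketch of the workaround, assuming the default deployment names and the skupper namespace (adjust both to the actual installation; in some versions the flow collector runs as a container inside the service-controller pod rather than as a separate deployment):

{code:bash}
# Roll the Skupper deployments so they re-read the renewed certificates.
# Names and namespace are the defaults and may differ per installation.
oc -n skupper rollout restart deployment/skupper-router
oc -n skupper rollout restart deployment/skupper-service-controller
# Only present when the flow collector is deployed separately:
oc -n skupper rollout restart deployment/skupper-flow-collector || true
{code}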

Assignee: Andrew Smith (ansmith@redhat.com)
Reporter: Stephen Higgs (rhn-support-shiggs)
Votes: 2
Watchers: 6
