SKUPPER-803

Skupper fails to reload TLS certificates for inter-router links when secrets are updated


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • Fix Version/s: 1.5.3
    • Affects Version/s: 1.1
    • Component/s: Control plane
Expected behavior: when a certificate/secret is updated in OCP, Skupper should load and use the new certificate.

A user configured all the certificates using the cert-manager operator. For testing purposes, they gave each certificate a maximum duration of 1 hour, renewed every 15 minutes.
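
For context, a minimal sketch of such a cert-manager Certificate, assuming a hypothetical issuer and secret name (cert-manager renews once the remaining lifetime falls below renewBefore, so duration: 1h with renewBefore: 45m triggers a renewal roughly every 15 minutes):

{code:bash}
# Hypothetical Certificate reproducing the reported setup: 1h lifetime,
# renewed every ~15 minutes. All names and the issuer are placeholders,
# not taken from the report.
cat <<'EOF' | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: skupper-site-server        # placeholder name
spec:
  secretName: skupper-site-server  # secret consumed by the router
  duration: 1h
  renewBefore: 45m                 # renew after ~15 min of the 1h lifetime
  dnsNames:
    - skupper-router
  issuerRef:
    name: my-ca-issuer             # placeholder issuer
    kind: Issuer
EOF
{code}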

When all the pods started up, they saw the flow-collector container in the service-controller pod connect to the router pod as expected.

After a couple of hours they noticed a large number of errors in the flow-collector logs:

{code}
2023/08/18 07:45:52 COLLECTOR: Error receiving message  remote error: tls: expired certificate
2023/08/18 07:45:52 COLLECTOR: Error receiving message  remote error: tls: expired certificate
{code}

The router container logs show:
{code:java}
2023-08-18 07:07:12.918718 +0000 SERVER (info) [C2437370] Accepted connection to :5671 from 10.x.x.x:44156
2023-08-18 07:07:12.928722 +0000 SERVER (error) [C2437370] Connection from 10.x.x.x:44156 (to :5671) failed: amqp:connection:framing-error SSL Failure: error:0A000086:SSL routines::certificate verify failed
2023-08-18 07:07:12.930201 +0000 SERVER (info) [C2437371] Accepted connection to :5671 from 10.x.x.x:44172
2023-08-18 07:07:12.937756 +0000 SERVER (error) [C2437371] Connection from 10.x.x.x:44172 (to :5671) failed: amqp:connection:framing-error SSL Failure: error:0A000086:SSL routines::certificate verify failed
{code}

By looking into both containers, they saw that when the certificate secrets were renewed by cert-manager, the new certificates were also present in both pods. So the pods do pick up the renewed secrets once they change; the running TLS configuration, however, is evidently not reloaded.
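
One way to confirm this from outside the pods is to compare the expiry of the certificate stored in the secret with the certificate the router actually presents; a sketch, assuming a secret named skupper-site-server (hypothetical; substitute the cert-manager-managed secret) and the default router deployment name:

{code:bash}
# Expiry of the certificate currently stored in the secret (name is a placeholder).
oc get secret skupper-site-server -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate

# Expiry of the certificate the router actually presents on port 5671:
# forward the port locally, then read the served certificate.
oc port-forward deploy/skupper-router 5671:5671 &
sleep 2
openssl s_client -connect localhost:5671 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -enddate
kill %1
{code}

If the secret shows a fresh notAfter while the served certificate is already expired, the router has the new file on disk but has not reloaded it into its TLS configuration, matching the behavior described above.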

When they restarted the service-controller pod, the errors disappeared and the flow collector reconnected to the router pods fine, until the same errors recurred a few hours later.

The skupper-site ConfigMap:

{code:yaml}
apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  controller-pod-antiaffinity: app.kubernetes.io/name=skupper-router
  router-mode: interior
  routers: '2'
  name: ns-hip-cci-interconnect-interior-sandbox
  console-authentication: openshift
  flow-collector: 'true'
  flow-collector-record-ttl: 5m0s
  service-sync: 'true'
  service-controller: 'true'
  ingress: none
  router-pod-antiaffinity: app.kubernetes.io/name=skupper-router
  console: 'true'
  console-ingress: route
{code}

Workaround: restart the skupper-service-controller, skupper-flow-collector, and skupper-router deployments after the certificate secrets are renewed.
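
A sketch of the workaround, assuming the default deployment names and the skupper namespace (adjust both to the actual installation; in some versions the flow collector runs as a container inside the service-controller pod rather than as a separate deployment):

{code:bash}
# Roll the Skupper deployments so they re-read the renewed certificates.
# Names and namespace are the defaults and may differ per installation.
oc -n skupper rollout restart deployment/skupper-router
oc -n skupper rollout restart deployment/skupper-service-controller
# Only present when the flow collector is deployed separately:
oc -n skupper rollout restart deployment/skupper-flow-collector || true
{code}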

Assignee: Andrew Smith (ansmith@redhat.com)
Reporter: Stephen Higgs (rhn-support-shiggs)
Votes: 2
Watchers: 6
