Red Hat Advanced Cluster Management / ACM-6450

Submariner connectivity is lost when the cluster is alive for a longer duration


      Description of problem:

      On an RDR longevity cluster (running for the past month), Submariner connectivity loss was observed. No operations were performed on the cluster; only IOs were running.

      Submariner was installed from the CLI using a CatalogSource (catsrc).

      Version-Release number of selected component (if applicable):

      ACM - 2.8

      Submariner - v0.15.0

      OCP - 4.13.0-0.nightly-2023-06-05-164816

      ODF - 4.13.0-219.snaptrim

      How reproducible:

      Steps to Reproduce:

      1. On an RDR setup, keep the cluster running with IOs for several weeks.
      2. Ensure Submariner connectivity is fine during setup.
      3. After a few weeks, Submariner connectivity is lost (see attached UI screenshot).

      Actual results:

      Expected results:

      Submariner connectivity should remain intact

      Additional info:

      subctl verify fails with the following:

      Summarizing 9 Failures:
      [FAIL] [dataplane] Gateway status reporting when a gateway node is configured [It] should correctly report its status and connection information
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:561
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [FAIL] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:195
      [TIMEDOUT] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod
      /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36

      Ran 9 of 45 Specs in 3600.776 seconds
      FAIL! - Suite Timeout Elapsed – 0 Passed | 9 Failed | 0 Pending | 36 Skipped
       

      Log snippet from the Submariner gateway pods:
      E0719 19:41:48.794445 1 queue.go:106] local -> broker for *v1.Endpoint: Failed to process object with key "submariner-operator/kmanohar-clu2-submariner-cable-kmanohar-clu2-10-1-114-115": error distributing resource "submariner-operator/kmanohar-clu2-submariner-cable-kmanohar-clu2-10-1-114-115": error creating or updating resource: error retrieving "kmanohar-clu2-submariner-cable-kmanohar-clu2-10-1-114-115": Get "https://api.kmanohar-h.qe.rh-ocs.com:6443/apis/submariner.io/v1/namespaces/submariner-broker/endpoints/kmanohar-clu2-submariner-cable-kmanohar-clu2-10-1-114-115": tls: failed to verify certificate: x509: certificate signed by unknown authority
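      The gateway error above indicates that the broker API server's certificate no longer verifies against the CA the gateway trusts. One way to narrow this down is sketched below in Python, as a rough aid rather than a definitive diagnostic: it pulls the API server host and port out of the log line and tries a TLS handshake against a CA bundle. The function names and the ca_file path are hypothetical, and how the CA bundle is obtained from the cluster is left as an assumption.

```python
import re
import socket
import ssl
from typing import Tuple
from urllib.parse import urlsplit

# Matches the first https URL inside a Go-style `Get "https://..."` error.
_URL_RE = re.compile(r'Get "(https://[^"\s]+)')


def broker_endpoint(log_line: str) -> Tuple[str, int]:
    """Return (host, port) of the API server named in a gateway error line."""
    match = _URL_RE.search(log_line)
    if match is None:
        raise ValueError("no https URL found in log line")
    parts = urlsplit(match.group(1))
    return parts.hostname, parts.port or 443


def trusts_server(host: str, port: int, ca_file: str, timeout: float = 5.0) -> bool:
    """Return True if the server certificate verifies against ca_file.

    ca_file is a hypothetical path to a PEM CA bundle extracted by the user;
    only that bundle is trusted, not the system trust store.
    """
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

      For example, passing the log line above to broker_endpoint yields the broker API host and port 6443, which can then be given to trusts_server together with the CA bundle the gateway is expected to use. A False result would be consistent with the certificate or CA having changed since the cluster was set up, though it does not by itself identify the cause.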
       

      subctl gather logs - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/submariner/

       

            skitt@redhat.com Stephen Kitt
            kmanohar@redhat.com Keerthana Manoharan
            Maxim Babushkin Maxim Babushkin