OCPBUGS-39427

PTP events lose connectivity between producer and consumer when external interface is lost


    • No
    • CNF RAN Sprint 258, CNF RAN Sprint 259, CNF RAN Sprint 260
    • 3
    • False
    • N/A
    • Release Note Not Required
    • Done
    • 9/19: QE needs to verify this issue.

      This is a clone of issue OCPBUGS-31011. The following is the description of the original issue:

      Description of problem:

      PTP events lose connectivity between producer and consumer when the external interface is lost.

      When the SNO OAM interface is used for PTP communication with the GM and that link is pulled and then restored, the events that the server side-car in the PTP daemon pod attempts to send (those associated with the link pull) fail to be delivered for as long as the link remains down. Once the link is restored, events (associated with the link restore) are delivered fine. Given that the server and client side-cars are on the same node, I would have expected this to be node-internal communication rather than being tied to the OAM interface, and I would not have expected any events to the client to be dropped.

      When the OAM link is down (previous test), a disconcerting message shows up in the server side-car log: "not responding, waiting 6 times before marking to delete subscriber". This seems to imply that a communication error will cause the subscription to be removed. In that case, how would the client know that its subscription was removed and that it will no longer get events? However, even though that message is logged, the subscription does not appear to have been removed (nor is there anything about a retry in the log).
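      To make the question concrete, the behaviour that the log message seems to imply can be sketched as below. This is only an illustration of the wording in the message, not the side-car's actual code: the `Subscriber` class, the `record_delivery` function, the example URL, and the reset-on-success rule are all assumptions, and the threshold of 6 is taken from the "waiting 6 times" text.

```python
from dataclasses import dataclass

# Hypothetical threshold, taken from the "waiting 6 times" wording in the log.
MAX_FAILED_ATTEMPTS = 6


@dataclass
class Subscriber:
    """Hypothetical per-subscriber delivery state kept by the producer."""
    endpoint: str
    failed_attempts: int = 0
    marked_for_delete: bool = False


def record_delivery(sub: Subscriber, delivered: bool) -> str:
    """Update the failure counter after one delivery attempt and
    return a log-style status string."""
    if delivered:
        # Assumption: a successful delivery resets the failure counter.
        sub.failed_attempts = 0
        return "delivered"
    sub.failed_attempts += 1
    if sub.failed_attempts >= MAX_FAILED_ATTEMPTS:
        # Assumption: the subscriber is only flagged here, not removed.
        sub.marked_for_delete = True
        return "marked for delete"
    remaining = MAX_FAILED_ATTEMPTS - sub.failed_attempts
    return f"not responding, {remaining} attempts left before marking to delete"


# Simulate six consecutive failed deliveries while the link is down.
sub = Subscriber(endpoint="http://consumer.example/event")  # hypothetical URL
statuses = [record_delivery(sub, delivered=False) for _ in range(6)]
```

      If the producer behaves along these lines, a long enough outage would flag the subscriber without the client being told, which is the concern raised above.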

       

      The SNO OAM interface is the physical NIC link that is connected to the management network they are using. The management/OAM network connects to their management system back in the central office.

      How reproducible:

          

      Steps to Reproduce:

          1. Bring the management interface down, which pauses the network.
          2. Check that no events are received.
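
      One way to check step 2 from the consumer side is to compare the times at which events were received against the window in which the link was down. The sketch below is a minimal illustration; the `events_during_outage` helper and all timestamps are hypothetical and are not taken from the actual test run.

```python
from datetime import datetime, timedelta


def events_during_outage(event_times, link_down, link_up):
    """Return the timestamps of events the consumer received while the
    management link was down (link_down <= t < link_up)."""
    return [t for t in event_times if link_down <= t < link_up]


# Hypothetical timeline: link pulled at 12:00:00 and restored at 12:05:00.
link_down = datetime(2024, 1, 1, 12, 0, 0)
link_up = link_down + timedelta(minutes=5)

# Hypothetical consumer-side receive times: one event before the pull,
# none during the outage, one shortly after the restore.
received = [
    link_down - timedelta(seconds=30),
    link_up + timedelta(seconds=2),
]

missed_window = events_during_outage(received, link_down, link_up)
print("events received during outage:", len(missed_window))
```

      An empty result for the outage window, while the producer log shows send attempts in the same window, would match the behaviour described in this issue.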
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

            aputtur@redhat.com Aneesh Puttur
            openshift-crt-jira-prow OpenShift Prow Bot
            Bonnie Block Bonnie Block