OpenShift Bugs / OCPBUGS-59905

Egress connection intermittently failing in RHOCP 4.18

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 4.18.z
    • Quality / Stability / Reliability
    • CORENET Sprint 275
    • 1

      Description of problem:

      Version-Release number of selected component (if applicable):

      How reproducible:

      Create an egress IP connection and run ovn-trace against it.

      Steps to Reproduce:

      1. Create the EgressIP (EIP).

      2. Attach it to the namespace.

      3. Run ovn-trace and check the result on the destination node.
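
Steps 1 and 2 could be sketched as follows. This is a minimal illustration, not the reporter's actual configuration: the resource name `egressip-example` and the helper `build_egressip_manifest` are hypothetical, and selecting the namespace through its `kubernetes.io/metadata.name` label is an assumption. The egress IP and namespace values are taken from this report.

```python
import json


def build_egressip_manifest(name, egress_ips, namespace):
    """Build an OVN-Kubernetes EgressIP custom resource as a plain dict.

    Assumption: the namespace is selected via the label that Kubernetes
    sets automatically on every namespace (kubernetes.io/metadata.name).
    """
    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressIP",
        "metadata": {"name": name},  # hypothetical resource name
        "spec": {
            "egressIPs": list(egress_ips),
            "namespaceSelector": {
                "matchLabels": {"kubernetes.io/metadata.name": namespace}
            },
        },
    }


if __name__ == "__main__":
    manifest = build_egressip_manifest(
        "egressip-example", ["10.85.93.34"], "egp-sit-01"
    )
    # JSON is accepted wherever a YAML manifest is, so this can be piped to `oc apply -f -`.
    print(json.dumps(manifest, indent=2))
```

For the IP to be assigned, at least one node also needs the `k8s.ovn.org/egress-assignable` label.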

      Actual results: The egress IP connection fails intermittently at the L2 lookup stage (ls_in_l2_lkup).

      Expected results: The egress connection should not fail.

      Additional info:

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms: RHOCP 4. 

      Is it an

      1. customer issue 

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
      • Don’t presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP? srcNode: coecpwtsgp0e06.sg.uobnet.com, srcNamespace: egp-sit-01, srcPodName: deploypod-zpq0v-g4lhp, srcPodIP: 10.165.14.76
          • What is the dstIP? 172.29.227.90 (egress IP: 10.85.93.34)
          • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc) Pod -> egress IP -> external server
          • Please provide the UTC timestamp networking outage window from must-gather 
          • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs 

      ovn-trace --ct new transit_switch 'inport=="tstor-coecpwtsgp0e06.sg.uobnet.com" && eth.src==0a:58:64:58:00:10 && ip4.src==10.165.14.76 && ip4.dst==172.29.227.90 && eth.dst==0a:58:64:58:00:09 && tcp && tcp.dst==9000 && ip.ttl==64' 

      ct_snat(ip4.src=10.85.93.34)
      ----------------------------
       6. lr_out_delivery (northd.c:14861): outport == "rtoe-GR_coecpwtsgp0e01.sg.uobnet.com", priority 100, uuid afb2fb63
          output;
          /* output to "rtoe-GR_coecpwtsgp0e01.sg.uobnet.com", type "l3gateway" */

      ingress(dp="ext_coecpwtsgp0e01.sg.uobnet.com", inport="etor-GR_coecpwtsgp0e01.sg.uobnet.com")
      ---------------------------------------------------------------------------------------------
       0. ls_in_check_port_sec (northd.c:5932): inport == "etor-GR_coecpwtsgp0e01.sg.uobnet.com", priority 70, uuid 44d0dff0
          reg0[18] = 1;
          next;
       5. ls_in_pre_lb (northd.c:6037): ip && inport == "etor-GR_coecpwtsgp0e01.sg.uobnet.com", priority 110, uuid c90f7ec0
          next;
      28. ls_in_l2_lkup: no match (implicit drop) 
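
When comparing many traces from an intermittent failure, it may help to extract the stage at which each trace drops. The following is a rough sketch that assumes the trace text has been saved as shown above; `find_implicit_drops` and the regular expression are illustrative helpers, not part of any OVN tooling, and they only recognize the line shapes seen in this report.

```python
import re

# Matches ovn-trace stage lines such as
#   " 6. lr_out_delivery (northd.c:14861): outport == ..., priority 100, uuid afb2fb63"
#   "28. ls_in_l2_lkup: no match (implicit drop)"
STAGE_RE = re.compile(r"^\s*(\d+)\.\s+(\w+)(?:\s+\([^)]*\))?:\s*(.*)$")


def find_implicit_drops(trace_text):
    """Return (table_number, stage_name) for each stage that ended in an implicit drop."""
    drops = []
    for line in trace_text.splitlines():
        m = STAGE_RE.match(line)
        if m and "no match (implicit drop)" in m.group(3):
            drops.append((int(m.group(1)), m.group(2)))
    return drops


# Excerpt of the trace from this report.
SAMPLE = """\
 6. lr_out_delivery (northd.c:14861): outport == "rtoe-GR_coecpwtsgp0e01.sg.uobnet.com", priority 100, uuid afb2fb63
    output;
 0. ls_in_check_port_sec (northd.c:5932): inport == "etor-GR_coecpwtsgp0e01.sg.uobnet.com", priority 70, uuid 44d0dff0
    reg0[18] = 1;
    next;
28. ls_in_l2_lkup: no match (implicit drop)
"""

if __name__ == "__main__":
    print(find_implicit_drops(SAMPLE))
```

Running the helper over traces captured during and outside the outage window would show whether the drop always lands in the same table.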

       

            • If it is not a connectivity issue:
              • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
      • When showing the results from commands, include the entire command in the output.  
      • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
      • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
      • For guidance on using this template please see
        OCPBUGS Template Training for Networking  components

              mkennell@redhat.com Martin Kennelly (Inactive)
              rhn-support-soujain Sourav Jain
              Anurag Saxena Anurag Saxena
              Votes: 0
              Watchers: 7
