RHEL-87569

frr-k8s CI started failing using latest rpm, failures around BFD sessions


    • Bug
    • Resolution: Done-Errata
    • Critical
    • rhel-9.4.z
    • rhel-9.4.z
    • frr
    • frr-8.5.3-4.el9_4.5
    • Yes
    • Important
    • rhel-se-cs-infra-services
    • ssg_core_services
    • 3
    • False
    • False

      What were you trying to do that didn't work?

      We have a CI lane that periodically runs the frr-k8s tests for 4.19 and 4.18. About a week ago the tests started failing even though no code changes had happened in frr-k8s or metallb. Specifically, BGP sessions with BFD enabled cannot be established. I confirmed that the failures come from the changes in the frr rpm between frr-8.5.3-4.el9.x86_64 and frr-8.5.3-4.el9_4.2.x86_64 by running CI with a custom image that uses the previous rpm: the tests pass consistently with the former and fail with the latter.
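To make the difference between the two builds explicit, here is a minimal sketch that splits both package strings from the description into name, version, release, and architecture. The helper `split_nvra` is hypothetical and uses a naive string split, not the rpm library's own parsing or version comparison.

```python
def split_nvra(package: str) -> dict:
    """Split 'name-version-release.arch' into its parts (naive string split)."""
    rest, arch = package.rsplit(".", 1)          # architecture follows the last dot
    name, version, release = rest.rsplit("-", 2)  # release and version follow the last two dashes
    return {"name": name, "version": version, "release": release, "arch": arch}


good = split_nvra("frr-8.5.3-4.el9.x86_64")      # build the tests pass with
bad = split_nvra("frr-8.5.3-4.el9_4.2.x86_64")   # build the tests fail with

# Fields that differ between the passing and failing builds
changed = [field for field in good if good[field] != bad[field]]
print(changed, good["release"], bad["release"])
```

Under this split, both builds share the upstream version 8.5.3 and differ only in the release field, which suggests the regression comes from changes carried in the package release rather than from a new upstream version.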

      An example of a failed lane from yesterday: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-frr-release-4.18-periodics-frrk8s-e2e-metal-cno-periodic/1911965663459545088

      The relevant logs are in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-frr-release-4.18-periodics-frrk8s-e2e-metal-cno-periodic/1911965663459545088/artifacts/frrk8s-e2e-metal-cno-periodic/baremetalds-metallb-e2e-test/artifacts/artifacts/

      I can point to more logs if necessary, since this has happened consistently over the past week.

      I confirmed that the rpm is the cause in this PR: https://github.com/openshift/frr/pull/82, where I manually edited the Dockerfile to point to the previous rpm, and the lane passed.

      I could not reproduce this locally on my machine, and I haven't tried with an OCP cluster yet. While this did not come from QE or users, it is concerning because this FRR rpm has been in use since at least 4.16.

      Thanks

              rhn-support-bjmason Bryan Mason
              obraunsh@redhat.com Ori Braunshtein
              Michal Ruprich
              Therese Cornell