OpenShift Bugs / OCPBUGS-36799

ovnkube-node pod in CrashLoopBackOff on RHEL-8 workers



      Description of problem:

      When RHEL-8 workers are added to a 4.17 OCP cluster, they never become Ready, and the kubelet log on those workers shows errors while executing /ovnkube-lib/ovnkube-lib.sh:

      Jul 10 01:42:38.920157 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:  > pod="openshift-ovn-kubernetes/ovnkube-node-8k8vx" podUID="a9eec9d8-e708-4397-b2d9-e6d5f17869d7" containerName="sbdb" containerID="cri-o://d16187493a25f0279dd0a09873285e2f1b312a7346c8c49b813f85f774489fc6"
      Jul 10 01:42:38.920157 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: I0710 01:42:38.918890    6409 kuberuntime_container.go:779] "Killing container with a grace period" pod="openshift-ovn-kubernetes/ovnkube-node-8k8vx" podUID="a9eec9d8-e708-4397-b2d9-e6d5f17869d7" containerName="sbdb" containerID="cri-o://d16187493a25f0279dd0a09873285e2f1b312a7346c8c49b813f85f774489fc6" gracePeriod=30
      Jul 10 01:42:38.922851 ip-10-0-69-5.ec2.internal crio[6376]: time="2024-07-10 01:42:38.921634812Z" level=info msg="Stopping container: d16187493a25f0279dd0a09873285e2f1b312a7346c8c49b813f85f774489fc6 (timeout: 30s)" id=591c98d3-6bf0-4770-8d01-c02bdab42864 name=/runtime.v1.RuntimeService/StopContainer
      Jul 10 01:42:38.946746 ip-10-0-69-5.ec2.internal conmon[13472]: conmon d16187493a25f0279dd0 <nwarn>: stdio_input read failed Resource temporarily unavailable
      Jul 10 01:42:38.946766 ip-10-0-69-5.ec2.internal conmon[13472]: conmon d16187493a25f0279dd0 <nwarn>: stdio_input read failed Resource temporarily unavailable
      Jul 10 01:42:38.953444 ip-10-0-69-5.ec2.internal systemd[1]: crio-conmon-d16187493a25f0279dd0a09873285e2f1b312a7346c8c49b813f85f774489fc6.scope: Succeeded.
      Jul 10 01:42:38.954264 ip-10-0-69-5.ec2.internal systemd[1]: crio-conmon-d16187493a25f0279dd0a09873285e2f1b312a7346c8c49b813f85f774489fc6.scope: Consumed 19ms CPU time
      Jul 10 01:42:39.052028 ip-10-0-69-5.ec2.internal crio[6376]: time="2024-07-10 01:42:39.051924604Z" level=warning msg="Failed to get pid for pod infra container: container PID not initialized"
      Jul 10 01:42:39.052141 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: I0710 01:42:39.052114    6409 logs.go:325] "Finished parsing log file" path="/var/log/pods/openshift-ovn-kubernetes_ovnkube-node-8k8vx_a9eec9d8-e708-4397-b2d9-e6d5f17869d7/ovnkube-controller/8.log"
      Jul 10 01:42:39.052760 ip-10-0-69-5.ec2.internal crio[6376]: time="2024-07-10 01:42:39.052732456Z" level=warning msg="Failed to get pid for pod infra container: container PID not initialized"
      Jul 10 01:42:39.052838 ip-10-0-69-5.ec2.internal crio[6376]: time="2024-07-10 01:42:39.052761628Z" level=info msg="Stopped container d16187493a25f0279dd0a09873285e2f1b312a7346c8c49b813f85f774489fc6: openshift-ovn-kubernetes/ovnkube-node-8k8vx/sbdb" id=591c98d3-6bf0-4770-8d01-c02bdab42864 name=/runtime.v1.RuntimeService/StopContainer
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: E0710 01:42:39.052964    6409 kuberuntime_manager.go:1256] container &Container{Name:sbdb,Image:registry.build03.ci.openshift.org/ci-op-z1lkh7hj/stable@sha256:9125754767b3c8bf4c970562a8d37ba910b03de06e5b1221bd59db4cab6ad267,Command:[/bin/bash -c set -xem
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: if [[ -f /env/_master ]]; then
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:   set -o allexport
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:   source /env/_master
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:   set +o allexport
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: fi
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: . /ovnkube-lib/ovnkube-lib.sh || exit 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: trap quit-sbdb TERM INT
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: start-sbdb ${OVN_LOG_LEVEL}
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: ],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:OVN_LOG_LEVEL,Value:info,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{10 -3} {<nil>} 10m DecimalSI},memory: {{314572800 0} {<nil>} 300Mi BinarySI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:ovnkube-script-lib,ReadOnly:false,MountPath:/ovnkube-lib,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:etc-openvswitch,ReadOnly:false,MountPath:/etc/ovn/,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:run-ovn,ReadOnly:false,MountPath:/run/ovn/,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:node-log,ReadOnly:false,MountPath:/var/log/ovn,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:env-overrides,ReadOnly:false,MountPath:/env,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:kube-api-access-qt6mf,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:&ExecAction{Command:[/bin/bash -c set -xeo pipefail
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: . /ovnkube-lib/ovnkube-lib.sh || exit 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: ovndb-readiness-probe "sb"
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: ],},HTTPGet:nil,TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:10,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},Lifecycle:&Lifecycle{PostStart:&LifecycleHandler{Exec:&ExecAction{Command:[/bin/bash -c set -x
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: . /ovnkube-lib/ovnkube-lib.sh || exit 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: sbdb-post-start
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: ],},HTTPGet:nil,TCPSocket:nil,Sleep:nil,},PreStop:nil,},TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod ovnkube-node-8k8vx_openshift-ovn-kubernetes(a9eec9d8-e708-4397-b2d9-e6d5f17869d7): PostStartHookError: Exec lifecycle hook ([/bin/bash -c set -x
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: . /ovnkube-lib/ovnkube-lib.sh || exit 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: sbdb-post-start
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: ]) for Container "sbdb" in Pod "ovnkube-node-8k8vx_openshift-ovn-kubernetes(a9eec9d8-e708-4397-b2d9-e6d5f17869d7)" failed - error: rpc error: code = Unknown desc = command error: panic: opensslcrypto: can't initialize OpenSSL : openssl: can't retrieve OpenSSL version
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: goroutine 1 gp=0xc0000061c0 m=0 mp=0x557c46244160 [running]:
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: panic({0x557c45e58aa0?, 0xc00002ee10?})
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/panic.go:779 +0x158 fp=0xc000175ce8 sp=0xc000175c38 pc=0x557c4578c4b8
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: crypto/internal/backend.init.0()
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/crypto/internal/backend/openssl.go:50 +0x26c fp=0xc000175e20 sp=0xc000175ce8 pc=0x557c4599926c
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.doInit1(0x557c4622c290)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:7176 +0xea fp=0xc000175f50 sp=0xc000175e20 pc=0x557c4579e44a
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.doInit(...)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:7143
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.main()
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:253 +0x357 fp=0xc000175fe0 sp=0xc000175f50 pc=0x557c4578fb97
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goexit({})
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000175fe8 sp=0xc000175fe0 pc=0x557c457c2da1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:402 +0xce fp=0xc00004afa8 sp=0xc00004af88 pc=0x557c4578ff0e
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goparkunlock(...)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:408
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.forcegchelper()
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:326 +0xb8 fp=0xc00004afe0 sp=0xc00004afa8 pc=0x557c4578fd98
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goexit({})
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004afe8 sp=0xc00004afe0 pc=0x557c457c2da1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: created by runtime.init.7 in goroutine 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:314 +0x1a
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:402 +0xce fp=0xc00004b780 sp=0xc00004b760 pc=0x557c4578ff0e
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goparkunlock(...)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:408
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.bgsweep(0xc00002a070)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgcsweep.go:278 +0x94 fp=0xc00004b7c8 sp=0xc00004b780 pc=0x557c4577b4b4
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.gcenable.gowrap1()
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgc.go:203 +0x25 fp=0xc00004b7e0 sp=0xc00004b7c8 pc=0x557c4576fde5
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goexit({})
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004b7e8 sp=0xc00004b7e0 pc=0x557c457c2da1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: created by runtime.gcenable in goroutine 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgc.go:203 +0x66
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.gopark(0xc00002a070?, 0x557c45c07658?, 0x1?, 0x0?, 0xc000007340?)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:402 +0xce fp=0xc00004bf78 sp=0xc00004bf58 pc=0x557c4578ff0e
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goparkunlock(...)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:408
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.(*scavengerState).park(0x557c46243680)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00004bfa8 sp=0xc00004bf78 pc=0x557c45778ea9
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.bgscavenge(0xc00002a070)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00004bfc8 sp=0xc00004bfa8 pc=0x557c4577943c
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.gcenable.gowrap2()
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgc.go:204 +0x25 fp=0xc00004bfe0 sp=0xc00004bfc8 pc=0x557c4576fd85
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goexit({})
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004bfe8 sp=0xc00004bfe0 pc=0x557c457c2da1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: created by runtime.gcenable in goroutine 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mgc.go:204 +0xa5
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.gopark(0xc00004a648?, 0x557c45763485?, 0xa8?, 0x1?, 0xc0000061c0?)
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/proc.go:402 +0xce fp=0xc00004a620 sp=0xc00004a600 pc=0x557c4578ff0e
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.runfinq()
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mfinal.go:194 +0x107 fp=0xc00004a7e0 sp=0xc00004a620 pc=0x557c4576ee27
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: runtime.goexit({})
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004a7e8 sp=0xc00004a7e0 pc=0x557c457c2da1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: created by runtime.createfing in goroutine 1
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]:         /usr/lib/golang/src/runtime/mfinal.go:164 +0x3d
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: time="2024-07-10T01:42:38Z" level=error msg="exec failed: unable to start container process: read init-p: connection reset by peer"
      Jul 10 01:42:39.053013 ip-10-0-69-5.ec2.internal kubenswrapper[6409]: , stdout: , stderr: , exit code -1, message: ""
       

       

      Example failure job: 

      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_openshift-ansible/12503/pull-ci-openshift-openshift-ansible-master-e2e-aws-workers-rhel8/1810828773206003712

       

      The full journal from the RHEL-8 worker:

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_openshift-ansible/12503/pull-ci-openshift-openshift-ansible-master-e2e-aws-workers-rhel8/1810828773206003712/artifacts/e2e-aws-workers-rhel8/gather-extra/artifacts/nodes/ip-10-0-69-5.ec2.internal/

       

      Error messages from the ovnkube-controller container of the ovnkube-node pod running on the RHEL-8 worker:

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_openshift-ansible/12503/pull-ci-openshift-openshift-ansible-master-e2e-aws-workers-rhel8/1810828773206003712/artifacts/e2e-aws-workers-rhel8/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-node-8k8vx_ovnkube-controller.log
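
      On a live cluster, the same evidence can be pulled directly with oc (illustrative invocations; the pod name is taken from this run):

        # Logs from the crashed containers of the affected ovnkube-node pod
        oc -n openshift-ovn-kubernetes logs ovnkube-node-8k8vx -c sbdb --previous
        oc -n openshift-ovn-kubernetes logs ovnkube-node-8k8vx -c ovnkube-controller --previous

        # The RHEL-8 worker should show up as NotReady
        oc get nodes -o wide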

       

      Version-Release number of selected component (if applicable):

      4.17.0-0.nightly-2024-07-09-121045 with RHEL-8.10 worker
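
      The panic ("opensslcrypto: can't initialize OpenSSL : openssl: can't retrieve OpenSSL version") comes from the Go toolchain's OpenSSL crypto backend failing to load a usable libcrypto. One plausible reading, assuming a userspace mismatch, is that a Go binary in the exec path was built expecting OpenSSL 3 while RHEL 8.10 ships OpenSSL 1.1.1. A quick sanity check on the worker (illustrative commands, given SSH access to the node):

        # On the RHEL-8.10 worker (paths are standard RHEL 8 locations)
        openssl version                # RHEL 8.10 ships OpenSSL 1.1.1x
        rpm -q openssl openssl-libs    # confirm the installed OpenSSL packages
        ls /usr/lib64/libcrypto.so.*   # libcrypto.so.1.1 present; no libcrypto.so.3 on RHEL 8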

       

      How reproducible:

      Always

       

      Steps to Reproduce:

      1. Add a RHEL-8 worker to a 4.17 cluster (see the sketch below).
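
      For reference, RHEL workers are normally added with the openshift-ansible scaleup playbook (inventory path is illustrative):

        # Run from an openshift-ansible checkout; the inventory lists the new RHEL-8 host
        ansible-playbook -i inventory/hosts playbooks/scaleup.yml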

      Actual results:

      The RHEL-8 workers stay NotReady, and the ovnkube-node pods on them go into CrashLoopBackOff with the errors above.

      Expected results:

      The RHEL-8 workers join the cluster and become Ready, with all ovnkube-node containers running.

      Additional info:

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      Is it an

      1. internal CI failure
      2. customer issue / SD
      3. internal Red Hat testing failure

      If it is an internal Red Hat testing failure:

      • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot, along with reproducer steps (especially if it is a telco use case like ICNI, secondary bridges, or BM+kubevirt).

      If it is a CI failure:

      • Did it happen in different CI lanes? If so, please provide links to multiple failures with the same error instance.
      • Did it happen in both sdn and ovn jobs? If so, please provide links to multiple failures with the same error instance.
      • Did it happen on other platforms (e.g. aws, azure, gcp, baremetal, etc.)? If so, please provide links to multiple failures with the same error instance.
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run.
      • If it is a connectivity issue:
        • What is the srcNode, srcIP, srcNamespace, and srcPodName?
        • What is the dstNode, dstIP, dstNamespace, and dstPodName?
        • What is the traffic path? (examples: pod2pod, pod2external, pod2svc, pod2Node, etc.)

      If it is a customer / SD issue:

      • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
      • Don’t presume that Engineering has access to Salesforce.
      • Do presume that Engineering will access attachments through supportshell.
      • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
      • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
        • If the issue is in a customer namespace then provide a namespace inspect.
        • If it is a connectivity issue:
          • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
          • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
          • What is the traffic path? (examples: pod2pod, pod2external, pod2svc, pod2Node, etc.)
          • Please provide the UTC timestamp of the networking outage window from the must-gather.
          • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
        • If it is not a connectivity issue:
          • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure, etc.) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when the problem happened, if any.
      • When showing the results from commands, include the entire command in the output.  
      • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
      • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
      • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
      • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
      • For guidance on using this template please see
        OCPBUGS Template Training for Networking components
