-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.21
-
None
-
Quality / Stability / Reliability
-
False
Summary:
The SR-IOV operator creates NetworkAttachmentDefinition (NAD) resources with *incomplete configuration*. When user pods try to attach to SR-IOV networks, the CNI plugin fails because two critical fields, `resourceName` and `pciAddress`, are missing from the NAD's `spec.config` JSON. (`resourceName` is correctly placed in `metadata.annotations`, but not in `spec.config`.)
*Result*: Pods remain in the Pending state with the error: "SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required"
This bug was discovered during comprehensive integration testing of the SR-IOV operator and manifests as pod network attachment failures.
Description of problem:
### What Goes Wrong
```
Expected NAD spec.config:
{
  "resourceName": "openshift.io/cx7anl244",   ✅ CNI needs this
  "pciAddress": "0000:02:01.2",               ✅ CNI needs this
  "type": "sriov"
}

Actual NAD spec.config:
{
  "type": "sriov"
  # ❌ resourceName MISSING!
  # ❌ pciAddress MISSING!
}

BUT: resourceName IS in metadata.annotations ✅
```
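The comparison above can be checked mechanically. The following is a minimal sketch, assuming the `spec.config` string has already been extracted from the NAD (for example with `oc get net-attach-def <name> -n <namespace> -o jsonpath='{.spec.config}'`). The function name `missing_cni_fields` and the required-field list are illustrative: they come from the fields this report names, not from the SR-IOV CNI source.

```python
import json

# Assumption: the fields this report says the CNI plugin needs in spec.config.
# They are taken from the expected config above, not from the CNI source code.
REQUIRED_FIELDS = ("resourceName", "pciAddress")


def missing_cni_fields(spec_config: str) -> list:
    """Return the required fields that are absent or empty in a NAD spec.config string."""
    config = json.loads(spec_config)
    return [field for field in REQUIRED_FIELDS if not config.get(field)]


# The two configs shown in this report.
expected = '{"resourceName": "openshift.io/cx7anl244", "pciAddress": "0000:02:01.2", "type": "sriov"}'
actual = '{"type": "sriov"}'

print(missing_cni_fields(expected))  # []
print(missing_cni_fields(actual))    # ['resourceName', 'pciAddress']
```

An empty list means the config matches the expected shape described here; anything else names the fields to look for in the generated NAD.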
### Impact
- ❌ Pod attachment fails
- ❌ All SR-IOV networking broken
- ❌ Tests time out waiting for pod readiness
- ✅ Only manifests when creating NEW networks or after operator restart

---
## When Bug Manifests
**NOT in normal operation** (pre-configured networks work fine)
**YES in these situations:**
1. Creating a NEW SriovNetwork resource
2. After operator restart/reinstallation
3. Comprehensive integration testing that creates fresh networks
4. When NADs are regenerated
**Why**: The operator generates a NAD only when a SriovNetwork resource is created (or when the NAD is regenerated).
---
## Root Cause
**Location**: `bindata/manifests/cni-config/sriov/` (template files)
**Issue**: Template placement logic puts `resourceName` in annotations but NOT in spec.config JSON
**Go Code**: ✅ CORRECT - properly prepares `data.Data["CniResourceName"]`
**Templates**: ❌ BUGGY - uses it in annotations but not in CNI config
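To make the placement mismatch concrete, here is a small sketch that inspects a NAD object (as a dict, for example parsed from `oc get net-attach-def <name> -o json`) and reports where `resourceName` appears. The annotation key `k8s.v1.cni.cncf.io/resourceName` is the one conventionally used on SR-IOV NADs and is treated as an assumption here; the function name and the sample object are hypothetical.

```python
import json

# Assumption: the annotation key conventionally used on SR-IOV NADs.
RESOURCE_ANNOTATION = "k8s.v1.cni.cncf.io/resourceName"


def resource_name_placement(nad: dict) -> dict:
    """Report whether resourceName is set in the annotations and in spec.config."""
    annotations = nad.get("metadata", {}).get("annotations") or {}
    config = json.loads(nad.get("spec", {}).get("config", "{}"))
    return {
        "in_annotations": RESOURCE_ANNOTATION in annotations,
        "in_spec_config": "resourceName" in config,
    }


# Hypothetical NAD shaped like the one described in this report.
nad = {
    "metadata": {"annotations": {RESOURCE_ANNOTATION: "openshift.io/cx7anl244"}},
    "spec": {"config": json.dumps({"type": "sriov"})},
}

print(resource_name_placement(nad))
# {'in_annotations': True, 'in_spec_config': False}
```

The `True`/`False` pair is the pattern this report attributes to the templates: the value reaches the annotations but not the CNI config.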
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
### Auto-Reproduction Tool
File: `reproduce_incomplete_nad_bug.sh`
**What it does:**
1. Creates test namespace
2. Creates SriovNetwork resource
3. Captures operator logs
4. Attempts to create test pod
5. Collects all evidence
6. Shows complete NAD output

**How to run:**
```bash
bash reproduce_incomplete_nad_bug.sh
# Output: Complete NAD config in /tmp/
```
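The script itself is not reproduced in this report. As a sketch of step 2 only, the snippet below builds a minimal SriovNetwork manifest of the kind that would trigger NAD generation. It is not taken from the script: the names `repro-net` and `repro-ns` are hypothetical, the operator namespace `openshift-sriov-network-operator` is assumed to be the default install location, and the resource name is taken from the example config in this report.

```python
import json


def sriov_network_manifest(name: str, resource_name: str, target_namespace: str) -> dict:
    """Build a minimal SriovNetwork manifest as a plain dict."""
    return {
        "apiVersion": "sriovnetwork.openshift.io/v1",
        "kind": "SriovNetwork",
        # Assumption: the operator is installed in its default namespace.
        "metadata": {"name": name, "namespace": "openshift-sriov-network-operator"},
        "spec": {
            # Resource name without the "openshift.io/" prefix.
            "resourceName": resource_name,
            # Namespace where the operator should create the NAD.
            "networkNamespace": target_namespace,
        },
    }


# Hypothetical names, for illustration only.
manifest = sriov_network_manifest("repro-net", "cx7anl244", "repro-ns")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `oc apply -f -`, after which the generated NAD in the target namespace can be inspected.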
Actual results:
### Scenario 1: Pre-configured Networks
```
Production Setup:
  Networks created long ago or pre-provided
    ↓
  Operator just uses them
    ↓
  ✅ No new NAD creation needed
    ↓
  ✅ Bug doesn't manifest
```
### Scenario 2: New Network Creation (Integration Tests)
```
Test Setup:
  Create FRESH SriovNetwork resource
    ↓
  ❌ Operator generates NEW NAD with incomplete config
    ↓
  Try to attach pods
    ↓
  ❌ CNI plugin fails
```
### Scenario 3: Operator Restart
```
Production Scenario:
  Operator running (NADs exist)
    ↓
  Operator crashes/restarts
    ↓
  NADs regenerated
    ↓
  ❌ Regenerated NAD has incomplete config
    ↓
  ❌ Pods fail to attach
    ↓
  This is when the bug appears in production
```
Expected results:
Pods attach to the SR-IOV network successfully.
Additional info: