Story
Resolution: Unresolved
Normal
Quality / Stability / Reliability
Background
Test coverage for OCPBUGS-27300 which was fixed in 4.15.0. The bug prevented proper node draining during WMCO upgrades when pods had emptydir volumes attached.
Original Issue (Fixed):
- OCPBUGS-27300: Node drain does not work correctly with local-data pods (Closed - Done-Errata)
- Fix Version: 4.15.0
- Customer Impact: Ford Motor Company, Aareal Bank AG
- Related: OCPQE-18994 (QE test coverage task)
Problem:
During WMCO upgrades, nodes did not correctly drain pods with emptydir/local storage. WMCO was missing the DeleteEmptyDirData field in the node drain helper struct, causing:
- Error: "cannot delete Pods with local storage"
- All nodes cordoned simultaneously instead of rolling upgrade
- Upgrade failures requiring manual intervention
Fix:
Added DeleteEmptyDirData: true to the node drain helper struct in WMCO 4.15.0 and later.
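For context, a minimal sketch of the pattern the fix enables, using the upstream k8s.io/kubectl/pkg/drain package. This is illustrative only, not WMCO's actual code; the package and function names (example, newDrainHelper) are placeholders:

package example

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// newDrainHelper builds a drain helper that can evict pods backed by emptyDir
// volumes. Without DeleteEmptyDirData: true, draining such pods fails with
// "cannot delete Pods with local storage" -- the behavior OCPBUGS-27300 fixed.
func newDrainHelper(client kubernetes.Interface) *drain.Helper {
	return &drain.Helper{
		Ctx:                 context.TODO(),
		Client:              client,
		Force:               true,
		IgnoreAllDaemonSets: true,
		DeleteEmptyDirData:  true, // the field that was missing before WMCO 4.15.0
		GracePeriodSeconds:  -1,
		Timeout:             10 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}
}

A helper configured this way would then be passed to drain.RunCordonOrUncordon and drain.RunNodeDrain to cordon and drain a node.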
Test Objective
Validate that Windows nodes can be drained successfully when running workloads with emptydir volumes, and that draining happens in a controlled rolling fashion (not all nodes at once).
Test Design
Test Case: OCP-XXXXX - Verify node drain handles emptydir volumes during WMCO operations
Workload Setup
Deploy Windows workload with emptydir volumes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windows-emptydir-test
spec:
  replicas: 3  # Spread across multiple Windows nodes
  selector:
    matchLabels:
      app: emptydir-test
  template:
    metadata:
      labels:
        app: emptydir-test
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: iis
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore
        volumeMounts:
        - name: temp-storage
          mountPath: C:\temp
        command:
        - powershell
        - -Command
        - |
          # Write data to emptydir to simulate local storage
          while ($true) {
            Get-Date | Out-File C:\temp\timestamp.txt -Append
            Start-Sleep -Seconds 10
          }
      volumes:
      - name: temp-storage
        emptyDir: {}
Test Steps
- Deploy workload with emptydir volumes
- Create deployment with 3+ replicas across Windows nodes
- Verify pods are running and writing to emptydir
- Confirm pods distributed across multiple nodes
- Trigger node drain scenario (primary: Option A)
- Option A (Preferred): Trigger WMCO upgrade by changing operator version
- Option B (Alternative): Manually cordon and drain a Windows node
- Option C (Alternative): Patch Windows node annotation to trigger WMCO reconciliation
- Monitor drain behavior (example commands are sketched after these steps):
- Watch for drain errors in WMCO logs: "cannot delete Pods with local storage"
- Monitor node drain sequence to ensure rolling drain (only one node at a time)
- Watch pod evictions and rescheduling
- Validate drain behavior:
- NO errors: "cannot delete Pods with local storage"
- Pods with emptydir volumes are evicted successfully
- Only ONE node drains at a time (rolling drain, not all nodes cordoned)
- Pods are rescheduled to other available nodes
- Workload maintains minimum availability during drain
- Verify completion:
- All Windows nodes completed drain/upgrade
- All workload pods running correctly
- No pods stuck in pending/evicting state
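For the monitoring step above, a few example commands. The WMCO deployment and namespace names assume a default install (windows-machine-config-operator in openshift-windows-machine-config-operator) and may differ per cluster:

oc get nodes -l kubernetes.io/os=windows -w
oc logs -f deployment/windows-machine-config-operator -n openshift-windows-machine-config-operator
oc get pods -l app=emptydir-test -o wide -w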
Key Assertions
g.By("Verify no 'cannot delete Pods with local storage' errors in WMCO logs") wmcoLogs := getWMCOLogs(oc, startTime) o.Expect(wmcoLogs).NotTo(o.ContainSubstring("cannot delete Pods with local storage")) g.By("Verify only one node draining at a time (not all nodes cordoned)") cordonedNodes := getCordonedWindowsNodes(oc) o.Expect(len(cordonedNodes)).To(o.BeNumerically("<=", 1), "Multiple nodes cordoned simultaneously - not rolling upgrade") g.By("Verify pods with emptydir were successfully evicted and rescheduled") pods := getPodsWithLabel(oc, "app=emptydir-test") o.Expect(len(pods)).To(o.Equal(3), "Expected 3 replicas running") for _, pod := range pods { o.Expect(pod.Status.Phase).To(o.Equal("Running")) }
Test Variations (Optional)
Variation 1: During WMCO Upgrade
- Trigger actual WMCO version upgrade
- Validates full upgrade path
Variation 2: Manual Node Drain
- Use oc adm drain on Windows node
- Validates drain helper works correctly
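A hedged example of the manual drain for Variation 2; --delete-emptydir-data is the CLI counterpart of the DeleteEmptyDirData field, and the node name is a placeholder:

oc adm drain <windows-node-name> --ignore-daemonsets --delete-emptydir-data --force --timeout=10m

After validation, the node can be returned to service with oc adm uncordon <windows-node-name>.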
Variation 3: Multiple Workload Types
- StatefulSet with emptydir
- DaemonSet with emptydir
- Multiple emptydir volumes per pod
Why This Test Matters
- Regression Prevention: Ensures the fix for OCPBUGS-27300 doesn't regress
- Customer Impact: Ford Motor Company and other customers hit this issue
- Critical Path: Node drain is essential for upgrades and maintenance
- Real-world Scenario: Many Windows workloads use emptydir for temp files, caching, etc.
Implementation Notes
Helper Functions Needed
- getWMCOLogs(oc, startTime) - Fetch WMCO logs since start time
- getCordonedWindowsNodes(oc) - List cordoned Windows nodes
- getPodsWithLabel(oc, label) - Get pods by label selector
- triggerWMCOUpgrade(oc, version) - Trigger WMCO upgrade (if using Option A)
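A minimal sketch of getCordonedWindowsNodes using client-go directly; the in-tree helper would more likely wrap the test framework's oc client, so the signature here is illustrative only:

package winc // illustrative package name

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getCordonedWindowsNodes returns the names of Windows nodes that are
// currently cordoned (Spec.Unschedulable == true).
func getCordonedWindowsNodes(ctx context.Context, client kubernetes.Interface) ([]string, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		LabelSelector: "kubernetes.io/os=windows",
	})
	if err != nil {
		return nil, err
	}
	var cordoned []string
	for _, node := range nodes.Items {
		if node.Spec.Unschedulable {
			cordoned = append(cordoned, node.Name)
		}
	}
	return cordoned, nil
}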
Platforms to Test
- AWS IPI
- Azure IPI
- GCP IPI
- vSphere (optional)
Acceptance Criteria
- [ ] Create Polarion test case (OCP-XXXXX)
- [ ] Implement test automation in test/extended/winc/winc.go
- [ ] Add helper functions to test/extended/winc/utils.go if needed
- [ ] Test validates all key assertions
- [ ] Test covers rolling drain behavior
- [ ] CI passes on AWS, Azure, GCP
- [ ] Test merged to master
- [ ] Polarion test case marked as automated
- [ ] OCPQE-18994 marked as complete
Related Issues
- OCPBUGS-27300: Node drain does not work correctly with local-data pods (Closed - Fixed in 4.15.0)
- OCPQE-18994: QE test coverage task (To Do)
- OCPBUGS-22711: Backport to 4.14 (Closed)
- OCPBUGS-18334: Duplicate issue (Closed)
- relates to: OCPBUGS-27300 Node drain does not work correctly with local-data pods (Closed)