Bug
Resolution: Done-Errata
Critical
4.13.z
WINC - Sprint 238
Fixes an issue that prevented Windows nodes from being deconfigured because the containerd log file could not be removed. containerd is now stopped properly before the log file is removed.
Bug Fix
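The ordering described in the release note above (stop containerd, then remove the log directory) can be illustrated with a minimal sketch. This is not the operator's code, which is written in Go and is not shown in this report. The service name `containerd`, the helper names `deconfigure_commands` and `run_in_order`, and the `run` callback are assumptions made for illustration; only the `Remove-Item` command is taken from the log in this report.

```python
from typing import Callable, List


def deconfigure_commands(service: str = "containerd",
                         log_dir: str = r"C:\var\log") -> List[str]:
    """Build PowerShell commands in the order the release note describes."""
    return [
        # Assumption: the service is registered as "containerd". Stopping it
        # first should release the open handle on containerd.log.
        f"if(Get-Service -Name {service} -ErrorAction SilentlyContinue) "
        f"{{Stop-Service -Name {service} -Force}}",
        # Same removal command that appears in the failing log entry.
        f"if(Test-Path {log_dir}) {{Remove-Item -Recurse -Force {log_dir}}}",
    ]


def run_in_order(commands: List[str], run: Callable[[str], int]) -> bool:
    """Run each command in turn and stop at the first non-zero exit status."""
    for command in commands:
        if run(command) != 0:
            return False
    return True


# Hypothetical runner that only records what it was asked to execute.
executed: List[str] = []


def recording_run(command: str) -> int:
    executed.append(command)
    return 0


run_in_order(deconfigure_commands(), recording_run)
for command in executed:
    print(command)
```

Stopping at the first failure means the directory removal is never attempted while the service may still hold the log file open.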
Description of problem:
Upgrading a BYOH node fails: after the upgrade, the node remains in NotReady,SchedulingDisabled. The operator log reports the following reconciler error:
{"level":"error","ts":"2023-06-07T16:52:46Z","msg":"Reconciler error","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","ConfigMap":{"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"},"namespace":"openshift-windows-machine-config-operator","name":"windows-instances","reconcileID":"8e6cc51d-9fd4-4e44-b39e-6b6d678c6422","error":"error configuring host with address 10.0.128.7: error deconfiguring instance: unable to remove created directories: unable to remove directory C:\\var\\log, out: Remove-Item : Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : WriteError: (containerd.log:FileInfo) [Remove-Item], IOException\r\n + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\nRemove-Item : Cannot remove item C:\\var\\log\\containerd: The directory is not empty.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : WriteError: (containerd:DirectoryInfo) [Remove-Item], IOException\r\n + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\nRemove-Item : Cannot remove item C:\\var\\log: The directory is not empty.\r\nAt line:1 char:27\r\n+ if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : WriteError: (C:\\var\\log:DirectoryInfo) [Remove-Item], IOException\r\n + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand\r\n, 
err: error running powershell.exe -NonInteractive -ExecutionPolicy Bypass \"if(Test-Path C:\\var\\log) {Remove-Item -Recurse -Force C:\\var\\log}\": Process exited with status 1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
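The log entry above is a single structured JSON line, and the root cause (a file still held open) is buried in its `error` field. A small sketch for pulling the locked path out of such an entry when triaging is shown here. The helper name `locked_paths` and the shortened `sample` entry are assumptions for illustration; the sample keeps the wording of the real message but omits most fields.

```python
import json
import re
from typing import List

# Matches the Remove-Item failure that PowerShell reports for an open file handle.
LOCKED_FILE = re.compile(
    r"Cannot remove item (?P<path>.+?): The process cannot access the file"
)


def locked_paths(log_line: str) -> List[str]:
    """Return paths that Remove-Item could not delete because they were in use."""
    entry = json.loads(log_line)
    return [m.group("path") for m in LOCKED_FILE.finditer(entry.get("error", ""))]


# Shortened sample with the same shape as the entry above (most fields omitted).
sample = json.dumps({
    "level": "error",
    "msg": "Reconciler error",
    "error": "unable to remove directory C:\\var\\log, out: Remove-Item : "
             "Cannot remove item C:\\var\\log\\containerd\\containerd.log: "
             "The process cannot access the file \r\n'containerd.log' because "
             "it is being used by another process.",
})

print(locked_paths(sample))
```

The follow-on "The directory is not empty" messages are deliberately not matched, since they are consequences of the locked file rather than separate causes.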
Version-Release number of selected component (if applicable):
Upgrading from: windows-services-7.0.1-bc9473b 2 6h13m
To: windows-services-8.0.1-01a3618 2 53m
How reproducible:
Most likely reproducible; the same upgrade passed on AWS.
Steps to Reproduce:
1. Install a Windows Server 2022 BYOH node on GCP (not via a MachineSet).
2. Upgrade from 4.12 (7.0.1-bc9473b) to 4.13 (8.0.1-01a3618).
3. Wait until the upgrade has completed.
Actual results:
MachineSet nodes get upgraded, but the BYOH node is stuck in NotReady,SchedulingDisabled.
Expected results:
Nodes should be Ready after the upgrade and report the correct kubelet version.
Additional info:
oc get nodes -owide
NAME                                                       STATUS                       ROLES                 AGE    VERSION                      INTERNAL-IP  EXTERNAL-IP  OS-IMAGE                                                      KERNEL-VERSION                CONTAINER-RUNTIME
mgcp-byoh-0.c.openshift-qe.internal                        NotReady,SchedulingDisabled  worker                4h25m  v1.25.0-2653+a34b9e9499e6c3  10.0.128.7   <none>       Windows Server 2022 Datacenter                                10.0.20348.1726               containerd://1.19
rrasouli-397-x7hdb-master-0.c.openshift-qe.internal        Ready                        control-plane,master  6h55m  v1.26.5+7a891f0              10.0.0.3     <none>       Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)  5.14.0-284.16.1.el9_2.x86_64  cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
rrasouli-397-x7hdb-master-1.c.openshift-qe.internal        Ready                        control-plane,master  6h56m  v1.26.5+7a891f0              10.0.0.5     <none>       Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)  5.14.0-284.16.1.el9_2.x86_64  cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
rrasouli-397-x7hdb-master-2.c.openshift-qe.internal        Ready                        control-plane,master  6h54m  v1.26.5+7a891f0              10.0.0.4     <none>       Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)  5.14.0-284.16.1.el9_2.x86_64  cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
rrasouli-397-x7hdb-worker-a-5872s.c.openshift-qe.internal  Ready                        worker                6h44m  v1.26.5+7a891f0              10.0.128.3   <none>       Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)  5.14.0-284.16.1.el9_2.x86_64  cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
rrasouli-397-x7hdb-worker-b-fsc8d.c.openshift-qe.internal  Ready                        worker                6h44m  v1.26.5+7a891f0              10.0.128.2   <none>       Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)  5.14.0-284.16.1.el9_2.x86_64  cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
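The expected result (every node Ready and on the correct kubelet version) can be checked against a node listing like the one above with a short sketch. The helper names `minor` and `nodes_needing_attention`, and the choice to compare only the major and minor version numbers, are assumptions for illustration; the two sample rows are copied from the listing in this report.

```python
from typing import List, Tuple


def minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a kubelet version such as 'v1.26.5+7a891f0'."""
    major_part, minor_part = version.lstrip("v").split(".")[:2]
    return int(major_part), int(minor_part)


def nodes_needing_attention(nodes: List[Tuple[str, str, str]],
                            expected: str) -> List[str]:
    """Names of nodes that are not plain 'Ready' or run a different kubelet minor."""
    want = minor(expected)
    return [name for name, status, version in nodes
            if status != "Ready" or minor(version) != want]


# (NAME, STATUS, VERSION) taken from two rows of the listing above.
nodes = [
    ("mgcp-byoh-0.c.openshift-qe.internal",
     "NotReady,SchedulingDisabled", "v1.25.0-2653+a34b9e9499e6c3"),
    ("rrasouli-397-x7hdb-worker-a-5872s.c.openshift-qe.internal",
     "Ready", "v1.26.5+7a891f0"),
]

print(nodes_needing_attention(nodes, expected="v1.26.5+7a891f0"))
```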
blocks
- OCPBUGS-15522 BYOH node failed to upgrade: Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process (Closed)
- OCPBUGS-20083 BYOH node failed to upgrade: Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process (Closed)
is blocked by
- WINC-1058 Impact BYOH node failed to upgrade: Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process (Closed)
is cloned by
- OCPBUGS-15522 BYOH node failed to upgrade: Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process (Closed)
- OCPBUGS-20083 BYOH node failed to upgrade: Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process (Closed)