OpenShift Bugs / OCPBUGS-15522

BYOH node failed to upgrade: Cannot remove item C:\\var\\log\\containerd\\containerd.log: The process cannot access the file \r\n'containerd.log' because it is being used by another process


    • Critical
    • WINC - Sprint 238, WINC - Sprint 239
    • Fixes an issue that prevented Windows nodes from being deconfigured because the containerd log file could not be removed. The fix properly stops containerd before attempting to remove the log file.
    • Bug Fix
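
      The release note above describes the fix as an ordering change: stop containerd first, then remove its log directory. A minimal Go sketch of that ordering, assuming hypothetical names (the function and the exact PowerShell command strings are illustrative, not WMCO's actual code):

      ```go
      package main

      import "fmt"

      // deconfigureCommands returns the cleanup commands in the corrected
      // order: stop the containerd Windows service first, so it releases its
      // open handle on containerd.log, and only then remove C:\var\log.
      // Running Remove-Item while containerd is still running is what produced
      // the "being used by another process" error in this bug.
      func deconfigureCommands() []string {
      	return []string{
      		// Illustrative: stop containerd so containerd.log is no longer held open.
      		`Stop-Service -Name containerd`,
      		// The removal command from the bug's error output; it can only
      		// succeed once no process has the log file open.
      		`if(Test-Path C:\var\log) {Remove-Item -Recurse -Force C:\var\log}`,
      	}
      }

      func main() {
      	for _, cmd := range deconfigureCommands() {
      		fmt.Println(cmd)
      	}
      }
      ```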

      This is a clone of issue OCPBUGS-14700. The following is the description of the original issue:

      Description of problem:

      Upgrading a BYOH node fails; after the upgrade the BYOH node remains in NotReady,SchedulingDisabled:
      
      {"level":"error","ts":"2023-06-07T16:52:46Z","msg":"Reconciler error","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","ConfigMap":{"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"},"namespace":"openshift-windows-machine-config-operator","name":"windows-instances","reconcileID":"8e6cc51d-9fd4-4e44-b39e-6b6d678c6422"}

      error configuring host with address 10.0.128.7: error deconfiguring instance: unable to remove created directories: unable to remove directory C:\var\log, out:

      Remove-Item : Cannot remove item C:\var\log\containerd\containerd.log: The process cannot access the file
      'containerd.log' because it is being used by another process.
      At line:1 char:27
      + if(Test-Path C:\var\log) {Remove-Item -Recurse -Force C:\var\log}
      +                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          + CategoryInfo          : WriteError: (containerd.log:FileInfo) [Remove-Item], IOException
          + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand
      Remove-Item : Cannot remove item C:\var\log\containerd: The directory is not empty.
      At line:1 char:27
      + if(Test-Path C:\var\log) {Remove-Item -Recurse -Force C:\var\log}
      +                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          + CategoryInfo          : WriteError: (containerd:DirectoryInfo) [Remove-Item], IOException
          + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand
      Remove-Item : Cannot remove item C:\var\log: The directory is not empty.
      At line:1 char:27
      + if(Test-Path C:\var\log) {Remove-Item -Recurse -Force C:\var\log}
      +                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          + CategoryInfo          : WriteError: (C:\var\log:DirectoryInfo) [Remove-Item], IOException
          + FullyQualifiedErrorId : RemoveFileSystemItemIOError,Microsoft.PowerShell.Commands.RemoveItemCommand

      err: error running powershell.exe -NonInteractive -ExecutionPolicy Bypass "if(Test-Path C:\var\log) {Remove-Item -Recurse -Force C:\var\log}": Process exited with status 1

      stacktrace:
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
          /remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
          /remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
          /remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235

      Version-Release number of selected component (if applicable):

      Upgrading from:
      windows-services-7.0.1-bc9473b         2      6h13m
      To:
      windows-services-8.0.1-01a3618         2      53m

      How reproducible:

      Most likely reproducible on GCP; the same upgrade passed on AWS.

      Steps to Reproduce:

      1. Install a BYOH Windows Server 2022 node on GCP (not via a MachineSet)
      2. Perform an upgrade from 4.12 (windows-services 7.0.1-bc9473b) to 4.13 (windows-services 8.0.1-01a3618)
      3. Wait until the upgrade completes
      

      Actual results:

      The MachineSet-created nodes upgrade successfully, but the BYOH node is stuck in NotReady,SchedulingDisabled

      Expected results:

      Nodes should be Ready after the upgrade, with the correct kubelet version

      Additional info:

       oc get nodes -owide
      NAME                                                        STATUS                        ROLES                  AGE     VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
      mgcp-byoh-0.c.openshift-qe.internal                         NotReady,SchedulingDisabled   worker                 4h25m   v1.25.0-2653+a34b9e9499e6c3   10.0.128.7    <none>        Windows Server 2022 Datacenter                                 10.0.20348.1726                containerd://1.19
      rrasouli-397-x7hdb-master-0.c.openshift-qe.internal         Ready                         control-plane,master   6h55m   v1.26.5+7a891f0               10.0.0.3      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-master-1.c.openshift-qe.internal         Ready                         control-plane,master   6h56m   v1.26.5+7a891f0               10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-master-2.c.openshift-qe.internal         Ready                         control-plane,master   6h54m   v1.26.5+7a891f0               10.0.0.4      <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-worker-a-5872s.c.openshift-qe.internal   Ready                         worker                 6h44m   v1.26.5+7a891f0               10.0.128.3    <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9
      rrasouli-397-x7hdb-worker-b-fsc8d.c.openshift-qe.internal   Ready                         worker                 6h44m   v1.26.5+7a891f0               10.0.128.2    <none>        Red Hat Enterprise Linux CoreOS 413.92.202306010245-0 (Plow)   5.14.0-284.16.1.el9_2.x86_64   cri-o://1.26.3-8.rhaos4.13.gitec064c9.el9

              rh-ee-ssoto Sebastian Soto
              openshift-crt-jira-prow OpenShift Prow Bot
              Aharon Rasouli Aharon Rasouli