OpenShift Bugs / OCPBUGS-17480

Precaching leaves the /var/lib/kubelet volume behind after the job is done; the subsequent backup fails on copying a hard link


      Description of problem:

      After precaching has completed:
      
      # mount|grep pre-cache  
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/8 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/9 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/11 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      tmpfs on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10/pods/4535934a-e318-4c09-8127-68288325469d/volumes/kubernetes.io~projected/kube-api-access-j8bvf type tmpfs (rw,relatime,seclabel,size=129827608k,inode64)
      tmpfs on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10/pods/af7f8d4d-f945-4531-af19-52b8e9e79892/volumes/kubernetes.io~projected/kube-api-access-t2nn8 type tmpfs (rw,relatime,seclabel,size=129827608k,inode64)
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/11 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      
      In kubelet log:
      
      Aug 08 17:51:59 cnfdf22.telco5gran.eng.rdu2.redhat.com bash[25923]: I0808 17:51:59.711233   25923 reconciler_common.go:169] "operationExecutor.UnmountVolume started for volume \"host-var\" (UniqueName: \"kubernetes.io/host-path/e9a04cab-4b87-4a7e-92c5-a29d8c709374-host-var\") pod \"e9a04cab-4b87-4a7e-92c5-a29d8c709374\" (UID: \"e9a04cab-4b87-4a7e-92c5-a29d8c709374\") "
      Aug 08 17:51:59 cnfdf22.telco5gran.eng.rdu2.redhat.com bash[25923]: E0808 17:51:59.712398   25923 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/host-path/e9a04cab-4b87-4a7e-92c5-a29d8c709374-host-var podName:e9a04cab-4b87-4a7e-92c5-a29d8c709374 nodeName:}" failed. No retries permitted until 2023-08-08 17:54:01.712383509 +0000 UTC m=+330268.786497522 (durationBeforeRetry 2m2s). Error: error cleaning subPath mounts for volume "host-var" (UniqueName: "kubernetes.io/host-path/e9a04cab-4b87-4a7e-92c5-a29d8c709374-host-var") pod "e9a04cab-4b87-4a7e-92c5-a29d8c709374" (UID: "e9a04cab-4b87-4a7e-92c5-a29d8c709374") : error processing /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container: error cleaning subpath mount /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10: unmount failed: exit status 32
      Aug 08 17:51:59 cnfdf22.telco5gran.eng.rdu2.redhat.com bash[25923]: Unmounting arguments: /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10
      Aug 08 17:51:59 cnfdf22.telco5gran.eng.rdu2.redhat.com bash[25923]: Output: umount: /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10: target is busy.
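      The "target is busy" failure is consistent with the nested mounts shown above: a mount point cannot be unmounted while another mount sits beneath it. A deepest-first ordering avoids this; reverse lexical sort happens to work here because a child mount's path extends its parent's path. A minimal sketch (AAA/BBB are placeholder pod UIDs, not the real ones):

      ```shell
      # Sketch: order mount targets deepest-first so child mounts are
      # removed before their parents. AAA/BBB stand in for real pod UIDs.
      mounts="/var/lib/kubelet/pods/AAA/volume-subpaths/host-var/pre-cache-container/10
      /var/lib/kubelet/pods/AAA/volume-subpaths/host-var/pre-cache-container/10/pods/BBB/volume-subpaths/host-var/pre-cache-container/10"
      ordered=$(printf '%s\n' "$mounts" | sort -r)
      # The nested (longer) path now comes first, so it would be
      # unmounted before the parent mount that contains it.
      printf '%s\n' "$ordered"
      ```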
      
      The subsequent backup then failed with:
      
      INFO[0002] snapshot db and kube resources are successfully saved to /var/recovery/cluster
      INFO[0002] Command succeeded: cp -Ra /etc/ /var/recovery/etc/
      INFO[0002] Command succeeded: cp -Ra /usr/local/ /var/recovery/usrlocal/
      INFO[0008] cp: will not create hard link '/var/recovery/kubelet/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10' to directory '/var/recovery/kubelet/'
      INFO[0012] cp: will not create hard link '/var/recovery/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10' to directory '/var/recovery/kubelet/'
      INFO[0012] ##### Tue Aug  8 18:02:11 UTC 2023: Command failed: cp -Ra /var/lib/kubelet/ /var/recovery/kubelet/
      INFO[0012] Retrying after 1 seconds, attempt #2
      INFO[0018] cp: will not create hard link '/var/recovery/kubelet/kubelet/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10' to directory '/var/recovery/kubelet/kubelet'
      INFO[0023] cp: will not create hard link '/var/recovery/kubelet/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10' to directory '/var/recovery/kubelet/kubelet'
      INFO[0023] ##### Tue Aug  8 18:02:22 UTC 2023: Command failed: cp -Ra /var/lib/kubelet/ /var/recovery/kubelet/
      INFO[0023] Retrying after 1 seconds, attempt #3
      INFO[0030] cp: will not create hard link '/var/recovery/kubelet/kubelet/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10' to directory '/var/recovery/kubelet/kubelet'
      INFO[0036] cp: will not create hard link '/var/recovery/kubelet/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10' to directory '/var/recovery/kubelet/kubelet'
      INFO[0036] ##### Tue Aug  8 18:02:34 UTC 2023: Command failed: cp -Ra /var/lib/kubelet/ /var/recovery/kubelet/
      INFO[0036] ##### Tue Aug  8 18:02:34 UTC 2023: Failed to backup /var/lib/kubelet
      ERRO[0036] exit status 1

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a CGU with precaching on.
      2. After precaching completes, enable a CGU (can be the same one) with backup on.
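      For reference, a minimal sketch of the CGU involved (field names taken from the TALM ClusterGroupUpgrade API; the cluster and policy names are placeholders, so verify the exact schema against your TALM version):

      ```yaml
      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        name: cgu-precache-backup   # placeholder name
        namespace: ztp-install      # placeholder namespace
      spec:
        clusters:
          - spoke1                  # placeholder spoke cluster
        managedPolicies:
          - du-upgrade-policy       # placeholder policy
        preCaching: true            # step 1: run with precaching on
        backup: true                # step 2: re-enable with backup on
        enable: true
      ```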
      
      

      Actual results:

      Backup fails; '/var/lib/kubelet/pods/<pod uid>/volume-subpaths/host-var/pre-cache-container/10' is left behind

      Expected results:

      Backup succeeds and no volumes are left behind from the precaching pod

      Additional info:

      The workaround is to manually unmount the volumes and clean up /var/lib/kubelet/pods/<pod uid> on the node, which is not practical with a large number of spokes. Because the leftover mounts are nested, several unmount passes are needed before they all clear:
      
      sh-5.1# mount|grep pre-cache|awk '{print "umount " $3}'|sh
      umount: /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10: target is busy.
      umount: /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10: target is busy.
      umount: /var/lib/kubelet/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10: target is busy.
      sh-5.1# mount|grep pre-cache
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/1c0a3ca3-b3bc-4765-a21f-d36e1f03fab0/volume-subpaths/host-var/pre-cache-container/10 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      sh-5.1# mount|grep pre-cache|awk '{print "umount " $3}'|sh
      umount: /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10: target is busy.
      sh-5.1# mount|grep pre-cache
      /dev/nvme0n1p4 on /var/lib/kubelet/pods/e9a04cab-4b87-4a7e-92c5-a29d8c709374/volume-subpaths/host-var/pre-cache-container/10 type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
      sh-5.1# mount|grep pre-cache|awk '{print "umount " $3}'|sh
      sh-5.1# mount|grep pre-cache
      sh-5.1#
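      The repeated passes above can be sketched as a retry loop (a hypothetical helper, not an official tool; nested mounts keep their parents "busy", so each pass frees the next level up):

      ```shell
      # Hypothetical cleanup helper: retry unmounting leftover pre-cache
      # subpath mounts until none match the given pattern, unmounting the
      # deepest targets first on each pass.
      cleanup_precache_mounts() {
        pattern=$1
        passes=${2:-5}
        while mount | grep -q "$pattern"; do
          passes=$((passes - 1))
          [ "$passes" -lt 0 ] && return 1   # give up after too many passes
          # Deepest-first: child mount paths extend the parent path,
          # so reverse lexical order unmounts children before parents.
          mount | grep "$pattern" | awk '{print $3}' | sort -r | while read -r m; do
            umount "$m" 2>/dev/null || true
          done
        done
        return 0
      }
      ```

      Usage on an affected node would be along the lines of `cleanup_precache_mounts pre-cache`, followed by removing the stale /var/lib/kubelet/pods/<pod uid> directories.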

      Jun Chen (jche@redhat.com)
      Joshua Clark