OCPBUGS-29751 (OpenShift Bugs): day-0 with PerformanceProfile manifest renderer uses invalid uid

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • None
    • 4.16
    • Node Tuning Operator
    • None
    • No
    • CNF Compute Sprint 250
    • 1
    • False
    • None
    • Release note text:
      * Previously, the Performance Profile Creator (PPC) incorrectly populated the `metadata.ownerReferences.uid` field for Day 0 performance profile manifests. As a consequence, it was not possible to apply a performance profile at Day 0 without manual intervention. With this release, the PPC does not generate the `metadata.ownerReferences.uid` field for Day 0 manifests. As a result, you can apply a performance profile manifest at Day 0 as expected. (link:https://issues.redhat.com/browse/OCPBUGS-29751[*OCPBUGS-29751*])
    • Bug Fix
    • Done
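Per the release note, the fixed PPC simply omits `metadata.ownerReferences.uid` from Day 0 manifests. The following is a minimal sketch of the equivalent edit applied by hand, which is one plausible form of the "manual intervention" the note mentions. It assumes the rendered manifests have already been parsed into Python dicts (YAML parsing is not shown), the helper name `strip_owner_reference_uids` is hypothetical, and this is not the PPC's actual code.

```python
import copy
from typing import Any


def strip_owner_reference_uids(manifest: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `manifest` with `uid` removed from every ownerReferences entry.

    Hypothetical helper that mirrors the fixed PPC behavior described in the
    release note (no metadata.ownerReferences.uid for Day 0 manifests).
    The input dict is not modified.
    """
    cleaned = copy.deepcopy(manifest)
    # Missing metadata or a missing/null ownerReferences list means nothing to strip.
    for ref in cleaned.get("metadata", {}).get("ownerReferences", []) or []:
        ref.pop("uid", None)
    return cleaned


if __name__ == "__main__":
    # Illustrative manifest. Only the kind/name come from this report; the
    # ownerReference contents and the placeholder uid are assumptions.
    kubelet_config = {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "metadata": {
            "name": "performance-clusterbotpp",
            "ownerReferences": [
                {
                    "apiVersion": "performance.openshift.io/v2",
                    "kind": "PerformanceProfile",
                    "name": "clusterbotpp",
                    "uid": "00000000-0000-0000-0000-000000000000",
                }
            ],
        },
    }
    fixed = strip_owner_reference_uids(kubelet_config)
    print(fixed["metadata"]["ownerReferences"][0])
```

Working on a deep copy keeps the original rendered manifest available for comparison, and dropping only `uid` leaves the rest of each owner reference (apiVersion, kind, name) intact.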

      Description of problem:

      I picked up 4.14-ec-4 (which uses cgroups v1 by default) and tried to create a cluster with the following PerformanceProfile (and a corresponding MachineConfigPool) by placing both in the manifests folder:

       
      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: clusterbotpp
      spec:
        cpu:
          isolated: "1-3"
          reserved: "0"
        realTimeKernel:
          enabled: false
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        machineConfigPoolSelector:
          pools.operator.machineconfiguration.openshift.io/worker: ""
      

      and the corresponding MachineConfigPool:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: worker 
      spec:
        machineConfigSelector:
          matchLabels:
            machineconfiguration.openshift.io/role: worker
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker: ""

      The installation often fails because bootkube spends a long time retrying after this error:

       
      Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
      Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
      Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: [#1717] failed to create some manifests:
      Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: "clusterbotpp_kubeletconfig.yaml": failed to update status for kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
      Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
      Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 597dfcf3-012d-4730-912a-78efabb920ba, UID in object meta:
      
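To see how many retries a given run spent on this error, here is a minimal sketch that counts the "Failed to update status" lines per manifest file and resource. The regular expression is an assumption written against the log lines quoted above only (other releases may word these messages differently), `count_precondition_failures` is a hypothetical helper, and the "[#1717] failed to create some manifests" summary lines are deliberately not counted.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern derived only from the bootkube lines quoted in this report (an assumption).
FAILURE_RE = re.compile(
    r'Failed to update status for the "(?P<file>[^"]+)" '
    r'(?P<resource>\S+) -n : .*?'
    r'Precondition failed: UID in precondition: (?P<uid>[0-9a-f-]+), '
    r'UID in object meta:\s*$'
)


def count_precondition_failures(lines: Iterable[str]) -> Counter:
    """Count precondition-failure lines per (manifest file, resource) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group("file"), match.group("resource"))] += 1
    return counts
```

For example, saved output of `journalctl -b -u bootkube.service` from the bootstrap node can be opened with `open(...)` and the file object passed straight to the function, since it yields one log line at a time.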

      As a result, the worker nodes do not become ready in time, and the installer marks the cluster installation as failed. Ironically, even after the installer returns a failure, I have sometimes observed that, if you wait long enough, the cluster eventually reconciles and the worker nodes are provisioned.

      I am attaching the installation logs from one run that hit this issue.

       

       

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      Often

      Steps to Reproduce:

      1. Install a new cluster with a PerformanceProfile (and its MachineConfigPool) placed in the manifests folder.
      

      Actual results:

      Cluster installation failed. 

      Expected results:

      Cluster installation should succeed. 

      Additional info:

      I did not observe this issue in 4.13.9.

              Francesco Romani (fromani@redhat.com)
              Harshal Patil (harpatil@redhat.com)
              Shereen Haj