OpenShift Bugs / OCPBUGS-26062

Day 0 PerformanceProfile is failing for SNO and Compact clusters

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Undefined
    • None
    • 4.15.0
    • Node Tuning Operator
    • None
    • Important
    • No
    • Rejected
    • False

      Description of problem:

      Since https://github.com/openshift/cluster-node-tuning-operator/pull/854, the preferred way to create a PerformanceProfile is to do it at Day 0.
      
      However, this does not seem to work for SNO and compact clusters when the PerformanceProfile references the master MCP.
      

      Version-Release number of selected component (if applicable):

      OpenShift v4.15.0-rc.0
      

      How reproducible:

      Tested on BM IPI and SNO BM deployments.
      

      Steps to Reproduce:

      1. * Create an install-config.yaml file to deploy a BareMetal IPI OpenShift 4.15.0-rc.0 cluster with compute.workers.replicas set to 0.
         * Or create an install-config.yaml file to deploy a BareMetal SNO cluster using the manual method described in the OpenShift documentation (https://docs.openshift.com/container-platform/latest/installing/installing_sno/install-sno-installing-sno.html#install-sno-installing-sno-manually).
      
      2. After running the command {{openshift-install create manifests}}, create the following manifests at Day 0 (they are similar to the ones referenced in https://issues.redhat.com/browse/OCPBUGS-18640):
      
      ---
      kind: MachineConfigPool
      apiVersion: machineconfiguration.openshift.io/v1
      metadata:
        name: master
      spec:
        machineConfigSelector:
          matchLabels:
            machineconfiguration.openshift.io/role: master
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/master: ""
      
      ---
      kind: PerformanceProfile
      apiVersion: performance.openshift.io/v2
      metadata:
        name: dpdk
      spec:
        cpu:
          isolated: "1-3"
          reserved: "0"
        hugepages:
          defaultHugepagesSize: 2M
          pages:
            - size: 2M
              count: 32
        net:
          userLevelNetworking: true
        numa:
          topologyPolicy: single-numa-node
        realTimeKernel:
          enabled: false
        machineConfigPoolSelector:
          pools.operator.machineconfiguration.openshift.io/master: ""
        nodeSelector:
          node-role.kubernetes.io/master: ""
       
      3. Deploy the cluster
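
A minimal sketch of step 2, under stated assumptions. The report does not say which directory the manifests were written to, so placing them in the `openshift/` directory generated by `openshift-install create manifests` is an assumption. The helper name `stage_day0_manifests` and the file name `master_mcp.yaml` are hypothetical; `performance_profile_dpdk.yaml` is the file name that appears in the bootkube log of this report.

```shell
# Copy extra Day 0 manifests into an installer asset directory.
# Assumption: "$1" already holds the output of
# `openshift-install create manifests --dir "$1"`, including openshift/.
stage_day0_manifests() {
  install_dir="$1"
  shift
  target="${install_dir}/openshift"
  if [ ! -d "$target" ]; then
    echo "missing ${target}: run 'openshift-install create manifests' first" >&2
    return 1
  fi
  for manifest in "$@"; do
    # Keep the original file name so it is easy to spot in bootkube logs.
    cp -- "$manifest" "${target}/$(basename "$manifest")" || return 1
  done
}

# Usage (not run here):
#   openshift-install create manifests --dir ./cluster
#   stage_day0_manifests ./cluster master_mcp.yaml performance_profile_dpdk.yaml
#   ...then continue with the deployment as in step 3.
```

The helper refuses to copy anything when the `openshift/` directory is missing, which avoids silently staging manifests before the installer has generated its own.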
      

      Actual results:

      Cluster deployment fails at the bootstrapping stage.
      
      For SNO clusters, most of the time the logs are flooded with the following error:
      
      > journalctl -b -u bootkube.service
      
      bootkube.sh[7451]: [#4] failed to create some manifests:
      bootkube.sh[7451]: "performance_profile_dpdk.yaml": failed to create performanceprofiles.v2.performance.openshift.io/dpdk -n : Internal error occurred: failed calling webhook "vwb.performance.openshift.io": failed to call webhook: Post "https://performance-addon-operator-service.openshift-cluster-node-tuning-operator.svc:443/validate-performance-openshift-io-v2-performanceprofile?timeout=10s": no endpoints available for service "performance-addon-operator-service"
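
To tell this failure mode apart from the other one described in this report, the bootkube journal can be filtered for the webhook message. The following is a rough sketch only; the helper name `count_webhook_failures` is hypothetical, and it simply counts matching lines read from standard input.

```shell
# Count log lines reporting that the PerformanceProfile validating webhook
# could not be called. Reads log text from stdin and prints the count.
count_webhook_failures() {
  # grep -c prints 0 and exits non-zero when nothing matches; `|| true`
  # keeps the helper's exit status at 0 in that case.
  grep -c 'failed calling webhook "vwb.performance.openshift.io"' || true
}

# Usage on the bootstrap node (not run here):
#   journalctl -b -u bootkube.service --no-pager | count_webhook_failures
```

A non-zero count suggests that bootkube is still retrying the PerformanceProfile manifest against a webhook service that has no endpoints yet.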
      
      For compact clusters (and for SNO clusters when they do not fail earlier with the error above), the logs are flooded with the following error:
      
      > oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller
      
      I1223 14:10:24.299182       1 kubelet_config_controller.go:491] KubeletConfig performance-dpdk has been deleted
      W1223 14:10:25.095025       1 kubelet_config_controller.go:462] error updating the kubelet config with annotation key "machineconfiguration.openshift.io/mc-name-suffix" and value "": kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
      W1223 14:10:25.095050       1 kubelet_config_controller.go:429] error updating kubeletconfig status: kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
      I1223 14:10:25.095060       1 kubelet_config_controller.go:332] Error syncing kubeletconfig performance-dpdk: kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
      I1223 14:10:25.133332       1 node_controller.go:1035] No nodes available for updates
      I1223 14:10:25.133603       1 status.go:224] Degraded Machine: cnvqe-08.lab.eng.tlv2.redhat.com and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-82d8570749169c031983cc3e9151d030" not found
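
The "Degraded Reason" lines name the rendered MachineConfig that the nodes expect but that no longer exists. As a sketch under assumptions, the hypothetical helper `missing_rendered_configs` below extracts those names from controller log text on standard input so that they can be compared with the output of `oc get machineconfig`.

```shell
# Print the unique rendered MachineConfig names that the
# machine-config-controller log reports as "not found".
# Reads log text from stdin.
missing_rendered_configs() {
  grep 'Degraded Reason' \
    | grep -o '"rendered-[^"]*" not found' \
    | cut -d'"' -f2 \
    | sort -u
}

# Usage (not run here):
#   oc -n openshift-machine-config-operator logs \
#     deployment/machine-config-controller -c machine-config-controller \
#     | missing_rendered_configs
#   oc get machineconfig
```

If a name printed by the helper is absent from the `oc get machineconfig` listing, the node is pointing at a rendered config that was never created or was replaced.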
      

      Additional info:

      It seems that simply creating a Tuned resource at Day 0 also fails for SNO and compact clusters:
      
      ---
      kind: Tuned
      apiVersion: tuned.openshift.io/v1
      metadata:
        name: hugepages
        namespace: openshift-cluster-node-tuning-operator
      spec:
        profile:
          - name: openshift-node-hugepages
            data: |
              [main]
              summary=Boot time configuration for hugepages
              include=openshift-node
              [bootloader]
              cmdline_openshift_node_hugepages=default_hugepagesz=2M hugepages=32
        recommend:
          - machineConfigLabels:
              machineconfiguration.openshift.io/role: "master"
            priority: 25
            profile: openshift-node-hugepages
      
      > oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller
      
      I1222 21:35:08.908410       1 status.go:224] Degraded Machine: cnvqe-03.lab.eng.tlv2.redhat.com and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-f3b3143b5d67b2efcb405cb1051662a4" not found
      
      > oc -n openshift-machine-config-operator logs daemonset/machine-config-daemon -c machine-config-daemon
      
      I1222 21:26:28.144081   15114 node.go:52] Setting initial node config: rendered-master-f3b3143b5d67b2efcb405cb1051662a4
      I1222 21:26:28.152814   15114 daemon.go:1495] In bootstrap mode
      E1222 21:26:28.152954   15114 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-f3b3143b5d67b2efcb405cb1051662a4" not found
      

            msivak@redhat.com Martin Sivak
            dollierp@redhat.com Denis Ollier Pinas
            Mallapadi Niranjan