  1. OpenShift Bugs
  2. OCPBUGS-17557

[4.13] Improve error handling when provided with a faulty PerformanceProfile at install time

    • No
    • CNF Compute Sprint 239, CNF Compute Sprint 240, CNF Compute Sprint 241
    • 3
    • False
      Description of problem:

      When creating a SNO with a PerformanceProfile provided at install time, a typo in the PerformanceProfile makes the render step panic, at least when the CPU definition is faulty. For example, with `isolated: "0,9-80-89"` (note the `-` where a `,` was intended), the code fails with:
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: panic: runtime error: invalid memory address or nil pointer dereference
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xb0 pc=0x1697ed8]
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: goroutine 1 [running]:
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render.getContainerRuntimeName(0x40001dca80, 0x0?, {0x4000689430, 0x1, 0x40004c9a88?})
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render/render.go:276 +0x28
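For illustration, a minimal sketch of the kind of guard that turns a nil pointer dereference into a readable error. The trace above lists `0x0?` among the arguments of `getContainerRuntimeName`, which is consistent with (but does not prove) a nil pointer being passed in. The `Pool` type and the `runtimeNameFor` function below are hypothetical stand-ins, not the operator's actual code or signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Pool is a hypothetical stand-in for an object the render step looks up
// before computing the container runtime name.
type Pool struct {
	Name string
}

// runtimeNameFor is a hypothetical helper: it returns an error instead of
// dereferencing a nil pool, so the caller can report what went wrong.
func runtimeNameFor(pool *Pool) (string, error) {
	if pool == nil {
		return "", errors.New("required input is nil; check the provided PerformanceProfile for errors")
	}
	return "runtime-for-" + pool.Name, nil
}

func main() {
	// A nil input yields an error message rather than a SIGSEGV.
	if _, err := runtimeNameFor(nil); err != nil {
		fmt.Println("render failed:", err)
	}
}
```

With a guard like this, the bootstrap log would carry an actionable message rather than a stack trace.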

      Version-Release number of selected component (if applicable):

      4.13.2

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a SNO with agent install, and place a PerformanceProfile in the openshift folder with the following typo (in the isolated field):
      
      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: openshift-node-workload-partitioning-master
      spec:
        cpu:
          isolated: "0,9-80-89"
          reserved: "10-79,90-159" 
        machineConfigPoolSelector:
          pools.operator.machineconfiguration.openshift.io/master: ""
        nodeSelector:
          node-role.kubernetes.io/master: ""
      
      2. The cluster will fail to create.
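As a sketch of the validation that could catch this typo before rendering, the Go snippet below checks that a CPU list is a comma-separated sequence of ids or `low-high` ranges. `validateCPUList` is a simplified, hypothetical helper written for this report. It is not the operator's parser and makes no claim about how the operator actually validates the field.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validateCPUList is a simplified, hypothetical check: s must be a
// comma-separated list of CPU ids or "low-high" ranges, e.g. "0-3,8".
func validateCPUList(s string) error {
	if strings.TrimSpace(s) == "" {
		return fmt.Errorf("empty CPU list")
	}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		bounds := strings.Split(part, "-")
		// More than one "-" in an element (such as "9-80-89") is not a range.
		if len(bounds) > 2 {
			return fmt.Errorf("invalid CPU range %q in %q: expected \"low-high\"", part, s)
		}
		prev := -1
		for _, b := range bounds {
			n, err := strconv.Atoi(b)
			if err != nil || n < 0 {
				return fmt.Errorf("invalid CPU id %q in %q", b, s)
			}
			if n < prev {
				return fmt.Errorf("invalid CPU range %q in %q: low bound exceeds high bound", part, s)
			}
			prev = n
		}
	}
	return nil
}

func main() {
	// The two values from the profile in this report.
	for _, s := range []string{"0,9-80-89", "10-79,90-159"} {
		if err := validateCPUList(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```

Run against the profile above, the `isolated` value is rejected because `9-80-89` contains two dashes, while the `reserved` value passes.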
      
      

      Actual results:

      The cluster fails to install.

      Expected results:

      The cluster should be installed

      Additional info:

      Logs
      
      Jun 07 13:31:25 aarch64-node systemd[1]: Started Bootstrap a Kubernetes cluster.
      Jun 07 13:31:34 aarch64-node bootkube.sh[34037]: Rendering Node Tuning core manifests...
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.373583       1 render.go:71] Rendering files into: /assets/node-tuning-bootstrap
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.493746       1 render.go:128] skipping "/assets/manifests/50-masters-chrony-configuration.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.493972       1 render.go:128] skipping "/assets/manifests/50-workers-chrony-configuration.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.494591       1 render.go:128] skipping "/assets/manifests/99_openshift-machineconfig_99-master-ssh.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.494785       1 render.go:128] skipping "/assets/manifests/99_openshift-machineconfig_99-worker-ssh.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.499133       1 render.go:128] skipping "/assets/manifests/cluster-dns-02-config.yml" [1] manifest because of unhandled *v1.DNS
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.514992       1 render.go:128] skipping "/assets/manifests/cluster-ingress-02-config.yml" [1] manifest because of unhandled *v1.Ingress
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.516510       1 render.go:128] skipping "/assets/manifests/cluster-network-02-config.yml" [1] manifest because of unhandled *v1.Network
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.516696       1 render.go:128] skipping "/assets/manifests/cluster-proxy-01-config.yaml" [1] manifest because of unhandled *v1.Proxy
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.517094       1 render.go:128] skipping "/assets/manifests/cluster-scheduler-02-config.yml" [1] manifest because of unhandled *v1.Scheduler
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.518772       1 render.go:128] skipping "/assets/manifests/cvo-overrides.yaml" [1] manifest because of unhandled *v1.ClusterVersion
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.519091       1 render.go:128] skipping "/assets/manifests/dnsmasq-bootstrap-in-place.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.521554       1 render.go:128] skipping "/assets/manifests/mc-acelerated-container-startup-0.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.521995       1 render.go:128] skipping "/assets/manifests/mc-reduced-platform-overhead-0.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.522162       1 render.go:128] skipping "/assets/manifests/mc-sctp-0.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.522356       1 render.go:128] skipping "/assets/manifests/mc-workload-partitioning-0.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.522503       1 render.go:128] skipping "/assets/manifests/node-ip-hint-master.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: I0607 13:31:34.522667       1 render.go:128] skipping "/assets/manifests/node-ip-hint-worker.yaml" [1] manifest because of unhandled *v1.MachineConfig
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: panic: runtime error: invalid memory address or nil pointer dereference
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xb0 pc=0x1697ed8]
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: goroutine 1 [running]:
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render.getContainerRuntimeName(0x40001dca80, 0x0?, {0x4000689430, 0x1, 0x40004c9a88?})
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render/render.go:276 +0x28
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render.render({0xffffe2003ce5, 0x11}, {0xffffe2003d0a, 0x1d})
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render/render.go:159 +0x39c
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render.(*renderOpts).Run(...)
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render/cmd.go:89
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render.NewRenderCommand.func1(0x4000780300?, {0x1b5c852?, 0x2?, 0x2?})
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/performanceprofile/cmd/render/cmd.go:52 +0xa8
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/spf13/cobra.(*Command).execute(0x4000780300, {0x4000b01420, 0x2, 0x2})
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/github.com/spf13/cobra/command.go:920 +0x5c8
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/spf13/cobra.(*Command).ExecuteC(0x2c22cc0)
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/github.com/spf13/cobra/command.go:1040 +0x360
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: github.com/spf13/cobra.(*Command).Execute(...)
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/github.com/spf13/cobra/command.go:968
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]: main.main()
      Jun 07 13:31:34 aarch64-node bootkube.sh[36304]:         /go/src/github.com/openshift/cluster-node-tuning-operator/cmd/cluster-node-tuning-operator/main.go:332 +0x158
      Jun 07 13:31:34 aarch64-node systemd[1]: bootkube.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
      Jun 07 13:31:34 aarch64-node systemd[1]: bootkube.service: Failed with result 'exit-code'.
      Jun 07 13:31:34 aarch64-node systemd[1]: bootkube.service: Consumed 10.614s CPU time.
      Jun 07 13:31:39 aarch64-node systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 9.
      Jun 07 13:31:39 aarch64-node systemd[1]: Stopped Bootstrap a Kubernetes cluster.

            jojosneg@redhat.com Jose Luis Ojosnegros
            adetalho@redhat.com Alexis de Talhouët
            Mallapadi Niranjan