OpenShift Bugs / OCPBUGS-54439

Hypershift CSC KasGoMemLimit is not working, due to wrong unit suffix


    • Quality / Stability / Reliability
    • In Progress
    • Bug Fix
      Cause – KasGoMemLimit was implemented with a data type whose unit suffixes do not match the suffixes that Golang expects for this field.
      Consequence – Setting KasGoMemLimit in the CSC CR per the previous API caused the KAS pod to fail to start.
      Fix – The data type for KasGoMemLimit was updated, and CEL validation was added to ensure that only the right suffixes are used.
      Result – A CSC CR that follows the updated API can set KasGoMemLimit without causing the KAS pod to fail to start.

      Description of problem:

          Hypershift CSC KasGoMemLimit validation rejects the unit suffixes that the Go runtime expects, so KAS pods crashloop when the field is used.

      https://pkg.go.dev/runtime#hdr-Environment_Variables - GOMEMLIMIT is a value in bytes with an optional unit suffix such as KiB, MiB, or GiB.

      CSC instead validates the field for Ki, Mi, or Gi.

      # clustersizingconfigurations.scheduling.hypershift.openshift.io "cluster" was not valid:
      # * spec.sizes[0].effects.kasGoMemLimit: Invalid value: "24576MiB": spec.sizes[0].effects.kasGoMemLimit in body should match '^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$'
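
The suffix mismatch described above can be sketched in Go. This is a minimal illustration, not the Hypershift validation or the runtime's own parser: it assumes GOMEMLIMIT accepts a decimal byte count with an optional B, KiB, MiB, GiB, or TiB suffix (or the literal `off`), and both the pattern and the helper name `validGoMemLimit` are hypothetical, written only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// goMemLimitPattern approximates the byte-count syntax assumed for GOMEMLIMIT:
// digits followed by an optional B, KiB, MiB, GiB, or TiB suffix.
// It is an assumption made for this sketch, not code taken from the Go runtime.
var goMemLimitPattern = regexp.MustCompile(`^[0-9]+(([KMGT]i)?B)?$`)

// validGoMemLimit is a hypothetical helper that reports whether s would be
// accepted under the assumed syntax. "off" disables the limit.
func validGoMemLimit(s string) bool {
	return s == "off" || goMemLimitPattern.MatchString(s)
}

func main() {
	// Values taken from this report, plus GiB/Gi for comparison.
	for _, v := range []string{"24576MiB", "24576Mi", "1GiB", "1Gi"} {
		fmt.Printf("%-10s valid=%v\n", v, validGoMemLimit(v))
	}
}
```

Under this sketch, `24576MiB` is accepted and `24576Mi` is rejected, which is the opposite of what the CSC validation shown above allows.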
      

      KAS crashes with wrong GOMEMLIMIT unit

      GOMEMLIMIT=24576Mi
      fatal error: malformed GOMEMLIMIT; see `go doc runtime/debug.SetMemoryLimit`
      
      runtime stack:
      runtime.throw({0x401b7aa?, 0xa?})
      	runtime/panic.go:1023 +0x5c fp=0x7fffdebe4378 sp=0x7fffdebe4348 pc=0x4400fc
      runtime.readGOMEMLIMIT()
      	runtime/mgcpacer.go:1331 +0xaf fp=0x7fffdebe43a8 sp=0x7fffdebe4378 pc=0x429bcf
      runtime.gcinit()
      	runtime/mgc.go:187 +0x2e fp=0x7fffdebe43d8 sp=0x7fffdebe43a8 pc=0x42110e
      runtime.schedinit()
      	runtime/proc.go:805 +0x1af fp=0x7fffdebe4450 sp=0x7fffdebe43d8 pc=0x44406f
      runtime.rt0_go()
      	runtime/asm_amd64.s:349 +0x15a fp=0x7fffdebe4458 sp=0x7fffdebe4450 pc=0x47919a
      

       
      It works if I set the environment variable to GOMEMLIMIT=24576MiB instead
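
As an illustration of the working value, the sketch below turns a `Ki`/`Mi`/`Gi`/`Ti`-suffixed value into the form GOMEMLIMIT accepts by appending `B`. The helper `toGoMemLimit` is hypothetical and exists only for this example; it is not how the issue was fixed (the fix changed the field's data type and added CEL validation), and it handles only these four suffixes.

```go
package main

import (
	"fmt"
	"strings"
)

// toGoMemLimit is a hypothetical helper for illustration only.
// It appends "B" to values ending in Ki, Mi, Gi, or Ti so that they use
// the KiB/MiB/GiB/TiB suffixes; any other input is returned unchanged.
func toGoMemLimit(q string) string {
	for _, suffix := range []string{"Ki", "Mi", "Gi", "Ti"} {
		if strings.HasSuffix(q, suffix) {
			return q + "B"
		}
	}
	return q
}

func main() {
	// The failing value from this report becomes the working one.
	fmt.Println("GOMEMLIMIT=" + toGoMemLimit("24576Mi")) // prints GOMEMLIMIT=24576MiB
}
```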

       

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Always

      Steps to Reproduce:

          1. Install HO with size tagging
          2. Set CSC to use KasGoMemLimit - https://github.com/openshift/hypershift/blob/main/docs/content/how-to/azure/scheduler.md#effects
          3. Create an HC and watch the KAS pod crash
          

      Actual results:

          GOMEMLIMIT=24576Mi
          fatal error: malformed GOMEMLIMIT; see `go doc runtime/debug.SetMemoryLimit`

      Expected results:

          The CSC should accept the unit suffix that Go expects, e.g. GOMEMLIMIT=24576MiB

      Additional info:

          

              rh-ee-brcox Bryan Cox
              mukrishn@redhat.com Murali Krishnasamy
              He Liu He Liu