  1. OpenShift Bugs
  2. OCPBUGS-53048

[4.18] the image of OpenShift Cluster Capacity Tool can't work well

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 4.18.z
    • kube-scheduler
    • None
    • Important
    • None
    • False

      Description of problem:

       The 4.18 image of the OpenShift Cluster Capacity Tool does not work. It fails with output like the following:
      W0312 09:40:26.309738       2 registry.go:345] setting componentGlobalsRegistry in SetFallback. We recommend calling componentGlobalsRegistry.Set() right after parsing flags to avoid using feature gates before their final values are set by the flags.
      Failed to parse pod spec file: Invalid pod: "Required value: spec.terminationGracePeriodSeconds" 
      
      The image can be found at: https://catalog.redhat.com/software/containers/openshift4/ose-cluster-capacity-rhel9/652809707e435c863687fb8a?image=67bd6586d8d3414b8a4e5b72 
      Tag:v4.18.0-202502250302.p0.gbe5401d.assembly.stream.el9
      
      image digest: registry.redhat.io/openshift4/ose-cluster-capacity-rhel9@sha256:6ae2e1e45b8cac6247d03d5d6dcd8f2829c5762df210f5e380e93f3c2c858234 

      Version-Release number of selected component (if applicable):

      4.18.0-0.nightly-2025-03-12-205221

      How reproducible:

          Always 

      Steps to Reproduce:

      Follow the steps in https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/nodes/working-with-clusters#nodes-cluster-resource-levels-job_nodes-cluster-resource-levels 
      
      1 oc new-project testaa
      
      2 oc create sa cluster-capacity-sa
      
      3 oc create -f cluster-capacity-cluster-role.yaml
      kind: ClusterRole
      apiVersion: authorization.openshift.io/v1
      metadata:
        name: cluster-capacity-role
      rules:
      - apiGroups: ["*"]
        resources: ["*"]
        verbs: ["get", "watch", "list"]
      
      4 oc adm policy add-cluster-role-to-user cluster-capacity-role system:serviceaccount:testaa:cluster-capacity-sa
      
      5 oc create -f cluster-capacity-configmap.yaml
      apiVersion: v1
      data:
        pod.yaml: |
          apiVersion: v1
          kind: Pod
          metadata:
            name: small-pod
            namespace: cluster-capacity
            labels:
              app: guestbook
              tier: frontend
          spec:
            containers:
            - name: php-redis
              image: quay.io/openshifttest/gb-frontend:v4
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 300m
                  memory: 200Mi
                requests:
                  cpu: 150m
                  memory: 100Mi
        pod_with_taint.yaml: |
          apiVersion: v1
          kind: Pod
          metadata:
            name: small-pod
            namespace: cluster-capacity
            labels:
              app: guestbook
              tier: frontend
          spec:
            containers:
            - name: php-redis
              image: quay.io/openshifttest/gb-frontend:v4
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 300m
                  memory: 200Mi
                requests:
                  cpu: 150m
                  memory: 100Mi
            tolerations:
            - key: cc
              value: cc
              operator: Equal
              effect: NoSchedule
        pod_with_nodeSelector.yaml: |
          apiVersion: v1
          kind: Pod
          metadata:
            name: small-pod
            namespace: cluster-capacity
            labels:
              app: guestbook
              tier: frontend
          spec:
            containers:
            - name: php-redis
              image: quay.io/openshifttest/gb-frontend:v4
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 300m
                  memory: 200Mi
                requests:
                  cpu: 150m
                  memory: 100Mi
            nodeSelector:
              cc: "true"
      kind: ConfigMap
      metadata:
        name: cluster-capacity-configmap
      
      6 oc create -f cluster-capacity-rc.yaml
      apiVersion: v1
      kind: ReplicationController
      metadata:
        labels:
          run: cluster-capacity
        name: cluster-capacity-2
      spec:
        replicas: 1
        selector:
          run: cluster-capacity
        template:
          metadata:
            labels:
              run: cluster-capacity
          spec:
              containers:
              - name: cluster-capacity
                image: registry.redhat.io/openshift4/ose-cluster-capacity-rhel9@sha256:6ae2e1e45b8cac6247d03d5d6dcd8f2829c5762df210f5e380e93f3c2c858234
                volumeMounts:
                - mountPath: /test-pod
                  name: test-volume
                env:
                - name: CC_INCLUSTER
                  value: "true"
                command:
                - "/bin/sh"
                - "-ec"
                - |
                  /bin/cluster-capacity --podspec=/test-pod/pod.yaml --verbose;while true;do sleep 10;done
              serviceAccountName: cluster-capacity-sa
              volumes:
              - name: test-volume
                configMap:
                  name: cluster-capacity-configmap
      
      7 % oc get rc
      NAME                 DESIRED   CURRENT   READY   AGE
      cluster-capacity-2   1         1         1       8s
      
      % oc get pod 
      NAME                       READY   STATUS    RESTARTS   AGE 
      cluster-capacity-2-s4zdf   1/1     Running       0          8s
      
      8 % oc logs cluster-capacity-2-s4zdf
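The pod name in step 8 is generated by the ReplicationController and changes on every run. A minimal sketch of fetching the logs by label instead, assuming the `run=cluster-capacity` label and the `testaa` project used above (the function name `fetch_cc_logs` is illustrative and not part of the documented procedure):

```shell
# Fetch cluster-capacity logs by label rather than by generated pod name.
# Assumes a logged-in `oc` client and the "run=cluster-capacity" label
# set by the ReplicationController in step 6.
fetch_cc_logs() {
  ns="${1:-testaa}"
  # Wait until the pod reports Ready (status output goes to stderr),
  # then print its full log; --tail=-1 disables the 10-line default
  # that applies when a label selector is used.
  oc -n "$ns" wait --for=condition=Ready pod -l run=cluster-capacity --timeout=120s >&2 &&
    oc -n "$ns" logs -l run=cluster-capacity --tail=-1
}
```

Usage: `fetch_cc_logs testaa`.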
        

      Actual results:

       Step 8: the tool fails to parse the pod spec instead of reporting capacity:
       % oc logs cluster-capacity-2-s4zdf
      W0312 09:40:26.309738       2 registry.go:345] setting componentGlobalsRegistry in SetFallback. We recommend calling componentGlobalsRegistry.Set() right after parsing flags to avoid using feature gates before their final values are set by the flags.
      Failed to parse pod spec file: Invalid pod: "Required value: spec.terminationGracePeriodSeconds" 
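When checking several image builds, it can help to classify the log output mechanically. The following is a rough sketch that keys on the two message fragments quoted in this report; the function name `classify_cc_log` and the three result labels are illustrative assumptions:

```shell
# Classify cluster-capacity log text read from stdin.
# Prints "parse-failure" when the pod spec was rejected,
# "ok" when a capacity estimate is present, and "unknown" otherwise.
classify_cc_log() {
  log=$(cat)
  case "$log" in
    *"Failed to parse pod spec file"*) echo "parse-failure" ;;
    *"The cluster can schedule"*)      echo "ok" ;;
    *)                                 echo "unknown" ;;
  esac
}
```

Usage: `oc logs cluster-capacity-2-s4zdf | classify_cc_log`.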

      Expected results:

       Step 8: the tool should report the cluster capacity, for example:
      
      % oc logs cluster-capacity-2-s4zdf 
      small-pod pod requirements:
          - CPU: 150m
          - Memory: 100Mi

      The cluster can schedule 57 instance(s) of the pod small-pod.

      Termination reason: Unschedulable: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 3 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.

      Pod distribution among nodes:
      small-pod
          - ip-10-0-33-207.us-east-2.compute.internal: 21 instance(s)
          - ip-10-0-6-24.us-east-2.compute.internal: 18 instance(s)
          - ip-10-0-77-110.us-east-2.compute.internal: 18 instance(s)
      
      
      

      Additional info:

       The image "brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-capacity-rhel9:v4.18" hits the same issue; the v4.17 image works normally.
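The error names a field that the API server normally defaults (to 30 seconds), which suggests the 4.18 image may validate the pod spec without applying defaults first. One way to probe that hypothesis is to set the field explicitly in the test pod. This is an untested diagnostic sketch, not a confirmed workaround; the helper name `write_explicit_podspec` and the output file name are illustrative:

```shell
# Write a variant of pod.yaml that sets the field named in the error explicitly.
# 30 is the Kubernetes default for terminationGracePeriodSeconds.
# Whether this satisfies the 4.18 image is an assumption to be verified.
write_explicit_podspec() {
  out="${1:-pod_explicit_grace.yaml}"
  cat > "$out" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: php-redis
    image: quay.io/openshifttest/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 100Mi
EOF
}
```

The generated file could be added as another key in `cluster-capacity-configmap` and passed via `--podspec`. If the tool then complains about a different required field, that would point to defaulting being skipped entirely rather than to this one field.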

              aos-workloads-staff Workloads Team Bot Account
              rhn-support-minmli Min Li
              Min Li Min Li