Type: Bug
Resolution: Not a Bug
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18.z, 4.19.z
Component/s: kube-scheduler
Labels:
- TestBlocker

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

The image of the OpenShift Cluster Capacity Tool for 4.18 does not work; it returns output like the following:
W0312 09:40:26.309738       2 registry.go:345] setting componentGlobalsRegistry in SetFallback. We recommend calling componentGlobalsRegistry.Set() right after parsing flags to avoid using feature gates before their final values are set by the flags.
Failed to parse pod spec file: Invalid pod: "Required value: spec.terminationGracePeriodSeconds" 

The image can be found at: https://catalog.redhat.com/software/containers/openshift4/ose-cluster-capacity-rhel9/652809707e435c863687fb8a?image=67bd6586d8d3414b8a4e5b72 
Tag: v4.18.0-202502250302.p0.gbe5401d.assembly.stream.el9

Image digest: registry.redhat.io/openshift4/ose-cluster-capacity-rhel9@sha256:6ae2e1e45b8cac6247d03d5d6dcd8f2829c5762df210f5e380e93f3c2c858234

Version-Release number of selected component (if applicable):

4.18.0-0.nightly-2025-03-12-205221

How reproducible:

    Always

Steps to Reproduce:

Follow the steps in https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/nodes/working-with-clusters#nodes-cluster-resource-levels-job_nodes-cluster-resource-levels

1 oc new-project testaa

2 oc create sa cluster-capacity-sa

3 oc create -f cluster-capacity-cluster-role.yaml
kind: ClusterRole
apiVersion: authorization.openshift.io/v1
metadata:
  name: cluster-capacity-role
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["get", "watch", "list"]

4 oc adm policy add-cluster-role-to-user cluster-capacity-role system:serviceaccount:testaa:cluster-capacity-sa

5 oc create -f cluster-capacity-configmap.yaml
apiVersion: v1
data:
  pod.yaml: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: small-pod
      namespace: cluster-capacity
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: quay.io/openshifttest/gb-frontend:v4
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 150m
            memory: 100Mi
  pod_with_taint.yaml: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: small-pod
      namespace: cluster-capacity
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: quay.io/openshifttest/gb-frontend:v4
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 150m
            memory: 100Mi
      tolerations:
      - key: cc
        value: cc
        operator: Equal
        effect: NoSchedule
  pod_with_nodeSelector.yaml: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: small-pod
      namespace: cluster-capacity
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: quay.io/openshifttest/gb-frontend:v4
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 150m
            memory: 100Mi
      nodeSelector:
        cc: "true"
kind: ConfigMap
metadata:
  name: cluster-capacity-configmap

6 oc create -f cluster-capacity-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    run: cluster-capacity
  name: cluster-capacity-2
spec:
  replicas: 1
  selector:
    run: cluster-capacity
  template:
    metadata:
      labels:
        run: cluster-capacity
    spec:
        containers:
        - name: cluster-capacity
          image: registry.redhat.io/openshift4/ose-cluster-capacity-rhel9@sha256:6ae2e1e45b8cac6247d03d5d6dcd8f2829c5762df210f5e380e93f3c2c858234
          volumeMounts:
          - mountPath: /test-pod
            name: test-volume
          env:
          - name: CC_INCLUSTER
            value: "true"
          command:
          - "/bin/sh"
          - "-ec"
          - |
            /bin/cluster-capacity --podspec=/test-pod/pod.yaml --verbose;while true;do sleep 10;done
        serviceAccountName: cluster-capacity-sa
        volumes:
        - name: test-volume
          configMap:
            name: cluster-capacity-configmap

7 % oc get rc
NAME                 DESIRED   CURRENT   READY   AGE
cluster-capacity-2   1         1         1       8s

% oc get pod 
NAME                       READY   STATUS    RESTARTS   AGE 
cluster-capacity-2-s4zdf   1/1     Running       0          8s

8 % oc logs cluster-capacity-2-s4zdf

Actual results:

 8 The image does not work as expected:
% oc logs cluster-capacity-2-s4zdf
W0312 09:40:26.309738       2 registry.go:345] setting componentGlobalsRegistry in SetFallback. We recommend calling componentGlobalsRegistry.Set() right after parsing flags to avoid using feature gates before their final values are set by the flags.
Failed to parse pod spec file: Invalid pod: "Required value: spec.terminationGracePeriodSeconds"
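
The error names the field the tool expects, so the pod spec files can be checked for it before they are loaded into the ConfigMap. The following is a minimal sketch, not part of the original report: the function name `has_grace_period` is hypothetical, and the check is a plain text match that assumes the field sits on its own line, as it would in the specs from step 5.

```shell
#!/bin/sh
# Return 0 if the given pod spec file sets terminationGracePeriodSeconds,
# non-zero otherwise. This is a text match, not a YAML parse: it assumes
# the field appears on its own line with an integer value.
has_grace_period() {
  grep -Eq '^[[:space:]]*terminationGracePeriodSeconds:[[:space:]]*[0-9]+' "$1"
}

# Report every file passed on the command line that lacks the field.
for f in "$@"; do
  has_grace_period "$f" || echo "missing terminationGracePeriodSeconds: $f"
done
```

Saved as a script (the name `check-podspec.sh` is an assumption), it could be run as `sh check-podspec.sh pod.yaml pod_with_taint.yaml pod_with_nodeSelector.yaml`. With the specs from step 5 as written, all three files would be reported.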

Expected results:

 8 The image should work and return output like:

% oc logs cluster-capacity-2-s4zdf
small-pod pod requirements:
    - CPU: 150m
    - Memory: 100Mi

The cluster can schedule 57 instance(s) of the pod small-pod.

Termination reason: Unschedulable: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 3 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.

Pod distribution among nodes:
small-pod
    - ip-10-0-33-207.us-east-2.compute.internal: 21 instance(s)
    - ip-10-0-6-24.us-east-2.compute.internal: 18 instance(s)
    - ip-10-0-77-110.us-east-2.compute.internal: 18 instance(s)
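
In the expected output, the per-node counts should add up to the reported total (21 + 18 + 18 = 57). A small sketch of that check follows; the function name `sum_instances` is hypothetical, and it assumes the per-node lines keep the `- <node>: <N> instance(s)` shape shown above.

```shell
#!/bin/sh
# Sum the per-node instance counts from cluster-capacity output on stdin.
# Only lines shaped like "    - <node>: <N> instance(s)" are counted,
# so the "The cluster can schedule ..." summary line is ignored.
sum_instances() {
  awk '/^[ \t]*- .*: [0-9]+ instance/ { total += $(NF-1) } END { print total + 0 }'
}

# Example with the three node lines from the expected output; prints 57.
printf '%s\n' \
  '    - ip-10-0-33-207.us-east-2.compute.internal: 21 instance(s)' \
  '    - ip-10-0-6-24.us-east-2.compute.internal: 18 instance(s)' \
  '    - ip-10-0-77-110.us-east-2.compute.internal: 18 instance(s)' \
  | sum_instances
```

Against a live pod, the same function could read from `oc logs cluster-capacity-2-s4zdf | sum_instances`.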

Additional info:

 The image "brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-capacity-rhel9:v4.18" hits the same issue; the v4.17 image works normally.

links to

openshift/cucushift#9920: OCPBUGS-53048: add terminationGracePeriodSeconds setting for small-pod

openshift/verification-tests#3857: OCPBUGS-53048: add spec.terminationGracePeriodSeconds with default value 30s
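
Judging by their titles, the linked pull requests adjust the test pod specs to set spec.terminationGracePeriodSeconds explicitly, with 30 seconds as the value, which is consistent with the "Not a Bug" resolution. A minimal sketch of that change applied to the small-pod spec from step 5 follows. The output path `pod.yaml` is an assumption, and this illustrates the test-side workaround only, not a change to the image.

```shell
#!/bin/sh
# Write the small-pod spec from step 5 with an explicit
# terminationGracePeriodSeconds. The value 30 is taken from the linked
# PR title; the output path ./pod.yaml is an assumption for illustration.
cat > pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: small-pod
  namespace: cluster-capacity
  labels:
    app: guestbook
    tier: frontend
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: php-redis
    image: quay.io/openshifttest/gb-frontend:v4
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 150m
        memory: 100Mi
EOF
```

The same one-line addition under `spec:` would apply to the `pod_with_taint.yaml` and `pod_with_nodeSelector.yaml` entries in the ConfigMap.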

Assignee: Workloads Team Bot Account

Reporter: Min Li

Need Info From: None

Contributors: None

QA Contact: Min Li

Doc Contact: None

Votes: 0

Watchers: 10

Created: 2025/03/13 5:59 AM

Updated: 2025/07/15 1:19 PM

Resolved: 2025/07/02 6:15 AM
