OCPBUGS-8691

Operands running management side missing affinity, tolerations, node selector and priority rules compared to the operator

    • Important
    • No
    • Storage Sprint 233
    • 1
    • Rejected
    • False
    • N/A
    • Release Note Not Required

      Description of problem:

      In the HyperShift context:
      Operands managed by operators running in the hosted control plane namespace in the management cluster do not honour the affinity opinions described at:
      https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/
      https://github.com/openshift/hypershift/blob/main/support/config/deployment.go#L263-L265
      
      These operands running management side should honour the same affinity, tolerations, node selector and priority rules as the operator.
      This could be done by looking at the operator deployment itself or at the HCP resource.
      
      aws-ebs-csi-driver-controller
      aws-ebs-csi-driver-operator
      csi-snapshot-controller
      csi-snapshot-webhook
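
      As a minimal sketch of the approach described above (propagating the operator's own scheduling constraints onto its operand Deployments), assuming the k8s.io/api types; the helper name, the example values and the priority class are hypothetical, not the code merged for this bug:

      package main

      import (
          "fmt"

          appsv1 "k8s.io/api/apps/v1"
          corev1 "k8s.io/api/core/v1"
      )

      // copySchedulingConstraints propagates the scheduling-related fields of the
      // operator's own pod template onto an operand Deployment so that operands
      // running management side follow the same placement rules as the operator.
      // Hypothetical helper, for illustration only.
      func copySchedulingConstraints(operator, operand *appsv1.Deployment) {
          src := operator.Spec.Template.Spec
          dst := &operand.Spec.Template.Spec

          dst.Affinity = src.Affinity.DeepCopy()
          dst.Tolerations = append([]corev1.Toleration(nil), src.Tolerations...)
          dst.NodeSelector = map[string]string{}
          for k, v := range src.NodeSelector {
              dst.NodeSelector[k] = v
          }
          dst.PriorityClassName = src.PriorityClassName
      }

      func main() {
          operator := &appsv1.Deployment{}
          operator.Spec.Template.Spec.NodeSelector = map[string]string{
              "kubernetes.io/hostname": "ip-10-0-153-163.ec2.internal",
          }
          // Priority class name is illustrative.
          operator.Spec.Template.Spec.PriorityClassName = "hypershift-control-plane"

          operand := &appsv1.Deployment{}
          copySchedulingConstraints(operator, operand)
          fmt.Println(operand.Spec.Template.Spec.NodeSelector, operand.Spec.Template.Spec.PriorityClassName)
      }

      Per the description, the same fields could equally be derived from the HCP resource rather than from the operator Deployment itself.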
      
      
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a hypershift cluster.
      2. Check the affinity rules and node selector of the operands listed above (see the sketch after these steps).
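
      For step 2, a hedged client-go sketch that dumps the scheduling fields of the operands in the hosted control plane namespace; the kubeconfig path and namespace below are placeholders:

      package main

      import (
          "context"
          "fmt"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Placeholders: management cluster kubeconfig and hosted control plane namespace.
          config, err := clientcmd.BuildConfigFromFlags("", "/path/to/management-kubeconfig")
          if err != nil {
              panic(err)
          }
          client, err := kubernetes.NewForConfig(config)
          if err != nil {
              panic(err)
          }

          operands := []string{
              "aws-ebs-csi-driver-controller",
              "aws-ebs-csi-driver-operator",
              "csi-snapshot-controller",
              "csi-snapshot-webhook",
          }
          for _, name := range operands {
              d, err := client.AppsV1().Deployments("clusters-mycluster").Get(context.TODO(), name, metav1.GetOptions{})
              if err != nil {
                  fmt.Println(name, "error:", err)
                  continue
              }
              spec := d.Spec.Template.Spec
              fmt.Printf("%s:\n  nodeSelector=%v\n  affinity=%v\n  tolerations=%v\n  priorityClassName=%s\n",
                  name, spec.NodeSelector, spec.Affinity, spec.Tolerations, spec.PriorityClassName)
          }
      }

      The same fields can also be read with oc get deployment -o yaml in the hosted control plane namespace.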
      

      Actual results:

      Operands are missing affinity rules and node selector.

      Expected results:

      Operands have the same affinity rules and node selector as the operator.

      Additional info:

       

            [OCPBUGS-8691] Operands running management side missing affinity, tolerations, node selector and priority rules compared to the operator

            GitLab CEE Bot added a comment - Ian Main mentioned this issue in a merge request of Service Delivery / app-interface on branch ibm_integration_bump:

            Bump IBM integration to our latest prod image.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.14.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:5006


            Rohit Patil added a comment - Marking as Verified based on result of normal OCP cluster.

            Antoni Segura Puimedon added a comment - increasing the priority to blocker (not for OCP, but for ROSA)

            Jan Safranek added a comment - Some notes from testing:

            • Get an OCP cluster with all worker nodes in the same availability zone, e.g. get 3 replicas in us-east-1a and 0 in the others:
            $ oc -n openshift-machine-api scale machineset/jsafrane-1-vnqrz-worker-us-east-1a --replicas=3
            $ oc -n openshift-machine-api scale machineset/jsafrane-1-vnqrz-worker-us-east-1b --replicas=0
            $ oc -n openshift-machine-api scale machineset/jsafrane-1-vnqrz-worker-us-east-1c --replicas=0
            • Install HyperShift into it as usual, no special config needed. No special version needed either.
            • Install a guest cluster with the PR(s), i.e. the bug must be fixed there.
            • Edit the HostedCluster + add e.g. nodeSelector:
            $ oc -n clusters edit hostedcluster <your hosted cluster>
            
            ...
            spec:
              nodeSelector:
                kubernetes.io/hostname: ip-10-0-153-163.ec2.internal
            • See all hosted control plane pods getting re-created on the given node. AWS EBS CSI driver operator + driver + snapshot controller Pods should get re-created there too.
            $ oc -n clusters-jsafrane get pod -o wide
            NAME                                                  READY   STATUS    RESTARTS   AGE     IP             NODE                           NOMINATED NODE   READINESS GATES
            aws-ebs-csi-driver-controller-bfbdb85bc-g9z6s         7/7     Running   0          10m     10.129.2.63    ip-10-0-153-163.ec2.internal   <none>           <none>
            aws-ebs-csi-driver-operator-679cb46978-6vvfc          1/1     Running   0          10m     10.129.2.64    ip-10-0-153-163.ec2.internal   <none>           <none>
            cluster-storage-operator-9f5849847-cxwfv              1/1     Running   0          9m6s    10.129.2.95    ip-10-0-153-163.ec2.internal   <none>           <none>
            csi-snapshot-controller-857c664f5-zc9pz               1/1     Running   0          10m     10.129.2.65    ip-10-0-153-163.ec2.internal   <none>           <none>
            csi-snapshot-controller-operator-88b54f859-g4jt5      1/1     Running   0          9m6s    10.129.2.93    ip-10-0-153-163.ec2.internal   <none>           <none>
            csi-snapshot-webhook-6dbd87bbb4-ph6sj                 1/1     Running   0          10m     10.129.2.66    ip-10-0-153-163.ec2.internal   <none>           <none>
            
            • Similarly, label a random node with hypershift.openshift.io/cluster: <hosted control plane namespace> and clear the HostedCluster nodeSelector. All newly created pods should be scheduled on the labelled node. By removing nodeSelector from HostedCluster, all Pods in the hosted control plane will be re-created with empty nodeSelector and nodeAffinity should schedule them on the labelled node (if there is space for them there).
            $ oc label nodes <node name> hypershift.openshift.io/cluster=clusters-jsafrane
            $ oc -n clusters edit hostedcluster jsafrane
            # delete nodeSelector
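
            For reference, a rough Go sketch of the kind of preferred node affinity the re-created pods are expected to carry for the hypershift.openshift.io/cluster label mentioned above; the weight and exact term are illustrative, not copied from the HyperShift code:

            package main

            import (
              "fmt"

              corev1 "k8s.io/api/core/v1"
            )

            // expectedClusterAffinity sketches the preferred node affinity that pods in a
            // hosted control plane namespace are expected to carry, so that labelling a node
            // hypershift.openshift.io/cluster=<hcp namespace> makes the scheduler favour it.
            // The weight and exact matching term are illustrative.
            func expectedClusterAffinity(hcpNamespace string) *corev1.Affinity {
              return &corev1.Affinity{
                NodeAffinity: &corev1.NodeAffinity{
                  PreferredDuringSchedulingIgnoredDuringExecution: []corev1.PreferredSchedulingTerm{
                    {
                      Weight: 50,
                      Preference: corev1.NodeSelectorTerm{
                        MatchExpressions: []corev1.NodeSelectorRequirement{
                          {
                            Key:      "hypershift.openshift.io/cluster",
                            Operator: corev1.NodeSelectorOpIn,
                            Values:   []string{hcpNamespace},
                          },
                        },
                      },
                    },
                  },
                },
              }
            }

            func main() {
              fmt.Printf("%+v\n", expectedClusterAffinity("clusters-jsafrane"))
            }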
            
            

             


              rhn-engineering-jsafrane Jan Safranek
              agarcial@redhat.com Alberto Garcia Lamela
              Rohit Patil Rohit Patil