Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: Documentation
Labels:
None

Story Points:
2
Epic Link:
Review RHODS resource requirements
Blocked:
False
Blocked Reason:
None
Ready:
False
Acceptance Criteria:
None
Affects Testing:

Testable
Automated:
No
Regression:
No
Target Release:

RHODS_1.26.0_GA
Test Blocker:
No
Test Coverage:

Pending
Watchlist Impact:
None
Intelligence Requested:
Market:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Both the RHODS service definition and the user docs that explain the installation requirements currently recommend 2 worker nodes with at least 8 vCPUs and 32 GB of memory per node (for example, AWS instance type m5.2xlarge or larger).

With RHODS 1.24, this configuration allows you to start two Small notebooks and deploy one model using a small model server. Any additional Small notebook fails to start because of insufficient cluster resources:

Pod unschedulable
0/7 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 2 No preemption victims found for incoming pod, 5 Preemption is not helpful for scheduling.
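To make the scheduler message easier to read, here is a minimal Python sketch that tallies the per-reason node counts from the text above. The function name `summarize_unschedulable` is hypothetical, and the parsing assumes the message keeps the comma-separated "count reason" layout shown in this ticket; it is not an official parser for scheduler events.

```python
import re

# Scheduler message copied verbatim from this ticket.
MESSAGE = (
    "0/7 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, "
    "2 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, "
    "3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. "
    "preemption: 0/7 nodes are available: 2 No preemption victims found for "
    "incoming pod, 5 Preemption is not helpful for scheduling."
)


def summarize_unschedulable(message: str) -> dict[str, int]:
    """Map each rejection reason to the number of nodes that reported it."""
    # Keep only the filter results; the "preemption:" part is a separate summary.
    filters = message.split(" preemption:")[0]
    # Drop the "0/7 nodes are available:" prefix.
    _, _, reasons = filters.partition("nodes are available:")
    summary: dict[str, int] = {}
    for part in reasons.strip().rstrip(".").split(", "):
        match = re.match(r"(\d+)\s+(.*)", part)
        if match:
            summary[match.group(2)] = int(match.group(1))
    return summary


summary = summarize_unschedulable(MESSAGE)
for reason, count in summary.items():
    print(f"{count} node(s): {reason}")
```

The counts add up to the 7 nodes in the cluster: 5 nodes are excluded by infra or master taints, and each of the 2 worker nodes is rejected for a resource shortfall (one for CPU, one for memory).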

With RHODS 1.25, the inclusion of the data-science-pipelines-operator leaves even fewer resources available, so only two Small notebooks can be started simultaneously and the model can no longer be deployed.

I think the service definition and the user docs should be updated to explain that 2 worker nodes with at least 8 vCPUs and 32 GB of memory each are the bare minimum for installation, and that actual usage of RHODS requires extra cluster resources.
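As a rough illustration of why the node count alone is misleading, the sketch below estimates how many identical pods fit on the worker nodes. A pod has to fit on a single node, so free CPU and memory are checked per node rather than summed across the cluster. All numbers and names here (`Resources`, `pods_that_fit`, the free and requested values) are hypothetical placeholders, not measurements from this cluster; the real values would come from the attached `describe-node` outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def pods_that_fit(free_per_node: list[Resources], request: Resources) -> int:
    """Count how many identical pods can be placed, evaluating each node separately."""
    total = 0
    for free in free_per_node:
        by_cpu = free.cpu_millicores // request.cpu_millicores
        by_memory = free.memory_mib // request.memory_mib
        # The scarcer resource on a node decides how many pods it can host.
        total += min(by_cpu, by_memory)
    return total


# Hypothetical free resources left on each worker after RHODS is installed.
free_workers = [
    Resources(cpu_millicores=1500, memory_mib=9000),
    Resources(cpu_millicores=1200, memory_mib=10000),
]
# Hypothetical resource request of one notebook pod.
notebook_request = Resources(cpu_millicores=1000, memory_mib=8192)

fitting = pods_that_fit(free_workers, notebook_request)
print(f"Notebooks that fit: {fitting}")
```

With these placeholder values the cluster has 2700m CPU and 19000 MiB free in total, which looks like room for two notebooks with some to spare, yet no node can host a second one on its own.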

Reported by: jorge-rhods


service-definition.png (50 kB, 2023/05/08 11:14 AM)
rhods-1.25-describe-node-worker2-01-after-install.txt (16 kB, 2023/04/06 4:47 PM)
rhods-1.25-describe-node-worker1-01-after-install.txt (11 kB, 2023/04/06 4:47 PM)
rhods-1.24-describe-node-worker2-01-after-install.txt (12 kB, 2023/04/06 4:47 PM)
rhods-1.24-describe-node-worker1-01-after-install.txt (17 kB, 2023/04/06 4:47 PM)

is related to

RHODS-7944 Remove kfdef for DSPO from odh-deployer

Closed

relates to

RHODS-7943 Reduce DSPO resource requests

Closed

RHODS-12317 Add ability to modify replica counts for individual components

Backlog

mentioned on

Merge request - Merge branch 'RHODS-7899-update-recommended-cluster-resources-for-installing-rhods' into 'stage-1.26'

Merge request - RHODS-7899-update-recommended-cluster-resources-for-installing-rhods into master

Assignee: Chris Tyler

Reporter: DDF Bot

QA Contact: Jorge Garcia Oncins

Votes: 0

Watchers: 13

Created: 2023/04/06 4:15 PM

Updated: 2023/10/03 6:57 AM

Resolved: 2023/05/09 10:22 AM
