Resolution: Done
Strategic Product Work
OCPSTRAT-949 - [Tech Preview] OpenShift on Oracle Cloud Infrastructure (OCI) with VMs
As an OCP Engineer, I would like to understand the best volume performance setting for control plane nodes to prevent latency issues on etcd, so we can estimate safe values before going live with Oracle clusters.
Block Volume performance on OCI is determined by VPUs/GB (Volume Performance Units per GB). The UPI and AI deployments use 20 VPUs/GB (Higher Performance). The OCP deployment and the cluster are operational, but when running OPCT (e2e tests/workloads), the etcd logs report high latency (1s+) in more than 50% of requests.
This repeats for UPI, AI, and OCVS deployments. Threads reporting high etcd latency:
- OCP OCI UPI and AI: https://redhat-internal.slack.com/archives/G012C6LKVM2/p1684504503222929
- OCP OCI Baremetal [workers] : https://redhat-internal.slack.com/archives/C03UZKBRHT5/p1683631642811849
- OCP OCI OCVS: https://redhat-internal.slack.com/archives/C04T569EL1Z/p1684746169245709
The goal of this card is to create a control plane with higher volume performance, generate workloads, and measure whether that value is a suitable target. We can use a known platform (AWS) as a baseline, which reports less than 10% of requests above 500ms.
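To compare each volume configuration against the AWS baseline, the slow-request percentages can be computed from the etcd logs collected during an OPCT run. The following is a minimal Python sketch, not the OPCT etcd log parser. It assumes etcd's JSON log format, where slow applies are logged with `"msg":"apply request took too long"` and a Go-style `took` duration such as `1.2s` or `850ms`. The function names and the sample log lines are hypothetical and for illustration only. Because etcd only logs requests that exceed its expected duration, the share is computed over the logged slow requests. That denominator is an assumption about how the percentages in this card were derived.

```python
import json
import re
from typing import Iterable

# Go duration units as printed by etcd's zap logger (e.g. "1.10012s", "512.3ms").
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# "ms" is listed before "m" so "850ms" is not split into "850m" + "s".
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")

# Baseline from this card: on AWS, fewer than 10% of requests take longer than 500ms.
BASELINE_THRESHOLD_S = 0.5
BASELINE_MAX_SHARE = 0.10

# Hypothetical, synthetic log lines used only to exercise the functions below.
SAMPLE_LINES = [
    '{"level":"warn","msg":"apply request took too long","took":"1.2s","expected-duration":"100ms"}',
    '{"level":"warn","msg":"apply request took too long","took":"320ms","expected-duration":"100ms"}',
    '{"level":"info","msg":"some other message"}',
    '{"level":"warn","msg":"apply request took too long","took":"1m2.5s","expected-duration":"100ms"}',
    "not a JSON line",
]


def parse_go_duration(text: str) -> float:
    """Convert a Go duration string such as '1m2.5s' or '850ms' to seconds."""
    parts = _DURATION_PART.findall(text)
    # Reject strings that contain anything besides number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


def slow_request_durations(lines: Iterable[str]) -> list[float]:
    """Return the 'took' value, in seconds, of every 'apply request took too long' entry."""
    durations = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("msg") == "apply request took too long" and "took" in entry:
            durations.append(parse_go_duration(entry["took"]))
    return durations


def share_above(durations: list[float], threshold_s: float) -> float:
    """Fraction of durations strictly above threshold_s (0.0 when the list is empty)."""
    if not durations:
        return 0.0
    return sum(d > threshold_s for d in durations) / len(durations)


def meets_aws_baseline(durations: list[float]) -> bool:
    """True when fewer than 10% of the logged slow requests took longer than 500ms."""
    return share_above(durations, BASELINE_THRESHOLD_S) < BASELINE_MAX_SHARE


if __name__ == "__main__":
    durations = slow_request_durations(SAMPLE_LINES)
    print(f"slow requests parsed: {len(durations)}")
    print(f"share above 500ms: {share_above(durations, 0.5):.0%}")
    print(f"share above 1s: {share_above(durations, 1.0):.0%}")
    print(f"meets AWS baseline: {meets_aws_baseline(durations)}")
```

Running the same functions over the etcd logs from clusters built with different VPUs/GB values gives one comparable number per configuration. The same `share_above` call with a 1.0s threshold expresses the "1s+ in more than 50% of requests" observation above.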
- https://docs.oracle.com/en-us/iaas/Content/Block/Concepts/blockvolumeperformance.htm
- https://redhat-openshift-ecosystem.github.io/provider-certification-tool/user-installation-review/#review-etcd-logs-etcd-slow-requests
- UPI boot volume config: https://github.com/mtulio/ansible-collection-okd-installer/pull/26/files#diff-a61c30110b03d8dc62b00c012dfb2d1ba79a1e5a25fe303cb34cd207957dc46eR46-R49
- is related to
OPCT-210 [plugins][tool-etcd] Add stats for etcd log parser - requests apply took too long
- Closed
- relates to
SPLAT-1003 Validate OpenShift on Oracle Cloud VMware Solution (OCVS)
- Closed
- links to