Currently we're seeing indexer pods being restarted with OOMKilled. The memory limit is 3Gb and the request is 2Gb. The workload is pretty spiky, depending on the manifest being indexed, so we should probably increase the limit further.
This spike is to loop in appSRE so we can decide together on the best course forward.
Initial steps:
- Stand up a cluster
- Start a load test
- Check pprof for memory usage
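For the pprof step, a minimal sketch of grabbing a heap profile during the load test might look like the following. It assumes the indexer exposes Go's standard net/http/pprof handlers, which serve the heap profile at /debug/pprof/heap. The host, port, and file name are hypothetical placeholders, not values taken from the deployment.

```python
import urllib.request


def heap_profile_url(host: str, port: int) -> str:
    """Build the URL of the heap endpoint registered by Go's net/http/pprof."""
    return f"http://{host}:{port}/debug/pprof/heap"


def save_heap_profile(host: str, port: int, dest: str, timeout: float = 30.0) -> int:
    """Download one heap profile to dest and return the number of bytes written.

    host and port are assumptions: point them at wherever the indexer's
    pprof handlers are reachable, for example through a port-forward.
    """
    with urllib.request.urlopen(heap_profile_url(host, port), timeout=timeout) as resp:
        data = resp.read()
    with open(dest, "wb") as fh:
        fh.write(data)
    return len(data)
```

Calling save_heap_profile("localhost", 6060, "heap.pb.gz") a few times while the load test runs gives snapshots that can be opened with go tool pprof to see which allocations grow with large manifests.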
Options:
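- If an optimization can be made, make it.
- Increase the limit to 4Gb and accept the risk of evictions, since pods running above their request are the first candidates when a node comes under memory pressure.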
- Increase the limit and request to 4Gb (dependent on cluster resources).
- Add more indexer pods.
- Another option I haven't considered.
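To make the trade-off between the limit options concrete, here is a rough sketch that compares them against a spike. It assumes "Gb" means GiB, and the 3.5 GiB peak is a made-up figure for illustration, not a measurement. "Burst" is the gap between limit and request, which is the memory a pod can use without the scheduler having reserved it.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # assumption: "Gb" in the ticket means GiB


@dataclass(frozen=True)
class MemoryConfig:
    name: str
    request: int  # bytes the scheduler reserves for the container
    limit: int    # bytes above which the container can be OOMKilled

    def survives(self, peak: int) -> bool:
        """True if a spike of `peak` bytes stays within the limit."""
        return peak <= self.limit

    def burst(self) -> int:
        """Bytes the container may use beyond what was reserved for it."""
        return self.limit - self.request


OPTIONS = [
    MemoryConfig("current", 2 * GIB, 3 * GIB),
    MemoryConfig("raise limit only", 2 * GIB, 4 * GIB),
    MemoryConfig("raise limit and request", 4 * GIB, 4 * GIB),
]


def summarize(peak: int) -> list[tuple[str, bool, float]]:
    """Return (name, survives the spike, burst in GiB) for each option."""
    return [(o.name, o.survives(peak), o.burst() / GIB) for o in OPTIONS]


if __name__ == "__main__":
    hypothetical_peak = int(3.5 * GIB)  # illustrative only, not measured
    for name, ok, burst in summarize(hypothetical_peak):
        print(f"{name}: survives={ok}, burst={burst:.1f} GiB")
```

With these numbers, raising only the limit survives the spike but leaves 2 GiB of unreserved burst, which is where the eviction risk comes from. Raising both survives with no burst, at the cost of reserving 4 GiB per pod.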
This is important, but since Quay will retry any failures, I wouldn't consider it critical.