The combineStages method in ClusterPublisherImpl currently uses flatMap without an explicit concurrency level, so it will subscribe to up to 128 futures at the same time. This isn't a problem when only in-memory storage is used, because the stages are already complete by the time they are received. However, when a segmented store is in play, it means up to 128 segments are processed at the same time, even for a sequential publisher.
We should instead limit the concurrency to cpuCount when the publisher is parallel and to just 1 when it is sequential. The latter also matches the number of threads the upstream publisher will use.
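
A rough sketch of the proposed limit, for illustration only. It deliberately avoids the reactive library so that it stays dependency-free: a Semaphore stands in for the concurrency argument that would be passed to flatMap, and a short sleep stands in for a segment read. The class name, both method names, and the pool size are hypothetical and are not taken from ClusterPublisherImpl.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ConcurrencyLimitSketch {

    // Hypothetical helper: the concurrency level proposed in this issue.
    // 1 for a sequential publisher, the CPU count for a parallel one.
    static int maxConcurrency(boolean parallel) {
        return parallel ? Runtime.getRuntime().availableProcessors() : 1;
    }

    // Processes `segments` simulated segment reads with at most `limit`
    // in flight at once and returns the highest in-flight count observed.
    static int peakInFlight(int segments, int limit) {
        ExecutorService pool = Executors.newFixedThreadPool(16); // arbitrary size
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < segments; i++) {
                // Wait for a free slot before starting the next segment.
                permits.acquireUninterruptibly();
                futures.add(CompletableFuture.runAsync(() -> {
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for reading a segment from the store
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        // Decrement before releasing so the count never exceeds the limit.
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                }, pool));
            }
            // Wait for every simulated segment to finish.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With the limit from maxConcurrency(false), segments are handled strictly one at a time, which is the behaviour a sequential publisher would be expected to have. In the real change the same value would presumably be passed as the concurrency argument of flatMap rather than enforced with a Semaphore.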