Maistra / MAISTRA-2378

Problem reconciling SMMR with large number of members when ovs-multitenant is used


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • maistra-
    • maistra-2.0.3, maistra-2.0.4, maistra-2.0.5
    • operator
    • None
    • Sprint 4

      Clusters configured with the ovs-multitenant network plugin experience ServiceMeshMemberRoll (SMMR) reconciliation issues when the number of members exceeds a certain threshold.

      This issue appears only with the combination of ovs-multitenant and the new concurrent reconciliation of member namespaces introduced in 2.0.3.

      When using ovs-multitenant, the istio-operator joins a member namespace to the mesh by adding the `pod.network.openshift.io/multitenant.change-network` annotation to the `netnamespace` object for that member namespace (this is exactly what the `oc adm pod-network join-projects` command does). This annotation is then picked up by OpenShift, which joins the namespace to the correct network and removes the annotation. The istio-operator waits up to 16s for this to happen. Previously, because the namespaces were reconciled sequentially, the 16s timeout was adequate. In 2.0.3+, with a high number of namespaces, that is no longer the case: OpenShift now has to process all of those namespaces (i.e. remove the annotation from each) within 16s. If it fails to do so for even one of the namespaces, the istio-operator considers the reconciliation of that member to have failed and removes the member from SMMR.status.configuredMembers.
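The annotate-and-wait step described above can be sketched as follows. This is an illustrative simulation in Python, not the operator's actual Go code: `NetNamespace`, `join_project`, and the simulated SDN controller (a timer that removes the annotation after a delay) are all hypothetical stand-ins; only the annotation key and the 16s timeout come from the description above.

```python
import threading
import time

# Annotation the operator adds to the netnamespace; OpenShift removes it
# once it has joined the namespace to the target network.
ANNOTATION = "pod.network.openshift.io/multitenant.change-network"

class NetNamespace:
    """Minimal stand-in for an OpenShift NetNamespace object."""
    def __init__(self, name):
        self.name = name
        self.annotations = {}

def join_project(netns, sdn_delay, timeout=16.0, poll=0.1):
    """Annotate the netnamespace, then wait up to `timeout` seconds for the
    (simulated) SDN controller to remove the annotation. Returns True if the
    join was processed in time, False if the operator would give up."""
    netns.annotations[ANNOTATION] = "join:mesh-network"   # hypothetical value
    # Simulated SDN controller: removes the annotation after sdn_delay seconds.
    threading.Timer(sdn_delay, netns.annotations.pop,
                    args=(ANNOTATION, None)).start()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ANNOTATION not in netns.annotations:
            return True    # OpenShift processed the join in time
        time.sleep(poll)
    return False           # timed out -> member reconcile considered failed
```

With one namespace at a time, a 16s window per join is generous; the failure mode above arises only when many such waits race against a single SDN controller at once.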

      The reconciler then runs again (with backoff, but still almost immediately). Instead of adding the annotation only to the namespaces that are not yet joined to the mesh, the ovs-multitenant implementation in the istio-operator adds it to every member specified in the SMMR, including those that are already joined. This typically causes failures in a different set of namespaces than the previous attempt, and this time it is those namespaces that are removed from the configuredMembers list. The entire process then repeats, which is why the configuredMembers list appears to change randomly.
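The repeating failure pattern can be modeled with a small simulation. This is an assumed, simplified model (not the operator's code): `capacity` stands in for how many annotation removals the SDN controller manages within the 16s window, and which joins get processed in a given pass is modeled as random. The point it illustrates is the one above: re-annotating every member each pass keeps the backlog at full size, so configuredMembers never converges and its contents keep shifting, whereas annotating only unjoined members lets the backlog shrink to zero.

```python
import random

def reconcile(members, joined, capacity, annotate_all, rng):
    """One reconcile pass. `capacity` models how many annotation removals the
    SDN controller completes within the 16s timeout window."""
    if annotate_all:
        to_annotate = list(members)            # buggy: re-annotate everyone
    else:
        to_annotate = [m for m in members if m not in joined]  # fixed variant
    processed = set(rng.sample(to_annotate, min(capacity, len(to_annotate))))
    failed = set(to_annotate) - processed
    # Members whose annotation was not removed in time are dropped from
    # configuredMembers; successfully processed ones are kept.
    return (joined | processed) - failed

members = [f"ns-{i}" for i in range(10)]   # hypothetical member namespaces
rng = random.Random(42)
buggy, fixed, history = set(), set(), []
for _ in range(6):
    buggy = reconcile(members, buggy, capacity=4, annotate_all=True, rng=rng)
    fixed = reconcile(members, fixed, capacity=4, annotate_all=False, rng=rng)
    history.append(frozenset(buggy))
```

After six passes, `fixed` contains all ten members, while `buggy` is still stuck at four and the four it contains differ from pass to pass, matching the randomly changing configuredMembers list observed on affected clusters.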

        1. failure (27 kB)
        2. reproducer.tar (9 kB)
        3. success (26 kB)

            mluksa@redhat.com Marko Luksa