Bug
Resolution: Unresolved
Undefined
None
4.13, 4.12, 4.14, 4.15
Important
No
Rejected
False
Description of problem:
The openshift-router neither updates a route's status nor re-admits the route after a period of contention, even after a resync. The router has a default resync period of 30 minutes.
When routers disagree and post conflicting status on a route, they enter a contention state and back off from writing updates to avoid an infinite loop of status updates. There will always be one "winner" router pod and N "loser" router pods, where the "winner" is the pod that successfully wrote the last status update before the routers backed off. However, I noticed that if I resolve the contention by deleting the "winner" and leaving one loser, the loser never updates the route status, even after the router's resync is triggered.
Debugging the code, the router runs a plugin chain in this order:
1. HostAdmitter
2. UniqueHost
3. ExtendedValidator
...etc.
During a resync, processing appears to stop at UniqueHost. UniqueHost uses a custom route index (hostindex.go). If a route doesn't "activate" (which I take to mean it hasn't "changed", as determined by the existing.ResourceVersion == route.ResourceVersion check), it isn't passed to the remaining plugins. The 30-minute resync doesn't change the route's ResourceVersion, so a resynced route is never activated.
It seems very odd that the UniqueHost plugin prevents all of the other plugins, including the ones that update the status, from doing their job simply because the route isn't "activated".
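The short-circuit described above can be modeled in a few lines of Go. This is a hypothetical, heavily simplified sketch, not the actual openshift-router code: `Route`, `EventType`, `Plugin`, `uniqueHost`, `statusWriter` and this `HandleRoute` signature are stand-ins assumed for illustration. It assumes each plugin wraps the next one and that UniqueHost only forwards a route whose ResourceVersion differs from the indexed copy, as described above.

```go
package main

import "fmt"

// Route is a hypothetical, heavily simplified stand-in for an OpenShift route.
type Route struct {
	Name            string
	ResourceVersion string
	Admitted        bool
}

// EventType is a hypothetical stand-in for the watch event kinds a plugin receives.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
)

// Plugin is a hypothetical plugin interface; each plugin wraps the next one.
type Plugin interface {
	HandleRoute(t EventType, r *Route) error
}

// uniqueHost models the index check: a route whose ResourceVersion matches
// the indexed copy is treated as unchanged and is not forwarded.
type uniqueHost struct {
	next  Plugin
	index map[string]string // route name -> last indexed ResourceVersion
}

func (u *uniqueHost) HandleRoute(t EventType, r *Route) error {
	if rv, ok := u.index[r.Name]; ok && rv == r.ResourceVersion {
		// Same ResourceVersion: the route is not "activated", so the event
		// stops here and u.next is never called.
		return nil
	}
	u.index[r.Name] = r.ResourceVersion
	return u.next.HandleRoute(t, r)
}

// statusWriter models a downstream plugin that admits the route and writes
// its status, unless it is backing off because of contention.
type statusWriter struct {
	contended bool
	writes    int
}

func (s *statusWriter) HandleRoute(t EventType, r *Route) error {
	if s.contended {
		return nil // backing off: skip the status write
	}
	r.Admitted = true
	s.writes++
	return nil
}

func main() {
	status := &statusWriter{contended: true}
	chain := &uniqueHost{next: status, index: map[string]string{}}
	r := &Route{Name: "example", ResourceVersion: "100"}

	// The route is first seen while the routers are in contention.
	_ = chain.HandleRoute(Added, r)

	// The "winner" pod is deleted, so this router is no longer contended.
	status.contended = false

	// A periodic resync redelivers the same object with the same ResourceVersion.
	_ = chain.HandleRoute(Modified, r)
	fmt.Println("writes after resync:", status.writes) // prints "writes after resync: 0"

	// Only an actual change to the route gets past uniqueHost.
	r.ResourceVersion = "101"
	_ = chain.HandleRoute(Modified, r)
	fmt.Println("writes after ResourceVersion change:", status.writes) // prints "writes after ResourceVersion change: 1"
}
```

In this model, a resync redelivers the cached object unchanged, so the equality check treats it as a no-op. Any work a downstream plugin skipped earlier (here, the status write suppressed during contention) is therefore never retried until the route itself is modified.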
How reproducible:
100%
Steps to Reproduce:
I wrote a script to test and expose this bug. It reduces the resync period to 1 minute, so a test run may take more than a minute. You can change the image to whichever router version you want to test:
1. wget https://gist.githubusercontent.com/gcs278/949b1c5a5cabf7bb271c83f760ebf61a/raw/6d7516c6806b2961757d6ac3ea80204e9e8ceaca/router-contention-resync-test.sh
Actual results:
Routes fail to resync status
Expected results:
Routes should resync status
Additional info: