Task
Resolution: Unresolved
Major
WHY
As a user of the 3scale operator, I need confidence that the status in the APIManager reflects the true state of the deployments.
WHAT
When considering the readiness of a 3scale installation, we need to take two scenarios into account: fresh install and upgrade.
During a fresh install, a deployment can fail for different reasons, such as a missing image or an incompatible image that keeps erroring out. The operator should be able to identify this and report what's wrong in the APIManager (APIM) status. This should be an error status message.
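A minimal Go sketch of the fresh-install check, assuming the operator can read the waiting reason of each container in a component's pods. The `ContainerWaiting` type, the `installErrors` function and the deployment/container names are hypothetical; the reason strings are standard Kubernetes container waiting reasons. Which reasons count as fatal is a judgment call.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ContainerWaiting is a hypothetical, simplified view of a container stuck in
// the Waiting state; in a real operator this would come from the Pod status
// (status.containerStatuses[].state.waiting).
type ContainerWaiting struct {
	Deployment string // owning deployment, e.g. "system-app" (illustrative)
	Container  string
	Reason     string // Kubernetes waiting reason, e.g. "ImagePullBackOff"
	Message    string
}

// fatalReasons lists waiting reasons that suggest the deployment will not
// become ready without intervention (missing or broken image).
var fatalReasons = map[string]bool{
	"ErrImagePull":     true,
	"ImagePullBackOff": true,
	"InvalidImageName": true,
	"CrashLoopBackOff": true,
}

// installErrors returns one error status message per container waiting for
// a fatal reason, sorted so the APIM status does not churn between reconciles.
func installErrors(waiting []ContainerWaiting) []string {
	var msgs []string
	for _, w := range waiting {
		if fatalReasons[w.Reason] {
			msgs = append(msgs, fmt.Sprintf("deployment %s: container %s is in %s: %s",
				w.Deployment, w.Container, w.Reason, w.Message))
		}
	}
	sort.Strings(msgs)
	return msgs
}

func main() {
	msgs := installErrors([]ContainerWaiting{
		{"system-app", "system-master", "ImagePullBackOff", "image not found"},
		{"backend-listener", "backend-listener", "ContainerCreating", ""},
	})
	fmt.Println(strings.Join(msgs, "\n"))
}
```

"ContainerCreating" is deliberately not treated as fatal, since a pod normally passes through it on a healthy rollout.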
During an upgrade, when a deployment fails, OpenShift reverts it to the last known working configuration. The operator, however, still shows as "installed successfully", because OLM considers the operator installed as soon as the operator pod is "ready". This means we can end up in a situation where the operator is updated but the component images are not (let's say, because an image is missing), the component deployments are reverted, and the operator is not aware of the mismatch between what it thinks is installed (running) and what is actually on the cluster. The biggest risk here is a multi-minor version jump.
For example, say a customer is on operator version 1.0.0, with all components also on 1.0.0. When 2.0.0 comes in, the operator is updated to 2.0.0 and attempts to update the deployments, but an image is missing; OpenShift reverts the component to 1.0.0 and the operator is marked as successful. When 3.0.0 comes in, the components are bumped successfully, and we now have a multi-minor upgrade of, for example, the system app, going from 1.0.0 to 3.0.0, which must be avoided.
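The scenario above can be detected by comparing the version a component is actually running with the version the operator is about to roll out. This Go sketch assumes the operator knows an ordered list of its releases (the `releases` slice and the function names are hypothetical) and flags any jump over an intermediate release.

```go
package main

import "fmt"

// releases is a hypothetical, ordered list of operator releases; each
// component is expected to move through them one step at a time.
var releases = []string{"1.0.0", "2.0.0", "3.0.0"}

// releaseIndex returns the position of v in releases, or -1 if unknown.
func releaseIndex(v string) int {
	for i, r := range releases {
		if r == v {
			return i
		}
	}
	return -1
}

// skipsRelease reports whether moving a component from the version it is
// actually running to the target version would jump over at least one
// intermediate release. Unknown versions are treated as unsafe.
func skipsRelease(running, target string) bool {
	from, to := releaseIndex(running), releaseIndex(target)
	if from < 0 || to < 0 {
		return true
	}
	return to-from > 1
}

func main() {
	// The system app is still on 1.0.0 after the failed 2.0.0 rollout,
	// and the operator is now targeting 3.0.0.
	fmt.Println(skipsRelease("1.0.0", "3.0.0"))
	fmt.Println(skipsRelease("2.0.0", "3.0.0"))
}
```

The first call prints `true` and the second `false`. Keying on an explicit release list sidesteps guessing what counts as "one step" from semver numbers alone.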
HOW
There are a couple of things we can do here.
1. The operator should check whether the images in its environment variables are the same images that are on the deployments it is reconciling. If not, a warning message should be presented in the APIM status.
2. A warning message will not stop the 3scale instance from running, so a multi-minor upgrade is still a problem. To avoid or block further upgrades of the operator, we could use Operator Conditions. Operator Conditions allow the operator to be marked as "not upgradeable", therefore preventing any further mess. However, the downside is: what if the fix for the broken image is in the next version? Is a manual override needed to remove the Operator Condition? Are there better approaches?
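Option 1 could look roughly like the Go sketch below. It assumes the expected image for each component lives in an operator environment variable and that the actual image has already been read from each deployment's pod template; the deployment names, the `RELATED_IMAGE_*` variable names and `imageWarnings` are hypothetical. The env lookup is passed in (`os.LookupEnv` in production) so the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// componentImageEnv maps a deployment name to the operator environment
// variable holding the image the operator expects it to run.
// Both the deployment names and the variable names are hypothetical.
var componentImageEnv = map[string]string{
	"system-app":       "RELATED_IMAGE_SYSTEM",
	"backend-listener": "RELATED_IMAGE_BACKEND",
}

// imageWarnings compares the expected image for each component (read via
// lookupEnv) with the image actually set on the deployment and returns one
// warning per mismatch, sorted for stable output.
func imageWarnings(actual map[string]string, lookupEnv func(string) (string, bool)) []string {
	var warnings []string
	for deployment, envVar := range componentImageEnv {
		expected, ok := lookupEnv(envVar)
		if !ok {
			warnings = append(warnings, fmt.Sprintf("%s: env var %s is not set", deployment, envVar))
			continue
		}
		if got := actual[deployment]; got != expected {
			warnings = append(warnings, fmt.Sprintf("%s: running image %q, operator expects %q", deployment, got, expected))
		}
	}
	sort.Strings(warnings)
	return warnings
}

func main() {
	// Images read from the deployments' pod templates (illustrative values).
	actual := map[string]string{
		"system-app":       "registry.example.com/3scale/system:1.0.0",
		"backend-listener": "registry.example.com/3scale/backend:2.0.0",
	}
	for _, w := range imageWarnings(actual, os.LookupEnv) {
		fmt.Println(w)
	}
}
```

Each returned string would become part of the warning on the APIM status block; an empty result means the cluster matches what the operator believes it installed.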
DONE
The 3scale APIManager reflects the state of the installation in the APIM status block.
The 3scale operator does not upgrade if the component versions do not match what the operator expects.
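With OLM, an operator can ask not to be upgraded by setting the `Upgradeable` condition to `False` on its `OperatorCondition` resource. This Go sketch only shows how that condition's value could be derived from the component mismatches found during reconciliation; it uses a local `Condition` struct instead of the real Kubernetes types, and `upgradeableCondition` and both reason strings are hypothetical. Writing the condition to the cluster is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified, local stand-in for the type/status/reason/
// message fields of a Kubernetes condition (metav1.Condition); it avoids
// importing k8s.io/apimachinery for this sketch.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// upgradeableCondition derives the OLM "Upgradeable" condition from the list
// of component mismatches. Reason strings are hypothetical.
func upgradeableCondition(mismatches []string) Condition {
	if len(mismatches) == 0 {
		return Condition{
			Type:    "Upgradeable",
			Status:  "True",
			Reason:  "ComponentsMatchOperator",
			Message: "all components run the images the operator expects",
		}
	}
	return Condition{
		Type:    "Upgradeable",
		Status:  "False",
		Reason:  "ComponentVersionMismatch",
		Message: strings.Join(mismatches, "; "),
	}
}

func main() {
	c := upgradeableCondition([]string{"system-app: running 1.0.0, operator expects 2.0.0"})
	fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
}
```

Because the value is recomputed on every reconcile, it would return to `True` once the components match again. If the fix only ships in the next version, the components cannot match without that upgrade; OLM documents a `spec.overrides` field on `OperatorCondition` that a cluster administrator can use to override the condition, which may be one answer to the manual-override question.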