-
Task
-
Resolution: Won't Do
As a tenant using the stage environment to test services before deploying them to production, we have on multiple occasions discovered unexpected issues in production with services that were tested in stage and expected to work properly, because stage is not as close to production as it could be.
One example of such a difference affecting us: a cluster update that goes well in stage can cause issues in production because the services require different amounts of resources in the two environments, so we cannot detect the issue earlier.
The pattern we are seeing is that these issues are "unpredictable", as everything seems to work when tested in stage. The attempted migration to RHOSAK is another example: it went well in stage, but not in production.
We think that:
- the existing limitations/differences should be documented
- there should be a way to track them
- providing solutions to these differences should be a priority