Description of problem:
With https://github.com/openshift/origin/pull/28073 we introduced the upstream feature of bumping snapshot revisions. This drastically improved our pass rates and our restore procedure:
https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-cluster-etcd-operator-release-4.14-periodics-e2e-aws-etcd-recovery%22%7D%5D%7D

Sometimes, however, assertions fail like this:

> fail [github.com/openshift/origin/test/extended/dr/resource_assertions.go:98]: Expected an error to have occurred. Got: <nil>: nil

This indicates that a namespace that should not be included in the snapshot was nevertheless retrieved from the API after a restore:
https://github.com/openshift/origin/blob/6ee9dc56a612a4c886d094571832ed47efa2e831/test/extended/dr/resource_assertions.go#L97-L99

This should not happen: after the restore, the namespace should not be found.

run: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-cluster-etcd-operator-release-4.14-periodics-e2e-aws-etcd-recovery/1685868016341880832
Version-Release number of selected component (if applicable):
4.14
How reproducible:
initially rarely, now fairly often
Steps to Reproduce:
1. run the e2e recovery test multiple times
Actual results:
the test finds resources that should not exist after the restore
Expected results:
the test does *not* find the resources that are not included in the snapshot
Additional info:
We have two possible explanations:
* etcd does indeed contain that namespace, which should be easy to test with etcdctl (meaning that our snapshot must be wrong, or the restore procedure is picking up a leftover WAL that contains those changes)
* the API server still serves the namespace from its cache

Or, of course, the assertion is wrong :)

Low priority because this is not "officially" shipped in 4.14.
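
The two explanations can be told apart by comparing what etcd holds with what the API returns. The following is a minimal sketch of that reasoning, not part of the test suite: `namespaceKey` and `diagnose` are hypothetical helpers, `example-ns` is a placeholder name, and the `/kubernetes.io` key prefix is an assumption (upstream Kubernetes defaults to `/registry`) that should be verified on the cluster.

```go
package main

import (
	"fmt"
	"path"
)

// namespaceKey builds the etcd key under which the API server would store a
// namespace. The prefix is an assumption: "/kubernetes.io" is commonly seen
// on OpenShift, while "/registry" is the upstream Kubernetes default.
func namespaceKey(prefix, name string) string {
	return path.Join("/", prefix, "namespaces", name)
}

// diagnose maps the two observations (key present in etcd, namespace served
// by the API) onto the explanations listed in this report.
func diagnose(inEtcd, servedByAPI bool) string {
	switch {
	case inEtcd && servedByAPI:
		return "etcd contains the namespace: suspect the snapshot or a leftover WAL"
	case !inEtcd && servedByAPI:
		return "etcd is clean: suspect a stale API server cache"
	case inEtcd && !servedByAPI:
		return "inconsistent observation: recheck the key prefix and the API query"
	default:
		return "namespace is gone everywhere: the restore behaved as expected"
	}
}

func main() {
	// Placeholder namespace name; substitute the one the test created
	// after the snapshot was taken.
	key := namespaceKey("/kubernetes.io", "example-ns")
	fmt.Println("check with: etcdctl get", key, "--keys-only")

	// Example: etcdctl returned no key, but the API still returned the namespace.
	fmt.Println(diagnose(false, true))
}
```

If etcdctl returns the key, the problem lies in the snapshot or restore path; if it returns nothing while the API still serves the namespace, the cache hypothesis becomes the more likely one.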