Description of problem:
Creating a KataConfig immediately after the operator has been installed fails sporadically in CI:
07:49:48 TASK [install-kata-operator : Create operator subscription] ********************
07:49:48 task path: /var/lib/jenkins/workspace/ocp-olm-setup/ocp-edge-qe/linchpin-workspace/hooks/ansible/ocp-edge-setup/roles/install-kata-operator/tasks/install-kata.yml:198
07:49:48 changed: [provisionhost-0-0] =>
{"changed": true, "cmd": "set +e\ncat <<EOF | oc apply -f -\napiVersion: operators.coreos.com/v1alpha1\nkind: Subscription\nmetadata:\n name: kata-operator\n namespace: \"openshift-sandboxed-containers-operator\"\nspec:\n channel: \"preview-1.1\"\n installPlanApproval: Automatic\n name: sandboxed-containers-operator\n source: kata-qe-optional-operators\n sourceNamespace: openshift-marketplace\n startingCSV: \"sandboxed-containers-operator.v1.1.0\"\nEOF\n", "delta": "0:00:00.329896", "end": "2021-11-28 00:49:47.589984", "failed_when_result": false, "rc": 0, "start": "2021-11-28 00:49:47.260088", "stderr": "", "stderr_lines": [], "stdout": "subscription.operators.coreos.com/kata-operator created", "stdout_lines": ["subscription.operators.coreos.com/kata-operator created"]}
07:49:48
07:49:48 TASK [install-kata-operator : Make sure the sandboxed-containers is installed] ***
07:49:48 task path: /var/lib/jenkins/workspace/ocp-olm-setup/ocp-edge-qe/linchpin-workspace/hooks/ansible/ocp-edge-setup/roles/install-kata-operator/tasks/install-kata.yml:220
07:49:48 FAILED - RETRYING: Make sure the sandboxed-containers is installed (20 retries left).
07:50:05 FAILED - RETRYING: Make sure the sandboxed-containers is installed (19 retries left).
07:50:20 changed: [provisionhost-0-0] =>
{"attempts": 3, "changed": true, "cmd": "oc get pods -n \"openshift-sandboxed-containers-operator\" -o json | jq -r '.items[] | select(.metadata.name | test(\"sandboxed-containers-operator-controller-manager-*\")).status.phase'\n", "delta": "0:00:00.118035", "end": "2021-11-28 00:50:19.786984", "rc": 0, "start": "2021-11-28 00:50:19.668949", "stderr": "", "stderr_lines": [], "stdout": "Running", "stdout_lines": ["Running"]}
07:50:20
07:50:20 TASK [install-kata-operator : Create kataconfig] *******************************
07:50:20 task path: /var/lib/jenkins/workspace/ocp-olm-setup/ocp-edge-qe/linchpin-workspace/hooks/ansible/ocp-edge-setup/roles/install-kata-operator/tasks/install-kata.yml:230
07:50:21 fatal: [provisionhost-0-0]: FAILED! =>
{"changed": true, "cmd": "set +e\ncat <<EOF | oc apply -f -\napiVersion: kataconfiguration.openshift.io/v1\nkind: KataConfig\nmetadata:\n name: example-kataconfig\nEOF\n", "delta": "0:00:00.640764", "end": "2021-11-28 00:50:21.107591", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-11-28 00:50:20.466827", "stderr": "Error from server (InternalError): error when creating \"STDIN\": Internal error occurred: failed calling webhook \"vkataconfig.kb.io\": Post \"https://sandboxed-containers-operator-controller-manager-service.openshift-sandboxed-containers-operator.svc:443/validate-kataconfiguration-openshift-io-v1-kataconfig?timeout=10s\": dial tcp 10.130.0.55:9443: connect: connection refused", "stderr_lines": ["Error from server (InternalError): error when creating \"STDIN\": Internal error occurred: failed calling webhook \"vkataconfig.kb.io\": Post \"https://sandboxed-containers-operator-controller-manager-service.openshift-sandboxed-containers-operator.svc:443/validate-kataconfiguration-openshift-io-v1-kataconfig?timeout=10s\": dial tcp 10.130.0.55:9443: connect: connection refused"], "stdout": "", "stdout_lines": []}
The same issue was seen on the latest 4.9 and 4.10 builds (with kata 1.1.0).
How reproducible:
1. Deploy the kata operator in a disconnected environment.
2. Create a KataConfig as soon as the pods in the openshift-sandboxed-containers-operator namespace are Running.
Actual results:
Post "https://sandboxed-containers-operator-controller-manager-service.openshift-sandboxed-containers-operator.svc:443/validate-kataconfiguration-openshift-io-v1-kataconfig?timeout=10s": dial tcp 10.130.0.55:9443: connect: connection refused
Expected results:
The POST to the validating webhook should succeed and the KataConfig should be applied.
Additional info:
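This looks like a timing issue; it fails only sporadically in CI. In the log above, the controller-manager pod is reported as Running at 07:50:20 and the webhook call is refused at 07:50:21, which suggests the webhook server was not yet listening on port 9443 even though the pod phase was already Running.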
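A possible CI-side mitigation, as a minimal sketch only: retry the KataConfig creation until the webhook accepts the request. The helper names `retry` and `create_kataconfig` are hypothetical, and the retry count and delay in the usage comment are assumed values, not taken from the CI job. The manifest is the one from the failing task above. This only works around the race in the test; it does not change the operator's behavior.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX attempts,
# sleeping DELAY seconds between attempts. Hypothetical helper, not part of oc.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Same manifest as in the failing "Create kataconfig" task.
create_kataconfig() {
  cat <<EOF | oc apply -f -
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
EOF
}

# Usage (assumed values: 20 attempts, 15 seconds apart):
#   retry 20 15 create_kataconfig
```

Retrying the apply itself exercises the webhook directly, whereas the existing check only polls the pod phase, which turned Running before the webhook was reachable.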