- Bug
- Resolution: Done-Errata
- Critical
- 4.12.z
- Moderate
- No
- SDN Sprint 234, SDN Sprint 235, SDN Sprint 236, SDN Sprint 237, SDN Sprint 238, SDN Sprint 239, SDN Sprint 240, SDN Sprint 241
- 8
- Approved
- False
Description of problem:
An OpenShift Container Platform 4.12.5 installation with the IPI installation method on Microsoft Azure shows undesired behavior when trying to curl "https://api.<clustername>.<domain>:6443/readyz". When using `HostNetwork`, everything works without issue. But when doing the same request from a pod that does not have `HostNetwork` capabilities and therefore has an IP from the SDN range, a large portion of the requests fails.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.5    True        False         29m     Cluster version is 4.12.5

$ oc get network cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2023-03-10T13:12:06Z"
  generation: 2
  name: cluster
  resourceVersion: "2975"
  uid: e1e9c464-526c-4ebf-ab84-0deedf092cac
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 1400
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16

$ oc get infrastructure cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-03-10T13:12:04Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: 5c260276-d901-40f7-a28c-172c492e81e6
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: Azure
status:
  apiServerInternalURI: https://api-int.clustername.domain.lab:6443
  apiServerURL: https://api.clustername.domain.lab:6443
  controlPlaneTopology: HighlyAvailable
  etcdDiscoveryDomain: ""
  infrastructureName: sreberazure-njj24
  infrastructureTopology: HighlyAvailable
  platform: Azure
  platformStatus:
    azure:
      cloudName: AzurePublicCloud
      networkResourceGroupName: sreberazure-njj24-rg
      resourceGroupName: sreberazure-njj24-rg
    type: Azure

$ oc project openshift-apiserver
Already on project "openshift-apiserver" on server "https://api.clustername.domain.lab:6443".
$ oc get pod
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-6f58784797-kq4kr   2/2     Running   0          41m
apiserver-6f58784797-l69jr   2/2     Running   0          38m
apiserver-6f58784797-nn6tn   2/2     Running   0          45m

$ oc get pod -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
apiserver-6f58784797-kq4kr   2/2     Running   0          42m   10.130.0.21   sreberazure-njj24-master-0   <none>           <none>
apiserver-6f58784797-l69jr   2/2     Running   0          38m   10.129.0.29   sreberazure-njj24-master-2   <none>           <none>
apiserver-6f58784797-nn6tn   2/2     Running   0          45m   10.128.0.36   sreberazure-njj24-master-1   <none>           <none>

$ oc rsh apiserver-6f58784797-l69jr
Defaulted container "openshift-apiserver" out of: openshift-apiserver, openshift-apiserver-check-endpoints, fix-audit-permissions (init)
sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done
curl: (28) Connection timed out after 1000 milliseconds
okokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1003 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
okokokokokokokokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
^C
sh-4.4# exit
exit
command terminated with exit code 130

$ oc project openshift-kube-apiserver
Now using project "openshift-kube-apiserver" on server "https://api.clustername.domain.lab:6443".
$ oc get pod -o wide
NAME                                              READY   STATUS      RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
apiserver-watcher-sreberazure-njj24-master-0      1/1     Running     0          55m   10.0.0.6      sreberazure-njj24-master-0   <none>           <none>
apiserver-watcher-sreberazure-njj24-master-1      1/1     Running     0          57m   10.0.0.8      sreberazure-njj24-master-1   <none>           <none>
apiserver-watcher-sreberazure-njj24-master-2      1/1     Running     0          57m   10.0.0.7      sreberazure-njj24-master-2   <none>           <none>
installer-2-sreberazure-njj24-master-2            0/1     Completed   0          51m   10.129.0.27   sreberazure-njj24-master-2   <none>           <none>
installer-3-sreberazure-njj24-master-2            0/1     Completed   0          50m   10.129.0.32   sreberazure-njj24-master-2   <none>           <none>
installer-4-sreberazure-njj24-master-2            0/1     Completed   0          49m   10.129.0.36   sreberazure-njj24-master-2   <none>           <none>
installer-5-sreberazure-njj24-master-2            0/1     Completed   0          46m   10.129.0.15   sreberazure-njj24-master-2   <none>           <none>
installer-6-sreberazure-njj24-master-0            0/1     Completed   0          37m   10.130.0.27   sreberazure-njj24-master-0   <none>           <none>
installer-6-sreberazure-njj24-master-1            0/1     Completed   0          39m   10.128.0.45   sreberazure-njj24-master-1   <none>           <none>
installer-6-sreberazure-njj24-master-2            0/1     Completed   0          36m   10.129.0.37   sreberazure-njj24-master-2   <none>           <none>
kube-apiserver-guard-sreberazure-njj24-master-0   1/1     Running     0          37m   10.130.0.29   sreberazure-njj24-master-0   <none>           <none>
kube-apiserver-guard-sreberazure-njj24-master-1   1/1     Running     0          38m   10.128.0.47   sreberazure-njj24-master-1   <none>           <none>
kube-apiserver-guard-sreberazure-njj24-master-2   1/1     Running     0          50m   10.129.0.31   sreberazure-njj24-master-2   <none>           <none>
kube-apiserver-sreberazure-njj24-master-0         5/5     Running     0          37m   10.0.0.6      sreberazure-njj24-master-0   <none>           <none>
kube-apiserver-sreberazure-njj24-master-1         5/5     Running     0          38m   10.0.0.8      sreberazure-njj24-master-1   <none>           <none>
kube-apiserver-sreberazure-njj24-master-2         5/5     Running     0          34m   10.0.0.7      sreberazure-njj24-master-2   <none>           <none>
revision-pruner-6-sreberazure-njj24-master-0      0/1     Completed   0          33m   10.130.0.35   sreberazure-njj24-master-0   <none>           <none>
revision-pruner-6-sreberazure-njj24-master-1      0/1     Completed   0          33m   10.128.0.56   sreberazure-njj24-master-1   <none>           <none>
revision-pruner-6-sreberazure-njj24-master-2      0/1     Completed   0          33m   10.129.0.39   sreberazure-njj24-master-2   <none>           <none>

$ oc rsh kube-apiserver-sreberazure-njj24-master-1
sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok

Increasing curl's `--connect-timeout` from 1 to, for example, 10 has no impact; it simply takes longer until the timeout is reached.
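The pattern in the transcripts above follows the source network of the client pod: requests fail intermittently from pods with clusterNetwork (SDN) addresses, and always succeed from hostNetwork pods, which use node addresses. A minimal sketch, using Python's ipaddress module and the CIDRs from `oc get network cluster` above, to classify the addresses seen in the pod listings:

```python
import ipaddress

# Networks from `oc get network cluster -o yaml` in the description
cluster_net = ipaddress.ip_network("10.128.0.0/14")   # clusterNetwork (OVN-Kubernetes pod network)
service_net = ipaddress.ip_network("172.30.0.0/16")   # serviceNetwork

def is_sdn_ip(ip: str) -> bool:
    """True if the address is assigned from the pod (SDN) network."""
    return ipaddress.ip_address(ip) in cluster_net

# openshift-apiserver pod (no hostNetwork) -- curl to /readyz times out intermittently:
print(is_sdn_ip("10.129.0.29"))  # True
# kube-apiserver pod (hostNetwork, node address) -- curl to /readyz always succeeds:
print(is_sdn_ip("10.0.0.8"))     # False
```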
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12 (previous versions were not tested)
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.12 on Azure using the IPI install method and set the SDN to OVN-Kubernetes
2. Once successfully installed, run `oc project openshift-apiserver`
3. `oc rsh apiserver-<podID>`
4. `while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done`
Actual results:
sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done
curl: (28) Connection timed out after 1000 milliseconds
okokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1003 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
okokokokokokokokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
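The failure rate can be quantified from captured loop output like the above, since `/readyz` prints `ok` without a newline and each timeout produces a `curl: (28)` line. A small hypothetical helper (not part of the reproduction steps) that tallies successes versus timeouts:

```python
import re

def tally_readyz(output: str) -> tuple[int, int]:
    """Count successful /readyz probes ('ok') and curl connect timeouts
    (error code 28) in the concatenated output of the curl loop."""
    timeouts = len(re.findall(r"curl: \(28\)", output))
    oks = output.count("ok")  # each successful probe emits exactly 'ok'
    return oks, timeouts

sample = ("curl: (28) Connection timed out after 1000 milliseconds okokok"
          "curl: (28) Connection timed out after 1001 milliseconds okok")
oks, timeouts = tally_readyz(sample)
print(f"{timeouts} of {oks + timeouts} requests timed out")
```

On the full Actual results transcript this makes the intermittent nature concrete: a substantial fraction of otherwise identical requests time out rather than all or none.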
Expected results:
sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok
Additional info:
- blocks
  - OCPBUGS-18159 Azure; NLB; OVN-K: Requests from CNI pods to internalAPI server domain fails always (Closed)
- links to
  - RHSA-2023:5006 OpenShift Container Platform 4.14.z security update