-
Bug
-
Resolution: Done
-
Major
-
None
-
4.9
-
None
-
Important
-
None
-
2
-
Sprint 225, Sprint 226
-
2
-
Rejected
-
False
-
-
Manual backport for 4.9.z
Description of problem:
With an `IngressController` whose `endpointPublishingStrategy` is set to `Private`, a manually created `kubernetes` Service of type `NodePort` in the `openshift-ingress` namespace is removed when the `ingress-operator` is restarted, if its name follows the same naming convention the operator uses for its own Services.
$ oc get ingresscontroller -n openshift-ingress-operator example-service-testing -o json
{
"apiVersion": "operator.openshift.io/v1",
"kind": "IngressController",
"metadata":
,
"spec": {
"clientTLS": {
"clientCA":
,
"clientCertificatePolicy": ""
},
"domain": "apps.example.com",
"endpointPublishingStrategy":
,
"httpEmptyRequestsPolicy": "Respond",
"httpErrorCodePages":
,
"tuningOptions": {},
"unsupportedConfigOverrides": null
},
"status": {
"availableReplicas": 2,
"conditions": [
,
{ "lastTransitionTime": "2022-02-14T10:34:35Z", "status": "True", "type": "PodsScheduled" },
{ "lastTransitionTime": "2022-02-14T10:35:10Z", "message": "The deployment has Available status condition set to True", "reason": "DeploymentAvailable", "status": "True", "type": "DeploymentAvailable" },
{ "lastTransitionTime": "2022-02-14T10:35:10Z", "message": "Minimum replicas requirement is met", "reason": "DeploymentMinimumReplicasMet", "status": "True", "type": "DeploymentReplicasMinAvailable" },
{ "lastTransitionTime": "2022-02-14T10:35:10Z", "message": "All replicas are available", "reason": "DeploymentReplicasAvailable", "status": "True", "type": "DeploymentReplicasAllAvailable" },
{ "lastTransitionTime": "2022-02-14T10:34:35Z", "message": "The configured endpoint publishing strategy does not include a managed load balancer", "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer", "status": "False", "type": "LoadBalancerManaged" },
{ "lastTransitionTime": "2022-02-14T10:34:35Z", "message": "The endpoint publishing strategy doesn't support DNS management.", "reason": "UnsupportedEndpointPublishingStrategy", "status": "False", "type": "DNSManaged" },
{ "lastTransitionTime": "2022-02-14T10:35:10Z", "status": "True", "type": "Available" },
{ "lastTransitionTime": "2022-02-14T10:35:10Z", "status": "False", "type": "Degraded" }],
"domain": "apps.example.com",
"endpointPublishingStrategy":
,
"observedGeneration": 2,
"selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=example-service-testing",
"tlsProfile":
}
}
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 87m
After the `IngressController` is created, everything looks as expected: for the Private `IngressController` we can see the `router-internal-example-service-testing` Service.
$ oc create svc nodeport router-example-service-testing --tcp=80
service/router-example-service-testing created
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 88m
router-example-service-testing NodePort 172.30.2.39 <none> 80:31874/TCP 3s
Here we created a `kubernetes` Service of type NodePort whose name follows the same naming scheme as the Services created for the `IngressController`. So far so good: there is no impact on functionality.
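The name collision described above can be sketched as follows. This is only an illustration based on the Service names visible in the `oc get svc` output (`router-<name>` and `router-internal-<name>`); the helper functions `publishedServiceName`, `internalServiceName` and `collides` are hypothetical and are not taken from the operator's source.

```go
package main

import "fmt"

// publishedServiceName is a hypothetical helper that mirrors the
// "router-<name>" pattern seen for router-default in the output above.
func publishedServiceName(ingressController string) string {
	return "router-" + ingressController
}

// internalServiceName is a hypothetical helper that mirrors the
// "router-internal-<name>" pattern seen in the output above.
func internalServiceName(ingressController string) string {
	return "router-internal-" + ingressController
}

// collides reports whether a manually chosen Service name equals the
// name that would be used for the IngressController's published Service.
func collides(serviceName, ingressController string) bool {
	return serviceName == publishedServiceName(ingressController)
}

func main() {
	ic := "example-service-testing"
	fmt.Println(internalServiceName(ic))
	fmt.Println(collides("router-example-service-testing", ic))
	fmt.Println(collides("my-helper-service", ic))
}
```

Under these assumptions, the manually created `router-example-service-testing` Service has exactly the name that a published Service for the `example-service-testing` `IngressController` would have, even though the Private strategy never creates one.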
$ oc get pod -n openshift-ingress-operator
NAME READY STATUS RESTARTS AGE
ingress-operator-7d56fd784c-plwpj 2/2 Running 0 78m
$ oc delete pod ingress-operator-7d56fd784c-plwpj -n openshift-ingress-operator
pod "ingress-operator-7d56fd784c-plwpj" deleted
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 88m
router-example-service-testing NodePort 172.30.2.39 <none> 80:31874/TCP 53s
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 89m
$ oc logs ingress-operator-7d56fd784c-7g48r -n openshift-ingress-operator -c ingress-operator
2022-02-14T12:03:26.624Z INFO operator.main ingress-operator/start.go:63 using operator namespace {"namespace": "openshift-ingress-operator"}
I0214 12:03:27.675884 1 request.go:668] Waited for 1.02063447s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apps.openshift.io/v1?timeout=32s
2022-02-14T12:03:29.284Z INFO operator.main ingress-operator/start.go:63 registering Prometheus metrics for canary_controller
2022-02-14T12:03:29.284Z INFO operator.main ingress-operator/start.go:63 registering Prometheus metrics for ingress_controller
[...]
2022-02-14T12:03:33.119Z INFO operator.dns dns/controller.go:535 using region from operator config {"region name": "us-west-1"}
2022-02-14T12:03:33.417Z INFO operator.ingress_controller controller/controller.go:298 reconciling {"request": "openshift-ingress-operator/example-service-testing"}
2022-02-14T12:03:33.509Z INFO operator.ingress_controller ingress/load_balancer_service.go:190 deleted load balancer service {"namespace": "openshift-ingress", "name": "router-example-service-testing"}
[...]
After the `ingress-operator` pod is restarted, the manually created `kubernetes` Service of type NodePort is removed shortly afterwards. Looking through the code, this appears to be related to
https://bugzilla.redhat.com/show_bug.cgi?id=1914127
but that change should only target `kubernetes` Services of type LoadBalancer. We can clearly see that the deletion happens for `kubernetes` Services of any type, as long as their name matches the pre-defined `IngressController` naming scheme.
This is not expected: the manually created `kubernetes` Services have no owner reference linking them to the `IngressController` or to the Services it created, so they should not be removed. This should be fixed.
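The behavior the reporter expects can be sketched as a deletion guard. This is a minimal illustration, not the operator's actual code or the actual fix: the `Service` struct is a simplified stand-in for a Kubernetes Service, and `safeToDelete` and the UID values are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Service is a simplified, hypothetical stand-in for a Kubernetes Service.
type Service struct {
	Name      string
	Type      string   // "ClusterIP", "NodePort" or "LoadBalancer"
	OwnerUIDs []string // UIDs taken from metadata.ownerReferences
}

// safeToDelete returns true only if the Service is of type LoadBalancer
// and carries an owner reference to the given owner. A name match alone
// is never sufficient.
func safeToDelete(svc Service, ownerUID string) bool {
	if svc.Type != "LoadBalancer" {
		return false
	}
	for _, uid := range svc.OwnerUIDs {
		if uid == ownerUID {
			return true
		}
	}
	return false
}

func main() {
	// Manually created Service from the reproduction: matching name,
	// but NodePort type and no owner reference.
	manual := Service{Name: "router-example-service-testing", Type: "NodePort"}
	// Hypothetical operator-managed Service with an owner reference.
	managed := Service{Name: "router-default", Type: "LoadBalancer", OwnerUIDs: []string{"uid-1"}}

	fmt.Println(safeToDelete(manual, "uid-1"))
	fmt.Println(safeToDelete(managed, "uid-1"))
}
```

With a guard of this kind, the manually created NodePort Service would be left alone because it fails both the type check and the ownership check.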
OpenShift release version:
- OpenShift Container Platform 4.9.15
Cluster Platform:
- AWS, but likely other platforms as well
How reproducible:
- Always
Steps to Reproduce (in detail):
1. See the steps in the problem description
Actual results:
`kubernetes` Services of any type, without an owner reference to the `IngressController`, are removed by the `IngressController` if their names match a specific naming scheme.
Expected results:
`kubernetes` Services without an `IngressController` owner reference should never be touched, modified or removed by the `IngressController`, as they may be required for 3rd party integrations or similar.
Impact of the problem:
A 3rd party implementation broke after updating to OpenShift Container Platform 4.8 because some helper Services were removed unexpectedly.
Additional info:
See
https://bugzilla.redhat.com/show_bug.cgi?id=1914127
as this seems to be the change that introduced the behavior. However, that change appears to be specific to `kubernetes` Services of type LoadBalancer, so we are wondering why Services of other types are in scope as well.
- clones: OCPBUGS-1623 Bug 2054200 - Custom created services in openshift-ingress removed even though the services are not of type LoadBalancer (Closed)
- is blocked by: OCPBUGS-1623 Bug 2054200 - Custom created services in openshift-ingress removed even though the services are not of type LoadBalancer (Closed)