- Bug
- Resolution: Done
- Major
- 4.11
- None
- Important
- None
- 1
- Sprint 214, Sprint 215, Sprint 216, Sprint 217, Sprint 218, Sprint 219, Sprint 220
- 7
- False
Manual backport mirror of https://bugzilla.redhat.com/show_bug.cgi?id=2054200
Description of problem:
When an `IngressController` has `endpointPublishingStrategy` set to `Private`, a manually created Kubernetes service of type `NodePort` in the `openshift-ingress` namespace that follows the same naming convention as the operator-managed services is removed when the `ingress-operator` is restarted.
$ oc get ingresscontroller -n openshift-ingress-operator example-service-testing -o json
{
    "apiVersion": "operator.openshift.io/v1",
    "kind": "IngressController",
    "metadata": {...},
    "spec": {
        "clientTLS": {
            "clientCA": {...},
            "clientCertificatePolicy": ""
        },
        "domain": "apps.example.com",
        "endpointPublishingStrategy": {
            "type": "Private"
        },
        "httpEmptyRequestsPolicy": "Respond",
        "httpErrorCodePages": {
            "name": ""
        },
        "tuningOptions": {},
        "unsupportedConfigOverrides": null
    },
    "status": {
        "availableReplicas": 2,
        "conditions": [
            {...},
            {
                "lastTransitionTime": "2022-02-14T10:34:35Z",
                "status": "True",
                "type": "PodsScheduled"
            },
            {
                "lastTransitionTime": "2022-02-14T10:35:10Z",
                "message": "The deployment has Available status condition set to True",
                "reason": "DeploymentAvailable",
                "status": "True",
                "type": "DeploymentAvailable"
            },
            {
                "lastTransitionTime": "2022-02-14T10:35:10Z",
                "message": "Minimum replicas requirement is met",
                "reason": "DeploymentMinimumReplicasMet",
                "status": "True",
                "type": "DeploymentReplicasMinAvailable"
            },
            {
                "lastTransitionTime": "2022-02-14T10:35:10Z",
                "message": "All replicas are available",
                "reason": "DeploymentReplicasAvailable",
                "status": "True",
                "type": "DeploymentReplicasAllAvailable"
            },
            {
                "lastTransitionTime": "2022-02-14T10:34:35Z",
                "message": "The configured endpoint publishing strategy does not include a managed load balancer",
                "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer",
                "status": "False",
                "type": "LoadBalancerManaged"
            },
            {
                "lastTransitionTime": "2022-02-14T10:34:35Z",
                "message": "The endpoint publishing strategy doesn't support DNS management.",
                "reason": "UnsupportedEndpointPublishingStrategy",
                "status": "False",
                "type": "DNSManaged"
            },
            {
                "lastTransitionTime": "2022-02-14T10:35:10Z",
                "status": "True",
                "type": "Available"
            },
            {
                "lastTransitionTime": "2022-02-14T10:35:10Z",
                "status": "False",
                "type": "Degraded"
            }
        ],
        "domain": "apps.example.com",
        "endpointPublishingStrategy": {
            "type": "Private"
        },
        "observedGeneration": 2,
        "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=example-service-testing",
        "tlsProfile": {
            "ciphers": [
                "ECDHE-ECDSA-AES128-GCM-SHA256",
                "ECDHE-RSA-AES128-GCM-SHA256",
                "ECDHE-ECDSA-AES256-GCM-SHA384",
                "ECDHE-RSA-AES256-GCM-SHA384",
                "ECDHE-ECDSA-CHACHA20-POLY1305",
                "ECDHE-RSA-CHACHA20-POLY1305",
                "DHE-RSA-AES128-GCM-SHA256",
                "DHE-RSA-AES256-GCM-SHA384",
                "TLS_AES_128_GCM_SHA256",
                "TLS_AES_256_GCM_SHA384",
                "TLS_CHACHA20_POLY1305_SHA256"
            ],
            "minTLSVersion": "VersionTLS12"
        }
    }
}
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 87m
After the `IngressController` is created, everything looks as expected: for the `Private` `IngressController` we can see the `router-internal-example-service-testing` service.
$ oc create svc nodeport router-example-service-testing --tcp=80
service/router-example-service-testing created
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 88m
router-example-service-testing NodePort 172.30.2.39 <none> 80:31874/TCP 3s
Now we create a Kubernetes service of type `NodePort` with the same naming scheme as the one created by the `IngressController`. So far so good; there is no impact on functionality.
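The collision is purely name-based. As a hypothetical sketch (the helper name below is illustrative, not an actual function from the cluster-ingress-operator codebase), the operator derives the name of the service it manages from the `IngressController` name, so any manually created service with a matching name collides with it:

```go
package main

import "fmt"

// loadBalancerServiceName is a hypothetical helper illustrating the
// "router-<ingresscontroller-name>" naming convention visible in the
// oc output above; any service reusing this name collides with the
// one the operator considers its own.
func loadBalancerServiceName(icName string) string {
	return "router-" + icName
}

func main() {
	// The manually created NodePort service uses exactly this name:
	fmt.Println(loadBalancerServiceName("example-service-testing"))
	// prints "router-example-service-testing"
}
```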
$ oc get pod -n openshift-ingress-operator
NAME READY STATUS RESTARTS AGE
ingress-operator-7d56fd784c-plwpj 2/2 Running 0 78m
$ oc delete pod ingress-operator-7d56fd784c-plwpj -n openshift-ingress-operator
pod "ingress-operator-7d56fd784c-plwpj" deleted
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 88m
router-example-service-testing NodePort 172.30.2.39 <none> 80:31874/TCP 53s
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 89m
$ oc logs ingress-operator-7d56fd784c-7g48r -n openshift-ingress-operator -c ingress-operator
2022-02-14T12:03:26.624Z INFO operator.main ingress-operator/start.go:63 using operator namespace {"namespace": "openshift-ingress-operator"}
I0214 12:03:27.675884 1 request.go:668] Waited for 1.02063447s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apps.openshift.io/v1?timeout=32s
2022-02-14T12:03:29.284Z INFO operator.main ingress-operator/start.go:63 registering Prometheus metrics for canary_controller
2022-02-14T12:03:29.284Z INFO operator.main ingress-operator/start.go:63 registering Prometheus metrics for ingress_controller
[...]
2022-02-14T12:03:33.119Z INFO operator.dns dns/controller.go:535 using region from operator config {"region name": "us-west-1"}
2022-02-14T12:03:33.417Z INFO operator.ingress_controller controller/controller.go:298 reconciling {"request": "openshift-ingress-operator/example-service-testing"}
2022-02-14T12:03:33.509Z INFO operator.ingress_controller ingress/load_balancer_service.go:190 deleted load balancer service {"namespace": "openshift-ingress", "name": "router-example-service-testing"}
[...]
When the `ingress-operator` pod is restarted, we can see that shortly afterwards the manually created Kubernetes service of type `NodePort` is removed. Looking through the code, this appears to be related to
https://bugzilla.redhat.com/show_bug.cgi?id=1914127
but that change should only target Kubernetes services of type `LoadBalancer`. However, we can clearly see that this happens for Kubernetes services of any type if they match the pre-defined `IngressController` naming scheme.
As this is not expected, and the affected Kubernetes services carry no owner reference to the `IngressController`-created services, it is unexpected that they are removed, and this should be fixed.
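A minimal sketch of the kind of guard the fix would need, using simplified stand-in types (the real operator works with the Kubernetes `corev1.Service` API and its owner references; the type and function names below are hypothetical): before deleting a service that matches its naming scheme, the operator should verify both the service type and the owner reference.

```go
package main

import "fmt"

// OwnerRef and Service are simplified stand-ins for the corresponding
// Kubernetes API objects, just enough to illustrate the check.
type OwnerRef struct {
	Kind, Name string
}

type Service struct {
	Name   string
	Type   string // "LoadBalancer", "NodePort", "ClusterIP", ...
	Owners []OwnerRef
}

// ownedLoadBalancer reports whether svc is a LoadBalancer service owned
// by the given IngressController, i.e. one the operator may safely delete.
// A name match alone is not sufficient.
func ownedLoadBalancer(svc Service, icName string) bool {
	if svc.Type != "LoadBalancer" {
		return false
	}
	for _, o := range svc.Owners {
		if o.Kind == "IngressController" && o.Name == icName {
			return true
		}
	}
	return false
}

func main() {
	manual := Service{Name: "router-example-service-testing", Type: "NodePort"}
	managed := Service{
		Name:   "router-example-service-testing",
		Type:   "LoadBalancer",
		Owners: []OwnerRef{{Kind: "IngressController", Name: "example-service-testing"}},
	}
	fmt.Println(ownedLoadBalancer(manual, "example-service-testing"))  // false: must not be deleted
	fmt.Println(ownedLoadBalancer(managed, "example-service-testing")) // true: operator-managed
}
```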
OpenShift release version:
- OpenShift Container Platform 4.9.15
Cluster Platform:
- AWS, but likely on other platforms as well
How reproducible:
- Always
Steps to Reproduce (in detail):
1. See the steps in the problem description
Actual results:
Kubernetes services of any type that have no owner reference to the `IngressController` are removed by the ingress operator if they follow a specific naming scheme.
Expected results:
Kubernetes services without an `IngressController` owner reference should never be touched, modified, or removed by the operator, as they may be required for third-party integrations or similar.
Impact of the problem:
Third-party integrations broke after updating to OpenShift Container Platform 4.8 because some helper services were removed unexpectedly.
Additional info:
Check
https://bugzilla.redhat.com/show_bug.cgi?id=1914127
as this appears to be the change that introduced this behavior. That change seems specific to Kubernetes services of type `LoadBalancer`, so we are wondering why other service types are in scope as well.
- blocks: OCPBUGS-1623 Bug 2054200 - Custom created services in openshift-ingress removed even though the services are not of type LoadBalancer (Closed)
- is cloned by: OCPBUGS-1623 Bug 2054200 - Custom created services in openshift-ingress removed even though the services are not of type LoadBalancer (Closed)