OCPBUGS-1624

Bug 2054200 - Custom created services in openshift-ingress removed even though the services are not of type LoadBalancer


Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • 4.9
    • Networking / router
    • None
    • Important
    • 2
    • Sprint 225, Sprint 226
    • 2
    • Rejected
    • False
    • Cause: The logic in the ingress-operator didn't validate whether a kubernetes service object in the openshift-ingress namespace was actually created/owned by the IngressController it was attempting to reconcile.

      Consequence: The ingress-operator would modify or remove kubernetes services with the same name and namespace regardless of ownership, which could cause unexpected behavior. This is quite rare because the service has to have a very specific name and has to be in the openshift-ingress namespace.

      Fix: The ingress-operator now checks the ownership of existing kubernetes services it attempts to create or remove; if the ownership doesn't match, the ingress-operator reports an error and takes no action.

      Result: The ingress-operator won't modify or delete custom kubernetes services in the openshift-ingress namespace that merely share a name with a service it manages.
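
      For illustration, here is a minimal sketch of the kind of ownership check the fix describes, written against the upstream apimachinery helpers. The function name and error text are assumptions for illustration, not the actual ingress-operator code:

      package ingresssketch

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      )

      // ensureOwnedByIngressController refuses to touch a service that is not
      // controlled by the given owner. metav1.IsControlledBy compares the
      // service's controller owner reference against the owner's UID.
      func ensureOwnedByIngressController(svc *corev1.Service, owner metav1.Object) error {
          if !metav1.IsControlledBy(svc, owner) {
              return fmt.Errorf("service %s/%s exists but is not owned by ingresscontroller %q; refusing to modify or delete it",
                  svc.Namespace, svc.Name, owner.GetName())
          }
          return nil
      }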

    Description

      Manual backport for 4.9.z

      Description of problem:

      When an `IngressController` has `endpointPublishingStrategy` set to `Private`, a manually created `kubernetes` service of type `NodePort` in the `openshift-ingress` namespace that follows the same naming convention is removed when the `ingress-operator` is restarted.

      $ oc get ingresscontroller -n openshift-ingress-operator example-service-testing -o json
      {
        "apiVersion": "operator.openshift.io/v1",
        "kind": "IngressController",
        "metadata": {
          "creationTimestamp": "2022-02-14T10:34:35Z",
          "finalizers": [
            "ingresscontroller.operator.openshift.io/finalizer-ingresscontroller"
          ],
          "generation": 2,
          "name": "example-service-testing",
          "namespace": "openshift-ingress-operator",
          "resourceVersion": "19329705",
          "uid": "ffc9f14d-63ad-43bb-8a56-5e590cda9b38"
        },
        "spec": {
          "clientTLS": {
            "clientCA": { "name": "" },
            "clientCertificatePolicy": ""
          },
          "domain": "apps.example.com",
          "endpointPublishingStrategy": { "type": "Private" },
          "httpEmptyRequestsPolicy": "Respond",
          "httpErrorCodePages": { "name": "" },
          "tuningOptions": {},
          "unsupportedConfigOverrides": null
        },
        "status": {
          "availableReplicas": 2,
          "conditions": [
            { "lastTransitionTime": "2022-02-14T10:34:35Z", "reason": "Valid", "status": "True", "type": "Admitted" },
            { "lastTransitionTime": "2022-02-14T10:34:35Z", "status": "True", "type": "PodsScheduled" },
            { "lastTransitionTime": "2022-02-14T10:35:10Z", "message": "The deployment has Available status condition set to True", "reason": "DeploymentAvailable", "status": "True", "type": "DeploymentAvailable" },
            { "lastTransitionTime": "2022-02-14T10:35:10Z", "message": "Minimum replicas requirement is met", "reason": "DeploymentMinimumReplicasMet", "status": "True", "type": "DeploymentReplicasMinAvailable" },
            { "lastTransitionTime": "2022-02-14T10:35:10Z", "message": "All replicas are available", "reason": "DeploymentReplicasAvailable", "status": "True", "type": "DeploymentReplicasAllAvailable" },
            { "lastTransitionTime": "2022-02-14T10:34:35Z", "message": "The configured endpoint publishing strategy does not include a managed load balancer", "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer", "status": "False", "type": "LoadBalancerManaged" },
            { "lastTransitionTime": "2022-02-14T10:34:35Z", "message": "The endpoint publishing strategy doesn't support DNS management.", "reason": "UnsupportedEndpointPublishingStrategy", "status": "False", "type": "DNSManaged" },
            { "lastTransitionTime": "2022-02-14T10:35:10Z", "status": "True", "type": "Available" },
            { "lastTransitionTime": "2022-02-14T10:35:10Z", "status": "False", "type": "Degraded" }
          ],
          "domain": "apps.example.com",
          "endpointPublishingStrategy": { "type": "Private" },
          "observedGeneration": 2,
          "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=example-service-testing",
          "tlsProfile": {
            "ciphers": [ "ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384", "ECDHE-RSA-AES256-GCM-SHA384", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305", "DHE-RSA-AES128-GCM-SHA256", "DHE-RSA-AES256-GCM-SHA384", "TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POLY1305_SHA256" ],
            "minTLSVersion": "VersionTLS12"
          }
        }
      }

      $ oc get svc
      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
      router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
      router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
      router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 87m

      After the `IngressController` is created, everything looks as expected: for the `Private` `IngressController` we can see the `router-internal-example-service-testing` service.
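
      The collision risk comes from the operator's fixed naming convention. The following is a hedged sketch of how those names are derived; the helper names are assumptions for illustration, though the real operator builds the names from the IngressController's name in the same way:

      package ingresssketch

      // internalServiceName returns the ClusterIP service name the operator
      // creates for an IngressController, e.g.
      // "router-internal-example-service-testing".
      func internalServiceName(icName string) string {
          return "router-internal-" + icName
      }

      // routerServiceName returns the name of the service the operator manages
      // for its publishing strategy, e.g. "router-example-service-testing",
      // which is the name the manually created NodePort service below collides
      // with.
      func routerServiceName(icName string) string {
          return "router-" + icName
      }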

      $ oc create svc nodeport router-example-service-testing --tcp=80
      service/router-example-service-testing created

      $ oc get svc
      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
      router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
      router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
      router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 88m
      router-example-service-testing NodePort 172.30.2.39 <none> 80:31874/TCP 3s

      Here we created a `kubernetes` service of type NodePort with the same naming scheme as the services created by the `IngressController`. So far so good, and there is no functional impact yet.

      $ oc get pod -n openshift-ingress-operator
      NAME READY STATUS RESTARTS AGE
      ingress-operator-7d56fd784c-plwpj 2/2 Running 0 78m

      $ oc delete pod ingress-operator-7d56fd784c-plwpj -n openshift-ingress-operator
      pod "ingress-operator-7d56fd784c-plwpj" deleted

      $ oc get svc
      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
      router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
      router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
      router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 88m
      router-example-service-testing NodePort 172.30.2.39 <none> 80:31874/TCP 53s

      $ oc get svc
      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
      router-default LoadBalancer 172.30.242.215 a777bc4ce4da740d99abdaa899bf8e88-1599963277.us-west-1.elb.amazonaws.com 80:30779/TCP,443:31713/TCP 13d
      router-internal-default ClusterIP 172.30.233.135 <none> 80/TCP,443/TCP,1936/TCP 13d
      router-internal-example-service-testing ClusterIP 172.30.86.100 <none> 80/TCP,443/TCP,1936/TCP 89m

      $ oc logs ingress-operator-7d56fd784c-7g48r -n openshift-ingress-operator -c ingress-operator
      2022-02-14T12:03:26.624Z INFO operator.main ingress-operator/start.go:63 using operator namespace {"namespace": "openshift-ingress-operator"}
      I0214 12:03:27.675884 1 request.go:668] Waited for 1.02063447s due to client-side throttling, not priority and fairness, request: GET:
      https://172.30.0.1:443/apis/apps.openshift.io/v1?timeout=32s
      2022-02-14T12:03:29.284Z INFO operator.main ingress-operator/start.go:63 registering Prometheus metrics for canary_controller
      2022-02-14T12:03:29.284Z INFO operator.main ingress-operator/start.go:63 registering Prometheus metrics for ingress_controller
      [...]
      2022-02-14T12:03:33.119Z INFO operator.dns dns/controller.go:535 using region from operator config {"region name": "us-west-1"}
      2022-02-14T12:03:33.417Z INFO operator.ingress_controller controller/controller.go:298 reconciling {"request": "openshift-ingress-operator/example-service-testing"}
      2022-02-14T12:03:33.509Z INFO operator.ingress_controller ingress/load_balancer_service.go:190 deleted load balancer service {"namespace": "openshift-ingress", "name": "router-example-service-testing"}
      [...]

      When the `ingress-operator` pod is restarted, we can see that shortly afterwards the manually created `kubernetes` service of type NodePort is removed. Looking through the code, this appears related to
      https://bugzilla.redhat.com/show_bug.cgi?id=1914127
      but that change should only target `kubernetes` services of type LoadBalancer. However, we can clearly see that this happens for every `kubernetes` service type as long as the name matches the pre-defined `IngressController` naming scheme.

      As these `kubernetes` services don't carry any owner reference to the `IngressController`, it is unexpected that they are removed, and this should be fixed.
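
      The failure mode boils down to a lookup by well-known name followed by an unconditional delete, matching the "deleted load balancer service" log line above. The following is a hedged sketch of the difference using controller-runtime; the types and flow are illustrative assumptions, not the operator's actual reconcile code:

      package ingresssketch

      import (
          "context"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/apimachinery/pkg/types"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      // deleteRouterService fetches the service by its well-known name and
      // deletes it only if it is controlled by the IngressController being
      // reconciled. Before the fix, the IsControlledBy guard was missing, so
      // any same-named service in openshift-ingress was deleted.
      func deleteRouterService(ctx context.Context, c client.Client, owner metav1.Object, namespace, name string) error {
          svc := &corev1.Service{}
          if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, svc); err != nil {
              return client.IgnoreNotFound(err)
          }
          if !metav1.IsControlledBy(svc, owner) {
              // The missing guard: without it, the manually created NodePort
              // service above is deleted just because its name matches.
              return nil
          }
          return c.Delete(ctx, svc)
      }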

      OpenShift release version:

      • OpenShift Container Platform 4.9.15

      Cluster Platform:

      • AWS, but likely on other platforms as well

      How reproducible:

      • Always

      Steps to Reproduce (in detail):
      1. See the steps in the problem description

      Actual results:

      `kubernetes` services of any type that have no owner reference to the `IngressController` are removed by the `ingress-operator` if they match a specific naming scheme.

      Expected results:

      `kubernetes` services without an owner reference to the `IngressController` should never be touched, modified, or removed by the operator, as they may be required for 3rd-party integrations.

      Impact of the problem:

      A 3rd-party integration broke after updating to OpenShift Container Platform 4.8 because some helper services were removed unexpectedly.

      Additional info:

      Check
      https://bugzilla.redhat.com/show_bug.cgi?id=1914127
      as this seems to be the change that introduced this behavior. That change appears specific to `kubernetes` services of type LoadBalancer, so we are wondering why other service types are in scope as well.

            People

              Assignee: Grant Spence (gspence@redhat.com)
              Reporter: Grant Spence (gspence@redhat.com)
              QA Contact: Melvin Joseph