Bug
Resolution: Done
4.13.0
Quality / Stability / Reliability
Moderate
CMP Sprint 58
Description of problem:
Clusters where the File Integrity Operator (FIO) was installed from a pre-4.8 release have a file-integrity-operator-metrics service with ports 8383 and 8686. This service has an ownerReference to the FIO deployment. The 4.9+ release of FIO creates a new service, metrics, with ports 8383 and 8585, so both services exist after an upgrade:
NAME                              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
file-integrity-operator-metrics   ClusterIP   10.98.75.1   <none>        8383/TCP,8686/TCP   576d
metrics                           ClusterIP   10.98.34.3   <none>        8383/TCP,8585/TCP   374d
The old file-integrity-operator-metrics service still declares the cr-metrics port:
  - name: cr-metrics
    port: 8686
    protocol: TCP
    targetPort: 8686
The old service owns a corresponding ServiceMonitor (SM), which still scrapes the cr-metrics port:
kind: ServiceMonitor
metadata:
  creationTimestamp: "2021-04-26T19:12:54Z"
  generation: 1
  labels:
    name: file-integrity-operator
  name: file-integrity-operator-metrics
  namespace: openshift-file-integrity
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service
    name: file-integrity-operator-metrics
    uid: 4aa67dae-eb8e-4fd9-bdca-7466301216ce
  resourceVersion: "910050436"
  uid: d545fdf6-08fe-412d-8612-d1de802613f5
spec:
  endpoints:
  - bearerTokenSecret:
      key: ""
    port: http-metrics
  - bearerTokenSecret:
      key: ""
    port: cr-metrics
  namespaceSelector: {}
  selector:
    matchLabels:
      name: file-integrity-operator
The Prometheus target that is down is the cr-metrics endpoint on port 8686:
"labels": {
  "endpoint": "cr-metrics",
  "instance": "10.97.5.105:8686",
  "job": "file-integrity-operator-metrics",
  "namespace": "openshift-file-integrity",
  "pod": "file-integrity-operator-69d6bffd64-28q4z",
  "service": "file-integrity-operator-metrics"
},
"scrapePool": "serviceMonitor/openshift-file-integrity/file-integrity-operator-metrics/1",
"scrapeUrl": "http://10.97.5.105:8686/metrics",
"globalUrl": "http://10.97.5.105:8686/metrics",
"lastError": "Get \"http://10.97.5.105:8686/metrics\": dial tcp 10.97.5.105:8686: connect: connection refused",
"lastScrape": "2022-11-24T16:55:42.668150322Z",
"lastScrapeDuration": 0.000390758,
"health": "down",
"scrapeInterval": "30s",
"scrapeTimeout": "10s"
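To spot this kind of target without reading the full dump, the following is a minimal sketch that filters a Prometheus targets response for entries whose health is "down". It assumes the payload has the shape returned by the Prometheus /api/v1/targets endpoint (a "data" object holding an "activeTargets" list). The function name down_targets and the SAMPLE payload are illustrative only; the first entry is trimmed from the target above, and the second "up" entry is hypothetical.

```python
def down_targets(targets_response):
    """Return (scrapePool, scrapeUrl, lastError) for every active target reported as down."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapePool"), t.get("scrapeUrl"), t.get("lastError"))
        for t in active
        if t.get("health") == "down"
    ]


# Illustrative payload: the first target is trimmed from this report,
# the second ("up") target is hypothetical and only there for contrast.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"endpoint": "cr-metrics", "service": "file-integrity-operator-metrics"},
                "scrapePool": "serviceMonitor/openshift-file-integrity/file-integrity-operator-metrics/1",
                "scrapeUrl": "http://10.97.5.105:8686/metrics",
                "lastError": "dial tcp 10.97.5.105:8686: connect: connection refused",
                "health": "down",
            },
            {
                "labels": {"endpoint": "http-metrics", "service": "file-integrity-operator-metrics"},
                "scrapePool": "serviceMonitor/openshift-file-integrity/file-integrity-operator-metrics/0",
                "scrapeUrl": "http://10.97.5.105:8383/metrics",
                "lastError": "",
                "health": "up",
            },
        ]
    },
}

if __name__ == "__main__":
    for pool, url, err in down_targets(SAMPLE):
        print(pool, url, err)
```

Run against a real response, the same filter would list every down target, not only the FIO one, so the scrapePool value is what ties a result back to the file-integrity-operator-metrics ServiceMonitor.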
How reproducible:
Always on older installs which have been upgraded.
Steps to Reproduce:
1. Install a pre-4.8 release, which creates the file-integrity-operator-metrics service.
2. Upgrade to the latest release, which creates the metrics service.
3. See the target down on port 8686.
Actual results:
TargetDown alert for the ServiceMonitor endpoint on port 8686.
Expected results:
No TargetDown alerts. The old file-integrity-operator-metrics service should be cleaned up (removed) on upgrade.
Additional info:
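As a rough illustration of how a leftover service could be recognized, the sketch below flags any service that exposes a port outside the set used by the 4.9+ metrics service (8383 and 8585, per the service listing in the description). This heuristic, the function name stale_services, and the EXPECTED_PORTS constant are assumptions made for illustration; they are not taken from the operator's code and are not necessarily how the fix is implemented.

```python
# Ports exposed by the 4.9+ "metrics" service, taken from the service listing in this report.
EXPECTED_PORTS = {8383, 8585}


def stale_services(services, expected_ports=EXPECTED_PORTS):
    """Return the sorted names of services that expose at least one port outside expected_ports.

    `services` maps a service name to the list of ports it exposes.
    """
    return sorted(
        name
        for name, ports in services.items()
        if set(ports) - set(expected_ports)
    )


# The two services from the listing in the description.
SERVICES = {
    "file-integrity-operator-metrics": [8383, 8686],
    "metrics": [8383, 8585],
}

if __name__ == "__main__":
    print(stale_services(SERVICES))  # prints ['file-integrity-operator-metrics']
```

Because the old ServiceMonitor carries an ownerReference to the old service, removing that service would normally let Kubernetes garbage collection remove the ServiceMonitor as well.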