Bug
Resolution: Can't Do
Major
None
4.18.z
Description of problem:
The customer reported that a BareMetalHost can only be provisioned when both metal3 pods are running on the same node:

---
oc get pods -A -owide | grep metal3
cluster-prod-edge-spoke1-dc-sin2
openshift-machine-api   metal3-66f78c98bb-4gtqn                     4/4   Running   0   31h   10.68.2.1     control01.itup-002.example.com   <none>   <none>
openshift-machine-api   metal3-baremetal-operator-9dc676f77-gn75t   1/1   Running   0   31h   172.16.0.72   control01.itup-002.example.com   <none>   <none>
-------------------------------------------

Traffic from the infrastructure-operator pod does not reach port 6388/TCP of the metal3-state service. Because of this, the customer is unable to deploy a new spoke cluster. To resolve the issue, a custom rule had to be added to the security group attached to the master nodes.

--- ERROR ---
{"level":"info","ts":1749121148.4973779,"logger":"provisioner.ironic","msg":"error caught while checking endpoint, will retry","host":"example~control02","endpoint":"https://metal3-state.openshift-machine-api.svc.cluster.local:6388/v1/","error":"Get \"https://metal3-state.openshift-machine-api.svc.cluster.local:6388/v1/\": dial tcp 172.30.34.248:6388: i/o timeout"}
---

--- BEFORE ADDING SG RULE ---
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
metal3-state   ClusterIP   172.30.73.28   <none>        6388/TCP,6180/TCP,6183/TCP   18h

bash-5.1 ~ $ oc rsh infrastructure-operator-6c5698fffb-v8lmg
sh-5.1$ curl https://172.30.73.28:6388
curl: (28) Failed to connect to 172.30.73.28 port 6388: Connection timed out
-----------------------------

--- AFTER ADDING SG RULE ---
sh-5.1$ curl https://172.30.73.28:6388 -kv
*   Trying 172.30.73.28:6388...
* Connected to 172.30.73.28 (172.30.73.28) port 6388 (#0)
. . .
----------------------------

Because the metal3-state service is backed by pods that run in the host network context, I think this traffic is not allowed by default and has to be allowed manually. Is this expected?
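The curl check in the description separates two failure modes that matter here: a timeout (no reply at all, which is what a security group that silently drops packets produces, and what curl's exit code 28 and the "i/o timeout" in the log both indicate) and a refused connection (the node answers, but nothing listens on the port). A minimal Python sketch of the same check follows. The function name `probe` and its return labels are hypothetical, and it assumes Python 3 is available wherever it is run, which may not be true of the infrastructure-operator image.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a single TCP connect attempt to host:port.

    Hypothetical diagnostic helper, not part of OpenShift or metal3 tooling.
      'open'    - the TCP handshake completed
      'refused' - the peer replied with a reset (reachable, but nothing listening)
      'timeout' - no reply at all, typical of a firewall or security group dropping packets
      'error'   - any other socket error (DNS failure, unreachable network, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # Must be caught before OSError: socket.timeout is an OSError subclass.
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "error"
```

Run against `probe("metal3-state.openshift-machine-api.svc.cluster.local", 6388)`, this would be expected to return "timeout" before the security group rule was added and "open" afterwards, mirroring the two curl transcripts.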
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
Actual results:
When the metal3 pods are not running on the same node, the infrastructure-operator is unable to connect to the metal3-state service on port 6388/TCP.
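The placement condition can be checked from the JSON form of the pod list instead of reading the `-owide` columns by hand. A minimal sketch, assuming the input is the output of `oc get pods -n openshift-machine-api -o json` (a List whose items carry `metadata.name` and `spec.nodeName`). The helper names are hypothetical, and the substring match mirrors the `grep metal3` in the report, so any other pod whose name contains "metal3" is counted too.

```python
import json


def metal3_nodes(pod_list_json: str) -> dict:
    """Map every pod whose name contains 'metal3' to its node (None if not yet scheduled).

    Hypothetical helper; the filter mirrors `grep metal3` from the report.
    """
    items = json.loads(pod_list_json)["items"]
    return {
        pod["metadata"]["name"]: pod["spec"].get("nodeName")
        for pod in items
        if "metal3" in pod["metadata"]["name"]
    }


def metal3_colocated(pod_list_json: str) -> bool:
    """True when at least one metal3 pod exists and all of them are scheduled on one node."""
    nodes = set(metal3_nodes(pod_list_json).values())
    return len(nodes) == 1 and None not in nodes
```

In the customer's output both pods report `control01.itup-002.example.com`, so this check would return True there, which is the only placement in which provisioning worked.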
Expected results:
The infrastructure-operator should be able to connect to the metal3 pods regardless of which nodes they are scheduled on.
Additional info: