Bug
Resolution: Unresolved
Major
4.21
Quality / Stability / Reliability
Proposed
CORENET Sprint 279
Description of problem:
In a HostedCluster configured with a custom OVN-Kubernetes networking config (custom internalJoinSubnet and internalTransitSwitchSubnet), the ovnkube-node and multus pods enter CrashLoopBackOff after the kubelets are restarted; ovnkube fails with "flag provided but not defined: -cluster-manager-v4-transit-switch-subnet". The HostedCluster is configured as follows:
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  annotations:
    hypershift.openshift.io/control-plane-operator-image: quay.io/jparrill/hypershift:OCPBUGS-59649-v67
spec:
  ....
  operatorConfiguration:
    clusterNetworkOperator:
      disableMultiNetwork: false
      ovnKubernetesConfig:
        ipv4:
          internalJoinSubnet: 100.99.0.0/16
          internalTransitSwitchSubnet: 100.100.0.0/16
I'm using a specific image build, quay.io/jparrill/hypershift:OCPBUGS-59649-v67, because this feature is still in development; make sure you use it when reproducing the issue.
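As a sanity check, the custom subnets above should be reflected in the hosted cluster's network operator config. A minimal sketch, assuming the networks.operator.openshift.io/cluster object in the hosted cluster exposes them under defaultNetwork.ovnKubernetesConfig.ipv4 (not verified in this report):
# hedged check: print the ipv4 OVN-Kubernetes settings from the hosted cluster
oc --kubeconfig ./hosted-kubeconfig get networks.operator.openshift.io cluster \
  -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.ipv4}'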
Version-Release number of selected component (if applicable):
- 4.21.0-0.ci-2025-10-31-105038-test-ci-op-ts6w8gjy-latest
How reproducible:
Steps to Reproduce:
1. Create a HostedCluster on AWS with the hypershift CLI, using the --render and --render-sensitive flags so the manifests are printed to stdout, and redirect the output (>) to a file (see the sketch after these steps).
2. Edit the manifest and add the configuration set above
3. Create the cluster
4. Once finished, access the hosted cluster via its kubeconfig, for example:
oc get secret -n clusters jparrill-hosted-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > /Users/jparrill/RedHat/RedHat_Engineering/hypershift/hosted_clusters/clusters-jparrill-hosted/kubeconfig
5. Create the additional pull secret to trigger the kubelet restart; you can use these files:
### create-additional-user-ps.sh
if [[ -z $1 ]]; then
  echo "give me a secret"
  exit 1
fi
kubectl create secret generic additional-pull-secret \
  --from-file=.dockerconfigjson=$1 \
  --type=kubernetes.io/dockerconfigjson \
  --namespace=kube-system

### dockerps-1
{
  "auths": {
    "docker.io": {
      "auth": "cGFkYWp1YW46ZGNrcl9wYXRfdnFWbTVxWGtRb2ZMbnJCZHFFYVlxSm9kQk1Z"
    }
  }
}

## Then execute
./create-additional-user-ps.sh dockerps-1
6. This should trigger the reconciliation of the globalPullSecret controller and, after that, a restart of the kubelets at the node level.
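For reference, a minimal sketch of steps 1 and 4; apart from --render and --render-sensitive, the flags, names, and paths below are illustrative and may differ depending on your hypershift CLI version and environment:
# Step 1 (sketch): render the HostedCluster manifests to a file instead of applying them
hypershift create cluster aws \
  --name jparrill-hosted \
  --namespace clusters \
  --pull-secret ./pull-secret.json \
  --aws-creds ./aws-creds \
  --region us-east-1 \
  --node-pool-replicas 2 \
  --render --render-sensitive > hostedcluster.yaml

# Steps 2-3: edit hostedcluster.yaml to add the operatorConfiguration block shown above, then apply it
oc apply -f hostedcluster.yaml

# Step 4 (sketch): extract the hosted cluster kubeconfig from the admin-kubeconfig secret
oc get secret -n clusters jparrill-hosted-admin-kubeconfig \
  -o jsonpath='{.data.kubeconfig}' | base64 -d > ./hosted-kubeconfig
export KUBECONFIG=./hosted-kubeconfig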
Actual results:
λ static oc get pod -n openshift-ovn-kubernetes
NAME                 READY   STATUS             RESTARTS        AGE
ovnkube-node-2kgtm   7/8     CrashLoopBackOff   14 (160m ago)   3h27m
ovnkube-node-dkjjf   7/8     CrashLoopBackOff   18 (161m ago)   3h27m

λ static oc get pod -n openshift-multus
NAME                                  READY   STATUS             RESTARTS        AGE
multus-additional-cni-plugins-qwsxk   1/1     Running            0               3h37m
multus-q6shp                          0/1     CrashLoopBackOff   12 (161m ago)   3h37m
network-metrics-daemon-tppdv          2/2     Running            0               3h37m
multus-additional-cni-plugins-blr75   1/1     Running            0               3h36m
multus-gfqt8                          0/1     CrashLoopBackOff   16 (162m ago)   3h36m
network-metrics-daemon-m7nx8          2/2     Running            0               3h36m
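The "Incorrect Usage" error quoted under Additional info appears to come from the crashing ovnkube-node pod. A sketch of how to pull the logs, assuming the failing container is ovnkube-controller (container name not confirmed in this report):
# previous logs of the crashing ovnkube-node container (container name assumed)
oc logs --previous -n openshift-ovn-kubernetes ovnkube-node-dkjjf -c ovnkube-controller
# previous logs of the crashing multus pod (single container)
oc logs --previous -n openshift-multus multus-gfqt8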
Expected results:
No crashloop pods
Additional info:
Slack thread: https://redhat-internal.slack.com/archives/CK1AE4ZCK/p1761899088565229
Affected Platforms:
It is an internal CI failure, found during the development of a feature. This is the PR: https://github.com/openshift/hypershift/pull/6745.
- In the PR you can check the test called "TestCreateClusterCustomConfig".
- This is the artifacts folder; check the hostedcluster.tar file: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/6745/pull-ci-openshift-hypershift-main-e2e-aws/1984206058326855680/artifacts/e2e-aws/hypershift-aws-run-e2e-external/artifacts/TestCreateClusterCustomConfig/
If it is a CI failure:
- Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
- Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
- When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
NAME                          STATUS   ROLES    AGE     VERSION
ip-10-0-3-104.ec2.internal    Ready    worker   3h37m   v1.34.1
ip-10-0-14-138.ec2.internal   Ready    worker   3h36m   v1.34.1

Error:
2025-10-31T12:07:29.491809789Z + exec /usr/bin/ovnkube --init-ovnkube-controller ip-10-0-14-138.ec2.internal --init-node ip-10-0-14-138.ec2.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --inactivity-probe=180000 --gateway-mode shared --gateway-interface br-ex --metrics-bind-address 127.0.0.1:29103 --ovn-metrics-bind-address 127.0.0.1:29105 --metrics-enable-pprof --metrics-enable-config-duration --export-ovs-metrics --disable-snat-multiple-gws --enable-multi-network --enable-network-segmentation --enable-preconfigured-udn-addresses --enable-admin-network-policy --enable-multicast --zone ip-10-0-14-138.ec2.internal --enable-interconnect --acl-logging-rate-limit 20 --disable-forwarding --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h --gateway-v4-join-subnet 100.99.0.0/16 --gateway-v4-masquerade-subnet 169.254.0.0/17 --gateway-v6-masquerade-subnet fd69::/112 --cluster-manager-v4-transit-switch-subnet 100.100.0.0/16 --enable-egress-ip=true --enable-egress-firewall=true --enable-egress-qos=true --enable-egress-service=true --enable-multi-external-gateway=true
2025-10-31T12:07:29.523946921Z Incorrect Usage: flag provided but not defined: -cluster-manager-v4-transit-switch-subnet