Description of problem:
Trying to deploy an updated ipsec OS extension (https://github.com/openshift/os/pull/1718) through a machine config on a Prow cluster.

Machine config to be deployed:

    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 80-ipsec-worker-extensions
    spec:
      config:
        ignition:
          version: 3.2.0
        systemd:
          units:
          - name: ipsecenabler.service
            enabled: true
            contents: |
              [Unit]
              Description=Enable ipsec service after os extension installation
              Before=kubelet.service
              [Service]
              Type=oneshot
              ExecStartPre=rm -f /etc/ipsec.d/cno.conf
              ExecStart=systemctl enable --now ipsec.service
              [Install]
              WantedBy=multi-user.target
      extensions:
      - ipsec

The machine config pool goes into a degraded state with the following error: rpm-ostree cannot resolve the host in the baseurl of the 'rhel-9.6-baseos' repo.

    conditions:
    - lastTransitionTime: "2025-03-05T10:59:18Z"
      message: ""
      reason: ""
      status: "False"
      type: RenderDegraded
    - lastTransitionTime: "2025-03-05T11:28:03Z"
      message: ""
      reason: ""
      status: "False"
      type: Updated
    - lastTransitionTime: "2025-03-05T11:28:03Z"
      message: All nodes are updating to MachineConfig rendered-worker-b7f59013a3075262a236d4c058174c7a
      reason: ""
      status: "True"
      type: Updating
    - lastTransitionTime: "2025-03-05T11:31:40Z"
      message: 'Node ip-10-0-39-84.ec2.internal is reporting: "error running rpm-ostree update --install NetworkManager-libreswan --install libreswan --install openvswitch3.5-ipsec: error: Updating rpm-md repo ''rhel-9.6-baseos'': cannot update repo ''rhel-9.6-baseos'': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried; Last error: Curl error (6): Couldn''t resolve host name for http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-baseos/repodata/repomd.xml [Could not resolve host: base-4-19-rhel96.ocp.svc.cluster.local]\n: exit status 1"'
      reason: 1 nodes are reporting degraded status on sync
      status: "True"
      type: NodeDegraded
    - lastTransitionTime: "2025-03-05T11:31:40Z"
      message: 'Node ip-10-0-39-84.ec2.internal is reporting: "error running rpm-ostree update --install NetworkManager-libreswan --install libreswan --install openvswitch3.5-ipsec: error: Updating rpm-md repo ''rhel-9.6-baseos'': cannot update repo ''rhel-9.6-baseos'': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried; Last error: Curl error (6): Couldn''t resolve host name for http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-baseos/repodata/repomd.xml [Could not resolve host: base-4-19-rhel96.ocp.svc.cluster.local]\n: exit status 1"'
      reason: ""
      status: "True"
      type: Degraded

Machine config daemon container logs:

    I0305 11:30:52.634215 2580 image_manager_helper.go:92] Running captured: podman create --net=none --annotation=org.openshift.machineconfigoperator.pivot=true --name ostree-container-pivot-a8165000-2e81-4b29-a5f1-ff46b51e9e7a registry.build06.ci.openshift.org/ci-ln-s0lbbib/stable@sha256:5fc84b3d068f6bd8a219f2da667a83de8d544895e1f01173e17aee2c871493fa
    I0305 11:30:52.687303 2580 run.go:19] Running: nice -- ionice -c 3 podman cp e94be9d866e522cf0e83aa8db091297a65674935c6e071a3c26087790e13b58a:/ /run/mco-extensions/os-extensions-content-1869168287
    I0305 11:31:01.355684 2580 update.go:2737] Running: chcon -R -t var_run_t /run/mco-extensions/os-extensions-content-1869168287
    I0305 11:31:01.618636 2580 update.go:2737] Running: rpm-ostree cleanup -p
    Deployments unchanged.
    I0305 11:31:01.771836 2580 update.go:1836] Applying extensions : ["update" "--install" "NetworkManager-libreswan" "--install" "libreswan" "--install" "openvswitch3.5-ipsec"]
    I0305 11:31:01.771857 2580 update.go:2737] Running: rpm-ostree update --install NetworkManager-libreswan --install libreswan --install openvswitch3.5-ipsec
    Pulling manifest: ostree-unverified-registry:registry.build06.ci.openshift.org/ci-ln-s0lbbib/stable@sha256:cbf7220ecb4531d2bcba90b8f0af549b37c95a3e61b0efb6e65bab8a827bd811
    Checking out tree 819387a...done
    Enabled rpm-md repositories: rhel-9.6-baseos rhel-9.6-appstream rhel-9.6-fast-datapath rhel-9.6-nfv rhel-9.6-highavailability rhel-9.6-server-ose-4.19 rhel-9.6-early-kernel rhel-9.4-baseos rhel-9.4-appstream rhel-9.4-fast-datapath rhel-9.4-nfv rhel-9.4-server-ose-4.19 coreos-extensions
    Updating metadata for 'rhel-9.6-baseos'...done
    I0305 11:31:05.750825 2580 update.go:2909] Rolling back applied changes to OS due to error: error running rpm-ostree update --install NetworkManager-libreswan --install libreswan --install openvswitch3.5-ipsec: error: Updating rpm-md repo 'rhel-9.6-baseos': cannot update repo 'rhel-9.6-baseos': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried; Last error: Curl error (6): Couldn't resolve host name for http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-baseos/repodata/repomd.xml [Could not resolve host: base-4-19-rhel96.ocp.svc.cluster.local]
    : exit status 1
    I0305 11:31:05.750846 2580 update.go:2737] Running: rpm-ostree cleanup -p
    Deployments unchanged.
    ...
    I0305 11:31:12.374944 2580 update.go:2272] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to preset unit: Unit file zincati.service does not exist.
    )
    I0305 11:31:12.374964 2580 file_writers.go:294] Writing systemd unit "kubelet-cleanup.service"
    I0305 11:31:12.382898 2580 file_writers.go:307] Disabling systemd unit kubelet-cleanup.service before re-writing it
    I0305 11:31:13.100774 2580 update.go:2213] Enabled systemd units: [NetworkManager-clean-initrd-state.service aws-kubelet-nodename.service aws-kubelet-providerid.service firstboot-osupdate.target kubelet-auto-node-size.service kubelet.service machine-config-daemon-firstboot.service machine-config-daemon-pull.service nmstate-configuration.service node-valid-hostname.service openvswitch.service ovs-configuration.service ovsdb-server.service wait-for-primary-ip.service kubelet-cleanup.service]
    I0305 11:31:13.450782 2580 update.go:2224] Disabled systemd units [kubens.service nodeip-configuration.service]
    I0305 11:31:13.450814 2580 update.go:1980] Deleting stale data
    I0305 11:31:13.818779 2580 update.go:2235] Preset systemd unit "ipsecenabler.service"
    I0305 11:31:13.818934 2580 update.go:2143] Removed stale systemd unit "/etc/systemd/system/ipsecenabler.service"
    I0305 11:31:13.819001 2580 update.go:2813] Removing SIGTERM protection
    E0305 11:31:13.819068 2580 writer.go:226] Marking Degraded due to: "error running rpm-ostree update --install NetworkManager-libreswan --install libreswan --install openvswitch3.5-ipsec: error: Updating rpm-md repo 'rhel-9.6-baseos': cannot update repo 'rhel-9.6-baseos': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried; Last error: Curl error (6): Couldn't resolve host name for http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-baseos/repodata/repomd.xml [Could not resolve host: base-4-19-rhel96.ocp.svc.cluster.local]\n: exit status 1"

The worker node's ocp.repo yum repo file contains the following repo entries:

    sh-5.1# cat /etc/yum.repos.d/*
    [rhel-9.6-baseos]
    id = rhel-9.6-baseos
    name = rhel-9.6-baseos
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-baseos
    enabled = 1
    gpgcheck = 0

    [rhel-9.6-appstream]
    id = rhel-9.6-appstream
    name = rhel-9.6-appstream
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-appstream
    enabled = 1
    gpgcheck = 0

    [rhel-9.6-fast-datapath]
    id = rhel-9.6-fast-datapath
    name = rhel-9.6-fast-datapath
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-fast-datapath
    enabled = 1
    gpgcheck = 0

    [rhel-9.6-nfv]
    id = rhel-9.6-nfv
    name = rhel-9.6-nfv
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-nfv
    enabled = 1
    gpgcheck = 0

    [rhel-9.6-highavailability]
    id = rhel-9.6-highavailability
    name = rhel-9.6-highavailability
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-highavailability
    enabled = 1
    gpgcheck = 0

    [rhel-9.6-server-ose-4.19]
    id = rhel-9.6-server-ose-4.19
    name = rhel-9.6-server-ose-4.19
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-server-ose-4.19
    enabled = 1
    gpgcheck = 0

    [rhel-9.6-early-kernel]
    id = rhel-9.6-early-kernel
    name = rhel-9.6-early-kernel
    baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-early-kernel
    enabled = 1
    gpgcheck = 0

    [rhel-9.4-baseos]
    id = rhel-9.4-baseos
    name = rhel-9.4-baseos
    baseurl = http://base-4-19-rhel94.ocp.svc.cluster.local/rhel-9.4-baseos
    enabled = 1
    gpgcheck = 0

    [rhel-9.4-appstream]
    id = rhel-9.4-appstream
    name = rhel-9.4-appstream
    baseurl = http://base-4-19-rhel94.ocp.svc.cluster.local/rhel-9.4-appstream
    enabled = 1
    gpgcheck = 0

    [rhel-9.4-fast-datapath]
    id = rhel-9.4-fast-datapath
    name = rhel-9.4-fast-datapath
    baseurl = http://base-4-19-rhel94.ocp.svc.cluster.local/rhel-9.4-fast-datapath
    enabled = 1
    gpgcheck = 0

    [rhel-9.4-nfv]
    id = rhel-9.4-nfv
    name = rhel-9.4-nfv
    baseurl = http://base-4-19-rhel94.ocp.svc.cluster.local/rhel-9.4-nfv
    enabled = 1
    gpgcheck = 0

    [rhel-9.4-server-ose-4.19]
    id = rhel-9.4-server-ose-4.19
    name = rhel-9.4-server-ose-4.19
    baseurl = http://base-4-19-rhel94.ocp.svc.cluster.local/rhel-9.4-server-ose-4.19
    enabled = 1
    gpgcheck = 0
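
The failure above is a name-resolution error for a host named in a repo baseurl, so one quick way to narrow it down is to list every baseurl host in the repo file and check which of them the node can resolve. The following is only a minimal sketch of that check: the helper names repo_hosts and unresolvable are hypothetical and not part of the machine config daemon or rpm-ostree, and SAMPLE holds just two entries copied from the repo file above.

```python
import configparser
import socket
from urllib.parse import urlparse

# Two entries copied from the repo file in this report (illustration only).
SAMPLE = """\
[rhel-9.6-baseos]
id = rhel-9.6-baseos
name = rhel-9.6-baseos
baseurl = http://base-4-19-rhel96.ocp.svc.cluster.local/rhel-9.6-baseos
enabled = 1
gpgcheck = 0

[rhel-9.4-baseos]
id = rhel-9.4-baseos
name = rhel-9.4-baseos
baseurl = http://base-4-19-rhel94.ocp.svc.cluster.local/rhel-9.4-baseos
enabled = 1
gpgcheck = 0
"""


def repo_hosts(repo_text):
    """Return the sorted, de-duplicated baseurl hostnames of enabled repos.

    Hypothetical helper: yum repo files use INI syntax, so configparser
    is enough for the simple single-baseurl entries shown in this report.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    hosts = set()
    for section in parser.sections():
        # Skip repos that are explicitly disabled.
        if parser.get(section, "enabled", fallback="1").strip() != "1":
            continue
        baseurl = parser.get(section, "baseurl", fallback="").strip()
        host = urlparse(baseurl).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


def unresolvable(hosts, resolve=socket.getaddrinfo):
    """Return the hosts for which the resolver raises socket.gaierror.

    The resolver is injectable so the check can be exercised without
    network access; by default it uses the system resolver.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host, 80)
        except socket.gaierror:
            failed.append(host)
    return failed
```

Run on the affected node, for example as unresolvable(repo_hosts(open("/etc/yum.repos.d/ocp.repo").read())), this would be expected to list base-4-19-rhel96.ocp.svc.cluster.local if the node still cannot resolve it. The file name ocp.repo is taken from the description above; the exact path on a given node is an assumption.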
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
- links to: RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update