Type: Bug
Resolution: Done-Errata
Priority: Minor
Fix Version/s: None
Affects Version/s: 4.17
Component/s: Bare Metal Hardware Provisioning / ironic
Labels:
- triaged

Regression:
None
Blocked:
False
Blocked Reason:
None
Release Note Text:

* Previously, a check for unexpected IP addresses on the provisioning interface during metal3 pod startup was triggered. This issue occurred because of the presence of an IP address supplied by DHCP from a previous version of the pod that existed on another node. With this release, the pod startup check looks only for IP addresses that exist outside the provisioning network subnet, so that a metal3 pod starts immediately, even if the pod has moved to a different node. (link:https://issues.redhat.com/browse/OCPBUGS-38507[*OCPBUGS-38507*])
Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.18.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

While observing a CI test in which the metal3 Pod is deleted and allowed to be recreated on another host, I saw that it took 5 attempts to start the new pod, because static-ip-manager was crashlooping with the following log:

+ '[' -z 172.22.0.3/24 ']'
+ '[' -z enp1s0 ']'
+ '[' -n enp1s0 ']'
++ ip -o addr show dev enp1s0 scope global
+ [[ -n 2: enp1s0    inet 172.22.0.134/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0\       valid_lft 3sec preferred_lft 3sec ]]
+ ip -o addr show dev enp1s0 scope global
+ grep -q 172.22.0.3/24
ERROR: "enp1s0" is already set to ip address belong to different subset than "172.22.0.3/24"
+ echo 'ERROR: "enp1s0" is already set to ip address belong to different subset than "172.22.0.3/24"'
+ exit 1

The error message is misleading about what is actually checked (apart from the whole subnet/subset typo). It doesn't appear this should ever work for IPv4, since we don't ever expect the Provisioning VIP to appear on the interface before we've set it. (With IPv6 this should often work thanks to an appalling and unsafe hack. Not to suggest that grepping for an IPv4 address complete with .'s in it is safe either.)
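To illustrate why an unanchored grep for an address is fragile, here is a minimal sketch. The sample line, the addresses, and the function names loose_match and exact_match are all hypothetical, and the stricter variant is only one possible alternative, not the change made in the linked pull request.

```shell
#!/bin/sh
# Hypothetical line in the style of `ip -o addr show`; the address is made up.
sample='2: enp1s0    inet 110.0.0.1/24 brd 110.0.0.255 scope global enp1s0'

# Unanchored regex match, in the style of the logged check. It also succeeds
# when the wanted address is merely a substring of a configured one, and every
# "." in the pattern matches any character.
loose_match() {
    printf '%s\n' "$2" | grep -q -- "$1"
}

# Stricter alternative (an assumption for illustration): compare the field
# that follows "inet" or "inet6" for exact string equality.
exact_match() {
    printf '%s\n' "$2" | awk -v want="$1" '
        { for (i = 1; i < NF; i++)
              if (($i == "inet" || $i == "inet6") && $(i + 1) == want) found = 1 }
        END { exit !found }'
}
```

With this sample, loose_match '10.0.0.1/24' "$sample" succeeds even though 10.0.0.1/24 is not configured on the interface, while exact_match '10.0.0.1/24' "$sample" fails as it should.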

Eventually the pod does start up, with this in the log:

+ '[' -z 172.22.0.3/24 ']'
+ '[' -z enp1s0 ']'
+ '[' -n enp1s0 ']'
++ ip -o addr show dev enp1s0 scope global
+ [[ -n '' ]]
+ /usr/sbin/ip address flush dev enp1s0 scope global
+ /usr/sbin/ip addr add 172.22.0.3/24 dev enp1s0 valid_lft 300 preferred_lft 300

So essentially this only worked because there are no IP addresses on the provisioning interface.

In the original (error) log the machine's IP 172.22.0.134/24 has a valid lifetime of 3s, so that likely explains why it later disappears. The provisioning network is managed, so the IP address comes from dnsmasq in the former incarnation of the metal3 pod. We effectively prevent the new pod from starting until the DHCP addresses have timed out, even though we will later flush them to ensure no stale ones are left behind.
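A check that tolerates leftover DHCP leases would only object to addresses outside the provisioning subnet, as the release note describes. The following is a rough sketch of that idea for IPv4 only. The function names ip_to_int, in_subnet, and foreign_addrs are hypothetical, and this is not the code from the linked pull request.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
# Assumes octets without leading zeros, as printed by `ip`.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet ADDR/PREFIX NET/PREFIX
# Succeeds if ADDR lies inside the subnet described by NET/PREFIX.
in_subnet() {
    addr=${1%/*}
    net=${2%/*}
    prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$addr") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Read `ip -o addr show` style lines on stdin and print every IPv4 address
# that lies outside the subnet given as the first argument.
foreign_addrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1) }' |
    while read -r cidr; do
        in_subnet "$cidr" "$1" || echo "$cidr"
    done
}
```

Under these assumptions, the stale lease 172.22.0.134/24 from the error log falls inside 172.22.0.3/24 and would not block startup, whereas an address such as 192.168.1.5/24 would still be reported.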

The check was originally added by https://github.com/openshift/ironic-static-ip-manager/pull/27 but that only describes what it does and not the reason. There's no linked ticket to indicate what the purpose was.

blocks

OCPBUGS-49350 static IP manager crashloops for a while on pod startup

Closed

is cloned by

OCPBUGS-49350 static IP manager crashloops for a while on pod startup

Closed

is depended on by

OCPBUGS-48754 [OCP 4.16] static IP manager crashloops - backport of OCPBUGS-38507 to 4.16

Closed

is duplicated by

OCPBUGS-39314 Excessive Restarts on container/metal3-static-ip-set

Closed

links to

openshift/ironic-static-ip-manager#45: OCPBUGS-38507: Fix subnet validation

RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update


Assignee: Zane Bitter

Reporter: Zane Bitter

QA Contact: Steeve Goveas

Votes: 0

Watchers: 7

Created: 2024/08/15 1:21 AM

Updated: 2025/02/25 4:46 AM

Resolved: 2025/02/25 4:46 AM
