Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.12.z
Component/s: RHCOS
Labels:
- machine-config
- mco
- ostree
- upgrade

Regression: No
Release Blocker: Rejected
Blocked: False
Blocked Reason: None

Description of problem:

The infra MCP is degraded because one of the infra nodes is unable to upgrade. The node reports the following error:

2023-07-20T05:06:55.045058094Z I0720 05:06:55.045011    2786 update.go:2118] Disk currentConfig rendered-infra-c6d6928bfcd10ab1b440f6a2505bd5d1 overrides node's currentConfig annotation rendered-infra-76583762333a6685c3d4d1b75e14c28b
2023-07-20T05:06:55.048306566Z I0720 05:06:55.048269    2786 daemon.go:1564] Validating against pending config rendered-infra-c6d6928bfcd10ab1b440f6a2505bd5d1
2023-07-20T05:06:57.733681234Z E0720 05:06:57.733641    2786 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-infra-c6d6928bfcd10ab1b440f6a2505bd5d1: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ef4276442c5174d31f6b62a83aa40e64c719275dd731e5ccb0dc98911f7e57e", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fb065c8d91453ce4a3f5518189b34bce94406c01f43957abde01f08165b3a085" ("1ad911e70b7befaad4f3eac5ee14510bbaaecbedb9fb464ffbe3cb38e133576f")
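When triaging several nodes, it can help to pull the expected and actual osImageURL out of the "Marking Degraded" message programmatically. The following is a minimal sketch, assuming the message keeps the wording shown in the log above; the names `MISMATCH_RE`, `parse_os_image_mismatch`, and `LOG_LINE` are hypothetical helpers and not part of any OpenShift tooling.

```python
import re

# Assumption: the message format matches the log line quoted in this report.
MISMATCH_RE = re.compile(
    r'expected target osImageURL "(?P<expected>[^"]+)", '
    r'have "(?P<actual>[^"]+)"'
)


def parse_os_image_mismatch(line):
    """Return (expected, actual) osImageURL strings, or None if not found."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return match.group("expected"), match.group("actual")


# The error message copied from the report above (timestamp prefix omitted).
LOG_LINE = (
    'Marking Degraded due to: unexpected on-disk state validating against '
    'rendered-infra-c6d6928bfcd10ab1b440f6a2505bd5d1: expected target osImageURL '
    '"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:'
    '5ef4276442c5174d31f6b62a83aa40e64c719275dd731e5ccb0dc98911f7e57e", have '
    '"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:'
    'fb065c8d91453ce4a3f5518189b34bce94406c01f43957abde01f08165b3a085" '
    '("1ad911e70b7befaad4f3eac5ee14510bbaaecbedb9fb464ffbe3cb38e133576f")'
)

expected, actual = parse_os_image_mismatch(LOG_LINE)
# Keep only the digest part after "@" for a compact comparison.
print("expected:", expected.split("@", 1)[1])
print("actual:  ", actual.split("@", 1)[1])
```

The "have" digest extracted here is the same one that `rpm-ostree status` reports for the booted deployment, which is consistent with the new image never having been finalized.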

The ostree-finalize-staged.service logs below show the service timing out roughly 20 minutes after it starts copying the /etc changes:

journalctl_--no-pager_--unit_ostree-finalize-staged
Jul 19 15:22:23 SOINR01CAL0101.raiffeisen.org ostree[372060]: Copying /etc changes: 19 modified, 0 removed, 212 added
Jul 19 15:42:21 SOINR01CAL0101.raiffeisen.org systemd[1]: ostree-finalize-staged.service: Stopping timed out. Terminating.
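To confirm that the gap between the two journal entries lines up with the 20-minute stop timeout, the timestamps can be subtracted directly. This is a rough sketch: `journal_elapsed` is a hypothetical helper, the year is an assumption (syslog-style timestamps omit it), and month parsing assumes an English locale.

```python
from datetime import datetime


def journal_elapsed(start_line, end_line, year=2023):
    """Return the time between two syslog-style journal lines as a timedelta."""

    def timestamp(line):
        # The first three fields look like "Jul 19 15:22:23".
        stamp = " ".join(line.split()[:3])
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

    return timestamp(end_line) - timestamp(start_line)


START = ("Jul 19 15:22:23 SOINR01CAL0101.raiffeisen.org ostree[372060]: "
         "Copying /etc changes: 19 modified, 0 removed, 212 added")
END = ("Jul 19 15:42:21 SOINR01CAL0101.raiffeisen.org systemd[1]: "
       "ostree-finalize-staged.service: Stopping timed out. Terminating.")

elapsed = journal_elapsed(START, END)
print(elapsed)                  # 0:19:58
print(elapsed.total_seconds())  # 1198.0
```

The copy step alone ran for 19 minutes 58 seconds before systemd terminated the service, so almost the entire stop timeout was spent in the /etc merge.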

The ostree-finalize-staged.service stop timeout is already set to 20 minutes on the RHCOS node:

$ cat etc/systemd/system/ostree-finalize-staged.service.d/override.conf
[Service]
TimeoutStopSec=20m

$ cat rpm-ostree_status_-v
State: idle
Warning: failed to finalize previous deployment
         check `journalctl -b -1 -u ostree-finalize-staged.service`
AutomaticUpdates: disabled
Deployments:
● ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fb065c8d91453ce4a3f5518189b34bce94406c01f43957abde01f08165b3a085 (index: 0)
    Digest: sha256:fb065c8d91453ce4a3f5518189b34bce94406c01f43957abde01f08165b3a085
    Version: 412.86.202306271602-0 (2023-07-14T15:33:47Z)
    Commit: 1ad911e70b7befaad4f3eac5ee14510bbaaecbedb9fb464ffbe3cb38e133576f
    Staged: no
    StateRoot: rhcos

Additional info:

Every time a minor upgrade is triggered (for example, from 4.12.20 to 4.12.21, 4.12.21 to 4.12.22, or 4.12.23 to 4.12.24), only the infra nodes go into the degraded state.

A simple MCP update, such as a machine config change for NTP, does not put the node into a degraded state.

Assignee: Colin Walters

Reporter: Divyam Pateriya

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 5

Created: 2023/07/21 1:40 PM

Updated: 2023/08/31 9:49 PM

Resolved: 2023/08/31 9:49 PM
