OCP Technical Release Team / TRT-1117

Kubelet failing to update node status for short period of time


    • Type: Story
    • Priority: Normal
    • Resolution: Unresolved

      It has been observed that master-0 was unready for 4s during the conformance test in this job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade/1674247083348987904

       

      Slack thread for the context: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1688049633796099

       

      The initial aggregated failure is about "clusteroperator/control-plane-machine-set should not change condition/Available": https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.14-micro-release-openshift-release-analysis-aggregator/1674247086704431104

       

      But the clusteroperator was changing state because of a master-0 state change. The master node's conditions changed at that time as follows:

       

         conditions:
         - lastHeartbeatTime: "2023-06-29T05:11:28Z"
      -    lastTransitionTime: "2023-06-29T04:35:36Z"
      -    message: kubelet has sufficient memory available
      -    reason: KubeletHasSufficientMemory
      -    status: "False"
      +    lastTransitionTime: "2023-06-29T05:16:49Z"
      +    message: Kubelet stopped posting node status.
      +    reason: NodeStatusUnknown
      +    status: Unknown
           type: MemoryPressure
         - lastHeartbeatTime: "2023-06-29T05:11:28Z"
      -    lastTransitionTime: "2023-06-29T04:35:36Z"
      -    message: kubelet has no disk pressure
      -    reason: KubeletHasNoDiskPressure
      -    status: "False"
      +    lastTransitionTime: "2023-06-29T05:16:49Z"
      +    message: Kubelet stopped posting node status.
      +    reason: NodeStatusUnknown
      +    status: Unknown
           type: DiskPressure
         - lastHeartbeatTime: "2023-06-29T05:11:28Z"
      -    lastTransitionTime: "2023-06-29T04:35:36Z"
      -    message: kubelet has sufficient PID available
      -    reason: KubeletHasSufficientPID
      -    status: "False"
      +    lastTransitionTime: "2023-06-29T05:16:49Z"
      +    message: Kubelet stopped posting node status.
      +    reason: NodeStatusUnknown
      +    status: Unknown
           type: PIDPressure
         - lastHeartbeatTime: "2023-06-29T05:11:28Z"
      -    lastTransitionTime: "2023-06-29T04:35:36Z"
      -    message: kubelet is posting ready status
      -    reason: KubeletReady
      -    status: "True"
      +    lastTransitionTime: "2023-06-29T05:16:49Z"
      +    message: Kubelet stopped posting node status.
      +    reason: NodeStatusUnknown
      +    status: Unknown
           type: Ready
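
      The transition above (reason: NodeStatusUnknown, status: Unknown on every condition) is what the node-lifecycle controller in kube-controller-manager writes when the kubelet's last heartbeat is older than the node monitor grace period (40s by default). A minimal sketch of that decision, assuming the default grace period and using the timestamps from the diff above (this is illustrative, not the actual controller code):

      ```go
      package main

      import (
      	"fmt"
      	"time"
      )

      // nodeMonitorGracePeriod mirrors the kube-controller-manager default
      // for --node-monitor-grace-period (an assumption for this sketch).
      const nodeMonitorGracePeriod = 40 * time.Second

      // markUnknown reports whether a condition whose last heartbeat was at
      // `heartbeat` should be rewritten to status Unknown as of time `now`.
      func markUnknown(heartbeat, now time.Time) bool {
      	return now.Sub(heartbeat) > nodeMonitorGracePeriod
      }

      func main() {
      	// Timestamps taken from the condition diff in this issue.
      	heartbeat, _ := time.Parse(time.RFC3339, "2023-06-29T05:11:28Z")
      	transition, _ := time.Parse(time.RFC3339, "2023-06-29T05:16:49Z")
      	// The gap is about 5m21s, well past the 40s grace period,
      	// so all conditions flip to Unknown.
      	fmt.Println(markUnknown(heartbeat, transition)) // true
      }
      ```

      The ~5 minute gap between lastHeartbeatTime (05:11:28) and lastTransitionTime (05:16:49) is consistent with the kubelet failing to post status for longer than the grace period.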

       

       

      Kubelet is complaining about api access around that time:

      Jun 29 05:16:24.250797 ci-op-cyqgzj4w-ed5cd-ll5md-master-0 kubenswrapper[2336]: E0629 05:16:24.250754 2336 kubelet_node_status.go:567] "Error updating node status, will retry" err="error getting node \"ci-op-cyqgzj4w-ed5cd-ll5md-master-0\": Get \"https://api-int.ci-op-cyqgzj4w-ed5cd.ci2.azure.devcluster.openshift.com:6443/api/v1/nodes/ci-op-cyqgzj4w-ed5cd-ll5md-master-0?resourceVersion=0&timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"

            Assignee: Unassigned
            Reporter: Ken Zhang (kenzhang@redhat.com)