OpenShift Bugs / OCPBUGS-35562

After redeploying an infra node in an Azure IPI test cluster via its MachineSet, the new node is stuck with the uninitialized taint


    • Critical
      * Previously, the {azure-first} node controller container did not tolerate the `NoExecute` taint on nodes. This caused a condition where a node would be uninitialized. With this release, the node controller deployment receives an update to tolerate the `NoExecute` taint, so that nodes can be properly initialized. (link:https://issues.redhat.com/browse/OCPBUGS-34556[*OCPBUGS-34556*])
    • Bug Fix
    • In Progress
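
      The release note above describes adding a toleration for the `NoExecute` taint to the node controller workload. A minimal sketch for checking whether such a toleration is present, assuming the Azure node controller runs as the azure-cloud-node-manager DaemonSet in the openshift-cloud-controller-manager namespace (workload name and namespace are assumptions and may differ by release):

          # Print the tolerations on the node controller pod template; after the fix,
          # an entry tolerating effect NoExecute (e.g. operator: Exists) is expected.
          oc -n openshift-cloud-controller-manager get daemonset azure-cloud-node-manager \
            -o jsonpath='{.spec.template.spec.tolerations}'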

      This is a clone of issue OCPBUGS-34556. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-33547. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-33405. The following is the description of the original issue:

      Description of problem:

      After upgrading the cluster from 4.12.26 to 4.14.16, the customer hit disk pressure on infranode3. To resolve it, the customer deleted the node, and the MachineSet automatically created a new machine. The new machine reached the Ready state and ran some important pods, but the customer cannot schedule more pods onto it because of the uninitialized taint.

      --> The taint was not removed unless it was removed manually.
      --> According to the customer, the labels shown below were not added to the new node:
      labels:
        - failure-domain.beta.kubernetes.io/zone: westeurope-3
        - node.kubernetes.io/instance-type: Standard_D16s_v3
        - failure-domain.beta.kubernetes.io/region: westeurope
        - beta.kubernetes.io/instance-type: Standard_D16s_v3
        - topology.kubernetes.io/region: westeurope
      --> After further analysis, the customer also found that new nodes do not get a public IP, which means the new virtual machine was not added to the gateway backend pool.
      --> No changes were made to the MachineSet after upgrading the cluster.
      --> The customer has a sufficient IP range for the cluster.
      --> The cluster is not configured with accelerated networking, so the known upgrade bug involving accelerated networking cannot be the cause.
      --> The newly added node is in the "Ready" state, but the taint is still present (see the inspection sketch after this list).
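
      A minimal sketch for inspecting the replacement node, assuming a hypothetical node name (infranode3-replacement); the taint key shown is the standard cloud-provider uninitialized taint and should be confirmed against the affected node:

          # Show the node's taints, conditions, and labels
          oc describe node infranode3-replacement

          # Print only the taints (look for node.cloudprovider.kubernetes.io/uninitialized)
          oc get node infranode3-replacement -o jsonpath='{.spec.taints}'

          # Compare labels against an older, correctly initialized node
          oc get node infranode3-replacement --show-labels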

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          NA

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          The cluster was upgraded to 4.14.16, and the replacement node remains stuck with the uninitialized taint: the cloud-provider labels are missing and the VM is not added to the gateway backend pool.

      Expected results:

          The replacement node is initialized automatically: the uninitialized taint is removed, the cloud-provider labels are applied, and the VM is added to the backend pool.

      Additional info:

       When the customer manually added the VM to the default backend pool using the `az` command and removed the taint manually, everything worked fine. A sketch of that workaround follows.
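
       A minimal sketch of that manual workaround, assuming hypothetical Azure resource names (resource group, NIC, load balancer, backend pool) and node name; the exact resources and taint key must be taken from the affected cluster:

           # Add the new VM's NIC to the load balancer's default backend pool
           az network nic ip-config address-pool add \
             --resource-group my-cluster-rg \
             --nic-name infranode3-replacement-nic \
             --ip-config-name pipConfig \
             --lb-name my-cluster-lb \
             --address-pool my-cluster-backend-pool

           # Remove the uninitialized taint so workloads can be scheduled
           oc adm taint nodes infranode3-replacement node.cloudprovider.kubernetes.io/uninitialized-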

            mimccune@redhat.com Michael McCune
            openshift-crt-jira-prow OpenShift Prow Bot
            Zhaohua Sun