OpenShift Logging / LOG-7597

[release-6.2] Collector (Vector) restarts impact the KubeAPI, making it unavailable (GA)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Logging 6.2.z
    • Affects Version/s: Logging 5.8.z, Logging 5.9.z, Logging 6.0.z, Logging 6.1.z, Logging 6.2.z, Logging 6.3.z
    • Component/s: Log Collection
    • Incidents & Support
    • Release Note Text: This fix specifically enables caching of kube API server calls and introduces a ClusterLogForwarder annotation to allow tuning of the collector rollout strategy. These changes allow administrators managing clusters with large numbers of nodes to modify the collector upgrade behavior so that it does not overwhelm the Kubernetes API server with requests. This can be accomplished by reducing the number of MaxUnavailable collectors during upgrade. (A hedged example follows this field list.)
    • Bug Fix
    • Logging - Sprint 279
    • Critical
    • Customer Escalated
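
      A hedged sketch of how the rollout tuning described in the release note above could be exercised once the fix ships. The annotation key shown here is a placeholder (the real key is not quoted in this issue); the DaemonSet patch only illustrates the standard Kubernetes rollingUpdate.maxUnavailable setting that such an annotation would ultimately drive:

      # Placeholder annotation key -- consult the 6.2 release notes for the real name.
      $ oc annotate clusterlogforwarder <name> -n openshift-logging \
          <rollout-tuning-annotation>=1 --overwrite

      # Equivalent plain-Kubernetes knob on the collector DaemonSet, for reference only;
      # the operator normally reconciles direct edits away.
      $ oc patch daemonset <collector-daemonset> -n openshift-logging --type merge \
          -p '{"spec":{"updateStrategy":{"rollingUpdate":{"maxUnavailable":1}}}}'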

      Description of problem:

      When the collector pods (Vector) restart, the control plane is impacted and becomes unavailable, because the number of requests to the API server increases sharply while the restarted collectors re-query the Kubernetes API. Better visibility is available in the following dashboards:

      API requests: (screenshot)

      CPU and memory usage in the control plane: (screenshot)
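
      To quantify the spike outside the dashboards, a query along these lines shows the API server request rate broken down by verb and resource during a collector restart. This is only a sketch: apiserver_request_total is a standard kube-apiserver metric, the 5m window is an assumption, and the thanos-querier route in openshift-monitoring is the usual in-cluster query endpoint:

      $ TOKEN=$(oc whoami -t)
      $ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
      # Query the API server request rate by verb and resource.
      $ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
          --data-urlencode 'query=sum(rate(apiserver_request_total[5m])) by (verb, resource)'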

      Version-Release number of selected component (if applicable):

      How reproducible:

      Every time the collector pods are restarted, either manually or by the Logging Operator to apply a configuration change.

      Some information:

      $ oc get no --no-headers | wc -l
      40
      $ oc get po -A --no-headers | wc -l
      4162
      $ oc get ns --no-headers | wc -l
      473

      Number of "inputs" in the clusterLogForwarder: 38

      Steps to Reproduce:

      In the affected environments, every time the collector pods are restarted, the KubeAPI is impacted.

      Actual results:

      The KubeAPI returns timeouts

      Expected results:

      The KubeAPI and control plane keep working normally, and restarting the collector pods does not impact the control plane (KubeAPI).

      Data needed to collect:

      • Number of pods: "oc get pods -A | wc -l"
      • Number of namespaces: "oc get ns | wc -l"
      • Logging Operator version: "oc get csv | grep -i logging"
      • The ClusterLogForwarder CR
      • The dashboard available in "OpenShift Console > Observe > Dashboards > Dashboard: OpenShift Logging / Collection"
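
      A small helper to gather the items above in one pass (output file names and the openshift-logging namespace are assumptions):

      # Collect the requested data into the current directory.
      $ oc get pods -A --no-headers | wc -l > pod-count.txt
      $ oc get ns --no-headers | wc -l > ns-count.txt
      $ oc get csv | grep -i logging > logging-csv.txt
      $ oc get clusterlogforwarder -n openshift-logging -o yaml > clusterlogforwarder.yaml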

      Possible workaround to test until the RCA is found and resolved:

      Set the ClusterLogForwarder CR to Unmanaged. This prevents the operator from restarting all the collector pods when a change in the Logging configuration is applied, but it also means:

      • The Logging stack cannot be upgraded, because the Operator will no longer manage the resources
      • Configuration changes will not be applied, because the Logging Operator is not managing the resources
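
      A hedged sketch of applying (and later reverting) the workaround, assuming the ClusterLogForwarder CR exposes spec.managementState the way other Logging CRs do; the CR name and namespace are placeholders:

      # Stop the operator from reconciling (and restarting) the collectors.
      $ oc patch clusterlogforwarder <name> -n openshift-logging --type merge \
          -p '{"spec":{"managementState":"Unmanaged"}}'

      # Revert once the fix is available.
      $ oc patch clusterlogforwarder <name> -n openshift-logging --type merge \
          -p '{"spec":{"managementState":"Managed"}}'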

              Assignee: jcantril@redhat.com Jeffrey Cantrill
              Reporter: rhn-support-ocasalsa Oscar Casal Sanchez