- Bug
- Resolution: Not a Bug
- Normal
- None
- Logging 5.3.z
- False
- None
- False
- NEW
- NEW
- Logging (LogExp) - Sprint 216
Running OpenShift Container Platform 4 with Cluster Logging 5.3.5-20 and the configuration below:
$ oc get clusterlogging instance -n openshift-logging -o json
{
  "apiVersion": "logging.openshift.io/v1",
  "kind": "ClusterLogging",
  "metadata": {
    "creationTimestamp": "2022-03-11T12:23:26Z",
    "generation": 4,
    "name": "instance",
    "namespace": "openshift-logging",
    "resourceVersion": "46593023",
    "uid": "c8360f12-e8b8-4903-8497-fb8613e4d4ac"
  },
  "spec": {
    "collection": { "logs": { "fluentd": {}, "type": "fluentd" } },
    "logStore": {
      "elasticsearch": {
        "nodeCount": 6,
        "redundancyPolicy": "MultipleRedundancy",
        "resources": {
          "limits": { "cpu": 6, "memory": "28Gi" },
          "requests": { "cpu": 6, "memory": "16Gi" }
        },
        "storage": { "size": "400G", "storageClassName": "gp2" }
      },
      "retentionPolicy": {
        "application": { "maxAge": "30d" },
        "audit": { "maxAge": "2h" },
        "infra": { "maxAge": "10d" }
      },
      "type": "elasticsearch"
    },
    "managementState": "Managed",
    "visualization": {
      "kibana": {
        "proxy": {
          "resources": {
            "limits": { "cpu": 1, "memory": "512Mi" },
            "requests": { "cpu": "100m", "memory": "512Mi" }
          }
        },
        "replicas": 1,
        "resources": {
          "limits": { "cpu": 1, "memory": "2Gi" },
          "requests": { "cpu": "500m", "memory": "2Gi" }
        }
      },
      "type": "kibana"
    }
  },
  "status": {
    "clusterConditions": [
      { "lastTransitionTime": "2022-03-11T12:23:44Z", "status": "False", "type": "CollectorDeadEnd" },
      { "lastTransitionTime": "2022-03-11T12:23:32Z", "message": "curator is deprecated in favor of defining retention policy", "reason": "ResourceDeprecated", "status": "True", "type": "CuratorRemoved" }
    ],
    "collection": {
      "logs": {
        "fluentdStatus": {
          "daemonSet": "collector",
          "nodes": {
            "collector-5ddxs": "X.eu-west-3.compute.internal",
            "collector-7t6ht": "X.eu-west-3.compute.internal",
            "collector-b2jp8": "X.eu-west-3.compute.internal",
            "collector-bk6rw": "X.eu-west-3.compute.internal",
            "collector-cmqwc": "X.eu-west-3.compute.internal",
            "collector-hj9cz": "X.eu-west-3.compute.internal",
            "collector-jgpzz": "X.eu-west-3.compute.internal",
            "collector-m2gsz": "X.eu-west-3.compute.internal",
            "collector-rmntl": "X.eu-west-3.compute.internal",
            "collector-tvcrs": "X.eu-west-3.compute.internal",
            "collector-zqb6n": "X.eu-west-3.compute.internal"
          },
          "pods": {
            "failed": [],
            "notReady": [],
            "ready": [ "collector-5ddxs", "collector-7t6ht", "collector-b2jp8", "collector-bk6rw", "collector-cmqwc", "collector-hj9cz", "collector-jgpzz", "collector-m2gsz", "collector-rmntl", "collector-tvcrs", "collector-zqb6n" ]
          }
        }
      }
    },
    "curation": {},
    "logStore": {
      "elasticsearchStatus": [
        {
          "cluster": {
            "activePrimaryShards": 157,
            "activeShards": 467,
            "initializingShards": 0,
            "numDataNodes": 6,
            "numNodes": 6,
            "pendingTasks": 0,
            "relocatingShards": 0,
            "status": "green",
            "unassignedShards": 0
          },
          "clusterName": "elasticsearch",
          "nodeConditions": {
            "elasticsearch-cd-b742028q-1": [],
            "elasticsearch-cd-b742028q-2": [],
            "elasticsearch-cd-b742028q-3": [],
            "elasticsearch-cdm-dr5igezq-1": [],
            "elasticsearch-cdm-dr5igezq-2": [],
            "elasticsearch-cdm-dr5igezq-3": []
          },
          "nodeCount": 6,
          "pods": {
            "client": {
              "failed": [],
              "notReady": [],
              "ready": [ "elasticsearch-cd-b742028q-1-788cf68686-vn2ss", "elasticsearch-cd-b742028q-2-6f94877bf-vmmkw", "elasticsearch-cd-b742028q-3-79c6bb444d-mv92n", "elasticsearch-cdm-dr5igezq-1-759d4b84b7-qktqw", "elasticsearch-cdm-dr5igezq-2-6b8cfbf6fd-x4kv9", "elasticsearch-cdm-dr5igezq-3-68576d95df-d6hjq" ]
            },
            "data": {
              "failed": [],
              "notReady": [],
              "ready": [ "elasticsearch-cd-b742028q-1-788cf68686-vn2ss", "elasticsearch-cd-b742028q-2-6f94877bf-vmmkw", "elasticsearch-cd-b742028q-3-79c6bb444d-mv92n", "elasticsearch-cdm-dr5igezq-1-759d4b84b7-qktqw", "elasticsearch-cdm-dr5igezq-2-6b8cfbf6fd-x4kv9", "elasticsearch-cdm-dr5igezq-3-68576d95df-d6hjq" ]
            },
            "master": {
              "failed": [],
              "notReady": [],
              "ready": [ "elasticsearch-cdm-dr5igezq-1-759d4b84b7-qktqw", "elasticsearch-cdm-dr5igezq-2-6b8cfbf6fd-x4kv9", "elasticsearch-cdm-dr5igezq-3-68576d95df-d6hjq" ]
            }
          },
          "shardAllocationEnabled": "all"
        }
      ]
    },
    "visualization": {
      "kibanaStatus": [
        {
          "deployment": "kibana",
          "pods": { "failed": [], "notReady": [], "ready": [ "kibana-6558856cc5-z7fqc" ] },
          "replicaSets": [ "kibana-6558856cc5" ],
          "replicas": 1
        }
      ]
    }
  }
}
The redundancyPolicy is set to MultipleRedundancy to provide fault tolerance, so that some Elasticsearch members can be lost before the stack becomes unusable (as replicas of each shard should remain available).
When checking the indices, the following is found:
> $ es_util --query="_cat/indices?pretty&v" | grep app
> health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
> green open app-000005 H2aTFI-dS3Km4K0vTbEyPg 3 2 278885 0 719.9mb 239.9mb
> green open app-000004 STcVgd8yRd2zjTgK2JPB7g 3 2 0 0 2.2kb 783b
> green open app-000011 A84NnF9hRNKoAwnOZkLxOA 3 2 273220 0 703.2mb 234.4mb
> green open app-000013 4JMO3BByRHOO0CYoLsQ2HQ 5 2 133185 0 354.4mb 114.6mb
> green open app-000008 5cqqe2IoTiC-Rc6YTq6OJA 3 2 276628 0 712.2mb 237.4mb
> green open app-000012 lm5WNEoQRU2EC12vgfwDdg 3 2 104592 0 223mb 74.3mb
> green open app-000002 3ek9gM1DQHSikY_HzX8uHQ 3 2 0 0 2.2kb 783b
> green open app-000003 cy6uu_DcRYOxeBYS8-vSFw 3 2 135 0 1.1mb 395.1kb
> green open app-000001 Zc3cEoArQO-dFUzhO-XbLA 3 2 11 0 178.7kb 59.5kb
> green open app-000006 LwoMQw-TSvGuqxfxTZVtZg 3 2 274707 0 705.8mb 235.2mb
> green open app-000010 4vikLZL6RRuXxS3UHKszvQ 3 2 280188 0 722.4mb 240.8mb
> green open app-000009 -L1ASb2wTNqWwxK61jkhrw 3 2 277961 0 716.7mb 238.9mb
> green open app-000007 Jbk-N4KEQqG1Nj5mMsFSrw 3 2 279945 0 721.8mb 240.6mb
> $ es_util --query="_cat/indices?pretty&v" | grep infra
> health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
> green open infra-000288 by-7rVvvR_y7Logd_ywJ8g 3 2 1414527 0 2.7gb 932.5mb
> green open infra-000287 ZvNwwbPERu-tNQewMlLS8A 3 2 1397456 0 2.6gb 919.4mb
> green open infra-000270 OzAGsKmdR4WlcMklUzT0WQ 3 2 173599 0 343.9mb 114.6mb
> green open infra-000280 5l8RPnKJTfOhIsh_PUH2dw 3 2 1397862 0 2.6gb 908.2mb
> green open infra-000273 qMFkjXdpRySoOwk6Nv0Rgg 3 2 1315299 0 2.5gb 871.8mb
> green open infra-000276 pjwUY_3STdCqLZpEQLM7Jg 3 2 1453535 0 2.7gb 945.1mb
> green open infra-000283 Ojsj4YIjQAuF0qH5DemHOw 3 2 1391758 0 2.6gb 904.2mb
> green open infra-000286 7e2FvkAnRvKbkYqpuEiwWw 3 2 1624675 0 3.1gb 1gb
> green open infra-000271 uDPTy3SqR7KmQZLdRbuUNg 3 2 1292893 0 2.4gb 842.8mb
> green open infra-000277 ULK-eScaTF-JxqewbDJRYQ 3 2 1426813 0 2.7gb 930.5mb
> green open infra-000284 ANdfpAEGRyylBeWzuStmgw 3 2 1407210 0 2.6gb 917mb
> green open infra-000282 5XZPxbn1Ts-eowaG07MyyQ 3 2 1357182 0 2.5gb 883mb
> green open infra-000275 QzMlapa8RmyKC6mLDTwZgQ 3 2 1452245 0 2.7gb 943.5mb
> green open infra-000279 ej7smbMUQ3ubTtKxtdMlmw 3 2 1445210 0 2.7gb 943.2mb
> green open infra-000281 M7XmSXrYSBqDpBvZRwzSfw 3 2 1396606 0 2.6gb 909.2mb
> green open infra-000274 MhGKe1dlRe6u1fOdyXsGnw 3 2 1787044 0 3.1gb 1gb
> green open infra-000272 bUfhtkgCSCCV8fH89VRutg 3 2 1773337 0 3.2gb 1gb
> green open infra-000285 B-pqcuzaTIySAOIkTCXhDA 3 2 1391102 0 2.6gb 906.5mb
> green open infra-000278 Wu6mJxx8Q4eDFtMFWYaKlw 3 2 1442864 0 2.7gb 940.7mb
> green open infra-000289 68tTlu1gQ0mRuXPP0dS-BQ 5 2 2257040 0 4.4gb 1.4gb
> green open infra-000290 u9eLPuxISEmR4AJDO6lmzA 5 2 581765 0 1.1gb 408.3mb
As we can see, the app and infra indices have the expected number of primary shards and the corresponding number of replicas (rep 2), so an outage of an Elasticsearch node can be survived.
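To double-check that those copies are actually spread across different nodes (so that a node outage still leaves at least one copy of every shard), the allocation of a single index could be inspected with the standard _cat/shards endpoint, for example (not run here):
> $ es_util --query="_cat/shards/app-000013?v"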
But when checking the Kibana and security indices, we can see that the redundancyPolicy is not applied there, meaning the configured fault tolerance is lost for these indices.
> $ es_util --query="_cat/indices?pretty&v" | grep -v audit | grep -v app | grep -v infra
> health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
> green open .kibana_111578566_user1_1 VB-xTC0qSeCR6T6DSSHW6g 1 1 1 0 7.4kb 3.7kb
> green open .kibana_1 Cm9Hr9p1T-2G-S4Nvv54Eg 1 1 0 0 522b 261b
> green open .security YtJCu7FNSB-_e8AYABnrWA 1 1 6 2 61.9kb 30.9kb
> green open .kibana_111578567_user2_1 VU8UDs3gTrS--S2l-PKx9A 1 1 1 0 7.4kb 3.7kb
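For completeness, the same can be confirmed from the index settings themselves, e.g. via the standard _settings endpoint (not run here):
> $ es_util --query=".kibana_1/_settings?pretty"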
It is not clear whether this is expected behavior, and if it is intended, why, since it undermines the intended fault-tolerance setup.
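In case it helps triage: a possible interim workaround (not tested here, and the elasticsearch-operator might reconcile the setting back) would be to raise the replica count on the affected indices with a plain Elasticsearch _settings update. The example below assumes es_util forwards the extra curl options; otherwise the same request can be sent with curl from inside an Elasticsearch pod:
> $ es_util --query=".kibana*,.security/_settings" -X PUT -H 'Content-Type: application/json' -d '{"index":{"number_of_replicas":2}}'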