OpenShift Logging / LOG-5533

LokiStack pods cannot be moved to infrastructure nodes


    • Type: Bug
    • Priority: Normal
    • Resolution: Not a Bug
    • Affects Version: Logging 5.9.0
    • Component: Log Storage

      Steps to Reproduce:

      1) Create a LokiStack instance (without adding a nodeSelector and tolerations).

      2) After checking that all pods are deployed properly, edit the LokiStack instance and add a nodeSelector and tolerations for a Loki component, in this example the ingester:

      oc edit lokistack logging-loki 
      
      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: logging-loki
        namespace: openshift-logging
      spec:
        size: 1x.demo
        storage:
          schemas:
          - version: v12
            effectiveDate: '2024-01-07'
          secret:
            name: logging-loki-s3
            type: s3
        storageClassName: gp3-csi
        template:
          ingester:
            nodeSelector:
              node-role.kubernetes.io/infra: ""
            tolerations:
            - effect: NoSchedule
              key: infra
              value: reserved
            - effect: NoExecute
              key: infra
              value: reserved 
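A component is only scheduled onto nodes whose labels satisfy its nodeSelector. A minimal Python sketch of that matching rule (the node names and hostname labels are illustrative; the selector is the one from the spec above):

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """A node satisfies a nodeSelector only if every selector key/value
    pair appears verbatim in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

# Illustrative node label sets
infra_node = {"node-role.kubernetes.io/infra": "", "kubernetes.io/hostname": "infra-1"}
worker_node = {"kubernetes.io/hostname": "worker-1"}

# nodeSelector from the ingester template above
selector = {"node-role.kubernetes.io/infra": ""}

print(matches_node_selector(infra_node, selector))   # True
print(matches_node_selector(worker_node, selector))  # False
```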

      3) After adding the values, the pod stays in a Pending status with the following condition:

      NAME                                           READY   STATUS    RESTARTS   AGE
      cluster-logging-operator-f564bfd86-7wkd7       1/1     Running   0          20h
      logging-loki-compactor-0                       1/1     Running   0          20m
      logging-loki-distributor-6d7f7bf99-lmbps       1/1     Running   0          20m
      logging-loki-gateway-67c8547b8b-4mbc8          2/2     Running   0          20m
      logging-loki-gateway-67c8547b8b-dffdp          2/2     Running   0          20m
      logging-loki-index-gateway-0                   1/1     Running   0          20m
      logging-loki-ingester-0                        0/1     Pending   0          20m
      logging-loki-querier-76675d47c5-khbdd          1/1     Running   0          20m
      logging-loki-query-frontend-775d84cb9b-d7bh7   1/1     Running   0          20m 
      oc get pod logging-loki-ingester-0 -o yaml
      
      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2024-05-15T08:21:44Z"
          message: '0/6 nodes are available: 1 node(s) had volume node affinity conflict,
            2 node(s) had untolerated taint {node-role.kubernetes.io/infra: reserved}, 3 node(s) had untolerated
            taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available:
            6 Preemption is not helpful for scheduling..'
          reason: Unschedulable
          status: "False"
          type: PodScheduled
        phase: Pending
        qosClass: BestEffort 

      In this example, the cluster has two infrastructure nodes tainted and labeled with:

       

      oc adm taint nodes -l node-role.kubernetes.io/infra node-role.kubernetes.io/infra=reserved:NoSchedule node-role.kubernetes.io/infra=reserved:NoExecute
      
      oc label node <node-name> node-role.kubernetes.io/infra=
      oc label node <node-name> node-role.kubernetes.io=infra
       

      The node definitions also show the taint and labels applied correctly, so the error "2 node(s) had untolerated taint {node-role.kubernetes.io/infra: reserved}" should not appear.
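The scheduler reports an "untolerated taint" whenever no toleration on the pod matches a taint's key, value, and effect. A simplified Python sketch of that comparison (ignoring the `Exists` operator and wildcard keys), using the taint from the `oc adm taint` command above and the tolerations from the LokiStack spec; note that the taint key is `node-role.kubernetes.io/infra` while the toleration key is only `infra`:

```python
def tolerates(taint: dict, tolerations: list) -> bool:
    """Simplified Kubernetes matching: a taint is tolerated only if some
    toleration has the same key, value, and effect (operator 'Equal')."""
    return any(
        t["key"] == taint["key"]
        and t.get("value") == taint.get("value")
        and t["effect"] == taint["effect"]
        for t in tolerations
    )

# Taint applied by the `oc adm taint nodes` command above
taint = {"key": "node-role.kubernetes.io/infra", "value": "reserved", "effect": "NoSchedule"}

# Tolerations from the ingester template in the LokiStack spec
tolerations = [
    {"key": "infra", "value": "reserved", "effect": "NoSchedule"},
    {"key": "infra", "value": "reserved", "effect": "NoExecute"},
]

print(tolerates(taint, tolerations))  # False: the keys differ, so the taint is untolerated
```

With `node-role.kubernetes.io/infra` as the toleration key, the same check returns True, which would be consistent with the issue being resolved as Not a Bug.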

       

       

    • Severity: Moderate

      Description of problem:

      After deploying LokiStack and editing the Loki instance to add a nodeSelector and tolerations so the pods move to infrastructure nodes, the pods stay in a Pending status.

      Version-Release number of selected component (if applicable):

      RHOL 5.9.

      Loki Operator 5.9 (also for network observability).

      Expected results:

      The Loki pods are scheduled on the infrastructure nodes.

       

              Assignee: Unassigned
              Reporter: Adrian Candel (acandelp)