OpenShift Logging / LOG-8068

Vector can't access log stores outside the cluster when the restricted network policy is enabled and the cluster has a cluster-wide proxy.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Logging 6.4.0
    • Fix Version/s: Logging 6.4.0
    • Component/s: Log Collection
    • Incidents & Support
    • Sprint: Logging - Sprint 279

      Description of problem:

      When a cluster-wide proxy is configured on the cluster and the restricted network policy (`networkPolicy.ruleSet: RestrictIngressEgress`) is applied to the collector, Vector cannot access log stores outside the cluster.

      Proxy example:

      apiVersion: config.openshift.io/v1
      kind: Proxy
      metadata:
        creationTimestamp: "2025-10-30T00:16:49Z"
        generation: 1
        name: cluster
        resourceVersion: "562"
        uid: a2e4767c-ec41-40c3-b6c5-3e75379b7d08
      spec:
        httpProxy: http://proxy-user2:fake@10.0.0.2:3128
        httpsProxy: http://proxy-user2:fake@10.0.0.2:3128
        noProxy: test.no-proxy.com
        trustedCA:
          name: ""
      status:
        httpProxy: http://proxy-user2:fake@10.0.0.2:3128
        httpsProxy: http://proxy-user2:fake@10.0.0.2:3128
        noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qitang.test.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com 

      Vector pod:

      oc get pod clf-74947-48q59 -oyaml | yq '.spec.containers[0].env'
      - name: COLLECTOR_CONF_HASH
        value: f3ad218fe77e297af49d30b10a8959c9
      - name: K8S_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
      - name: NODE_IPV4
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: status.hostIP
      - name: OPENSHIFT_CLUSTER_ID
        value: cc4c8df8-2258-489d-87ae-fddd8ce120b3
      - name: POD_IP
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: status.podIP
      - name: POD_IPS
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: status.podIPs
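      # The proxy settings below match the cluster-wide Proxy status shown above;
      # with these set, Vector routes requests to any host not listed in no_proxy
      # through http://proxy-user2:fake@10.0.0.2:3128.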
      - name: https_proxy
        value: http://proxy-user2:fake@10.0.0.2:3128
      - name: http_proxy
        value: http://proxy-user2:fake@10.0.0.2:3128
      - name: no_proxy
        value: elasticsearch,.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qitang.test.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com
      - name: TRUSTED_CA_HASH
        value: 867df30db0268af97a2c0485e934076d
      - name: VECTOR_LOG
        value: warn
      - name: KUBERNETES_SERVICE_HOST
        value: kubernetes.default.svc
      - name: VECTOR_SELF_NODE_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName 

      Vector pod log:

      2025-10-30T02:29:18.425906Z  WARN sink{component_kind="sink" component_id=output_es_created_by_user component_type=elasticsearch}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
      2025-10-30T02:29:18.427215Z  WARN sink{component_kind="sink" component_id=output_es_created_by_user component_type=elasticsearch}: vector::sinks::util::retries: Internal log [Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased.] is being suppressed to avoid flooding. 
      
      2025-10-30T02:40:18.988317Z  WARN sink{component_kind="sink" component_id=output_s3_output component_type=aws_s3}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
      2025-10-30T02:40:35.046738Z  WARN sink{component_kind="sink" component_id=output_s3_output component_type=aws_s3}: vector::sinks::util::retries: Internal log [Retrying after error.] has been suppressed 1 times.
      2025-10-30T02:40:35.046761Z  WARN sink{component_kind="sink" component_id=output_s3_output component_type=aws_s3}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
      2025-10-30T02:40:58.743378Z  WARN sink{component_kind="sink" component_id=output_s3_output component_type=aws_s3}: vector::sinks::util::retries: Retrying after error. error=dispatch failure internal_log_rate_limit=true
      
      2025-10-30T02:40:18.870063Z  WARN sink{component_kind="sink" component_id=output_gcp_logging component_type=gcp_stackdriver_logs}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true
      2025-10-30T02:41:28.264203Z  WARN sink{component_kind="sink" component_id=output_gcp_logging component_type=gcp_stackdriver_logs}: vector::sinks::util::retries: Request timed out. If this happens often while the events are actually reaching their destination, try decreasing `batch.max_bytes` and/or using `compression` if applicable. Alternatively `request.timeout_secs` can be increased. internal_log_rate_limit=true

      Version-Release number of selected component (if applicable):

      cluster-logging.v6.4.0

      How reproducible:

      Always

      Steps to Reproduce:

      1. Launch a cluster with cluster-wide proxy enabled
      2. Forward logs to log stores outside the cluster (e.g. cloudwatch, s3, googlecloudlogging, elasticsearch) and set `networkPolicy.ruleSet` to `RestrictIngressEgress`; see the sketch after this list.
      3. Check the collector pods' logs.
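
      For reference, a minimal ClusterLogForwarder sketch for step 2. This is illustrative only: the output, the service account, and the placement of the `networkPolicy` stanza under `spec.collector` are assumptions; only the `networkPolicy.ruleSet: RestrictIngressEgress` setting is taken from this report.

      apiVersion: observability.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: clf-74947                      # illustrative name
        namespace: openshift-logging
      spec:
        serviceAccount:
          name: logcollector                 # assumed service account with collection permissions
        collector:
          networkPolicy:                     # assumed placement; the report only names networkPolicy.ruleSet
            ruleSet: RestrictIngressEgress
        outputs:
        - name: external-es                  # hypothetical external Elasticsearch output
          type: elasticsearch
          elasticsearch:
            url: https://elasticsearch.example.com:9200
            version: 8
        pipelines:
        - name: to-external-es
          inputRefs:
          - application
          outputRefs:
          - external-es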

      Actual results:

      Vector can't access log stores outside the cluster; requests to the external outputs time out.

      Expected results:

      No such issue; Vector forwards logs to the external log stores.

      Additional info:

      There is no issue when the ruleSet is `AllowAllIngressEgress`.

      There is no issue when the cluster doesn't have a proxy and the ruleSet is `RestrictIngressEgress`.
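
      Taken together with the proxy environment above, this suggests that with `RestrictIngressEgress` the collector's egress to the cluster-wide proxy is blocked, so any request that has to go through http://10.0.0.2:3128 times out. Below is a minimal sketch of the kind of egress rule that would have to be permitted; the policy name, namespace, and pod label are assumptions for illustration, the proxy address and port come from the example above, and this is not necessarily the policy the operator generates.

      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-collector-egress-to-proxy      # hypothetical name
        namespace: openshift-logging               # assuming the collector namespace
      spec:
        podSelector:
          matchLabels:
            app.kubernetes.io/component: collector # assumed collector pod label
        policyTypes:
        - Egress
        egress:
        - to:
          - ipBlock:
              cidr: 10.0.0.2/32                    # proxy host from the Proxy example above
          ports:
          - protocol: TCP
            port: 3128                             # proxy port from the Proxy example above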

              Assignee: rh-ee-calee Calvin Lee
              Reporter: qitang@redhat.com Qiaoling Tang