Network Observability / NETOBSERV-617

eBPF agent: Need to split huge GRPC payloads

    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Fix Version/s: netobserv-1.2, netobserv-ocp4.12
    • eBPF, FLP, Kafka
      Previously, for agents configured to send flows directly to the processor as GRPC+protobuf requests, under some very-high-load scenarios and with some agent configurations, the submitted payload could be too large and was rejected by the processor's GRPC server. The agent logged an error message such as: _grpc: received message larger than max_. As a consequence, the information about some flows was lost.
      With this patch, the GRPC payload is split into several messages when its size exceeds a threshold. As a result, connectivity is maintained.
    • Known Issue
    • Done
    • NetObserv - Sprint 229, NetObserv - Sprint 230, NetObserv - Sprint 231, NetObserv - Sprint 232

      In some extreme situations (~50,000 flows per eviction per agent), the GRPC payload is too big and is rejected by FLP:

      time="2022-10-04T10:12:21Z" level=error msg="couldn't send flow records to collector" collector="10.0.155.240:2055" component=exporter/GRPCProto error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4593533 vs. 4194304)"

      Even with smaller payloads, big messages (~30,000 flows) make FLP's memory grow too much, getting it OOMKilled if its defined limits are low.

      Although this use case would make us recommend that the customer move to Kafka, we would anyway need to split the GRPC messages into smaller chunks that are configurable by the user (e.g. a default of 10,000 flows per GRPC invocation, which is equivalent to ~1 MB of HTTP POST body), as sketched below.
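
      A minimal sketch of that count-based splitting, assuming hypothetical names (FlowRecord and the send callback stand in for the agent's actual protobuf types and GRPC client, which this sketch does not reproduce):

      package main

      import (
          "context"
          "fmt"
      )

      // FlowRecord stands in for the agent's protobuf flow type (illustrative only).
      type FlowRecord struct{}

      // exportFlows submits records in chunks of at most maxFlowsPerMessage flows,
      // so that no single GRPC message exceeds the server's size limit.
      func exportFlows(ctx context.Context, records []FlowRecord, maxFlowsPerMessage int,
          send func(context.Context, []FlowRecord) error) error {
          for start := 0; start < len(records); start += maxFlowsPerMessage {
              end := start + maxFlowsPerMessage
              if end > len(records) {
                  end = len(records)
              }
              if err := send(ctx, records[start:end]); err != nil {
                  return fmt.Errorf("sending chunk [%d:%d): %w", start, end, err)
              }
          }
          return nil
      }

      func main() {
          records := make([]FlowRecord, 25000)
          // With the suggested default of 10,000 flows per message,
          // this batch goes out as three GRPC invocations.
          err := exportFlows(context.Background(), records, 10000,
              func(_ context.Context, chunk []FlowRecord) error {
                  fmt.Printf("sending %d flows\n", len(chunk))
                  return nil
              })
          if err != nil {
              fmt.Println("export failed:", err)
          }
      }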

            Assignee: Mario Macias (Inactive)
            Reporter: Mario Macias (Inactive)
            QA Contact: Mehul Modi
            Votes: 0
            Watchers: 5
