-
Epic
-
Resolution: Done
-
Normal
-
None
-
Kuryr: Logs improvements
-
Improvement
-
False
-
False
-
Done
-
0% To Do, 0% In Progress, 100% Done
-
Undefined
-
M
It seems we're getting more and more false-positive reports from customers that are caused by our logs. We need to "civilize" them again on several levels:
- We need to get rid of the pyroute2 deprecation warning in kuryr-cni.
- Flask log lines for successful (200) `/metrics` requests spam the logs at INFO level; that's awful.
- `WARNING urllib3.connectionpool [-] Connection pool is full, discarding connection: <openstack-API>: queue.Full` is a common false positive. It's harmless, so we need to either fix its cause or suppress it.
- If a load balancer is stuck in the `PENDING_UPDATE` state, we need to clearly log that it's Octavia's fault instead of reporting an ambiguous `ResourceNotReady`.
- On the CNI side, if we time out waiting for a port to become `ACTIVE`, we need to say that it's Neutron's fault. Ideally we'd make sure the error string is returned to CNI so that it shows up in `oc describe pod`.
- We still do an awful job of signalling what caused a health check to fail. A clear reason should appear in both `oc describe` and the logs.
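For the `/metrics` and connection-pool items, here's a minimal sketch of a targeted log filter. It assumes that the request lines come from the `werkzeug` logger (the one Flask's built-in development server uses) and that the pool warning comes from the `urllib3.connectionpool` logger, as the quoted WARNING suggests. `DropNoisyRecords` and `install_filters` are hypothetical names, and the message patterns are assumptions about the current werkzeug/urllib3 wording. Unlike raising the logger level, this approach drops only these two messages, so other urllib3 warnings still get through.

```python
import logging


class DropNoisyRecords(logging.Filter):
    """Drop two known-harmless messages; let every other record through.

    Matching is done on the rendered message text. The patterns are
    assumptions about werkzeug/urllib3 wording and may need adjusting.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        # werkzeug request line, e.g. '... "GET /metrics HTTP/1.1" 200 -'
        if record.name == "werkzeug" and "/metrics" in msg and '" 200 ' in msg:
            return False
        # Harmless pool-overflow warning; the connection is just discarded.
        if (record.name == "urllib3.connectionpool"
                and msg.startswith("Connection pool is full")):
            return False
        return True


def install_filters() -> None:
    """Attach the filter to the two loggers that emit the noisy messages."""
    noisy = DropNoisyRecords()
    # A logger-level filter only sees records logged on that exact logger,
    # not records propagated from child loggers; both messages are logged
    # directly on these loggers.
    for name in ("werkzeug", "urllib3.connectionpool"):
        logging.getLogger(name).addFilter(noisy)
```

Fixing the underlying cause of the pool warning would instead mean sizing the HTTP connection pool to match the number of concurrent API callers; that isn't shown here.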
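For the `PENDING_UPDATE` item, here's a sketch of a status check that names Octavia in the error instead of falling back to a generic `ResourceNotReady`. `LoadBalancerStuck` and `check_lb_provisioning` are hypothetical. The caller is assumed to pass Octavia's `provisioning_status` string and to track how long it has already been waiting.

```python
# Octavia provisioning states in which it treats the load balancer as immutable.
OCTAVIA_PENDING_STATES = frozenset(
    {"PENDING_CREATE", "PENDING_UPDATE", "PENDING_DELETE"})


class LoadBalancerStuck(Exception):
    """Hypothetical error that blames Octavia explicitly."""


def check_lb_provisioning(lb_id: str, status: str, waited_s: float,
                          limit_s: float) -> bool:
    """Return True once the LB is ACTIVE, False while it may still settle.

    Raise LoadBalancerStuck when a PENDING_* state has lasted at least
    limit_s seconds, so the log names Octavia as the component at fault.
    """
    if status == "ACTIVE":
        return True
    if status in OCTAVIA_PENDING_STATES and waited_s >= limit_s:
        raise LoadBalancerStuck(
            f"Load balancer {lb_id} has been in {status} for "
            f"{waited_s:.0f}s. Octavia is not finishing the operation; "
            f"this is an Octavia problem, not a Kuryr one.")
    return False
```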
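For the port timeout item, here's a sketch of a wait loop whose timeout error names Neutron, plus a helper that wraps the message in the error object defined by the CNI spec (`cniVersion`, `code`, `msg`), so the container runtime has a readable reason to surface. `wait_for_port_active`, `PortNotActiveTimeout`, `to_cni_error` and the injected `get_port_status` callable are hypothetical, and how kuryr-cni actually forwards errors from the daemon to the CNI binary isn't shown. Code 100 is an arbitrary choice from the plugin-specific range.

```python
import json
import time
from typing import Callable


class PortNotActiveTimeout(Exception):
    """Hypothetical error that blames Neutron explicitly."""


def wait_for_port_active(get_port_status: Callable[[str], str], port_id: str,
                         timeout_s: float, interval_s: float = 1.0) -> None:
    """Poll until Neutron reports the port ACTIVE or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    status = get_port_status(port_id)
    while status != "ACTIVE":
        if time.monotonic() >= deadline:
            raise PortNotActiveTimeout(
                f"Neutron did not mark port {port_id} ACTIVE within "
                f"{timeout_s:g}s (last status: {status}); this is a Neutron "
                f"problem, not a Kuryr one.")
        time.sleep(interval_s)
        status = get_port_status(port_id)


def to_cni_error(exc: Exception, cni_version: str = "0.4.0") -> str:
    """Render an exception as a CNI error result for the plugin's stdout."""
    # The spec reserves codes 0-99 for well-known errors; 100+ are free
    # for plugin-specific use.
    return json.dumps({"cniVersion": cni_version, "code": 100, "msg": str(exc)})
```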
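For the health check item, here's a sketch that runs named checks, collects a reason for every failure (including checks that crash), and produces a single string that goes both into the log and into the probe response. `run_health_checks` and the convention that a check returns `None` or a reason string are hypothetical. One caveat worth verifying: as far as we know, the kubelet's event for a failed HTTP probe carries only the status code, not the response body, so getting the reason into `oc describe` may need another channel, such as the container's termination message.

```python
import logging
from typing import Callable, Dict, List, Optional, Tuple

LOG = logging.getLogger(__name__)

# A check returns None when healthy, or a human-readable failure reason.
Check = Callable[[], Optional[str]]


def run_health_checks(checks: Dict[str, Check]) -> Tuple[int, str]:
    """Return (HTTP status, body); the body lists every failure reason."""
    failures: List[str] = []
    for name, check in checks.items():
        try:
            reason = check()
        except Exception as exc:  # a crashing check is a failure reason too
            reason = f"check raised {type(exc).__name__}: {exc}"
        if reason:
            failures.append(f"{name}: {reason}")
    if failures:
        body = "; ".join(failures)
        # Log exactly the text that is returned, so both places agree.
        LOG.error("Health check failed: %s", body)
        return 500, body
    return 200, "ok"
```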