  1. Network Observability
  2. NETOBSERV-1045

When a node reboots, Network Observability takes a while to recover.

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • NetObserv - Sprint 240, NetObserv - Sprint 241
    • Moderate

    Description

      When a node reboots, Network Observability takes a while to recover. The console plug-in continues to use the old certificate, so it cannot access the Loki gateway. I have seen recovery take more than half an hour.
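
The symptom described above (a process that keeps using a certificate it loaded earlier while the file on disk has since changed) can be illustrated with a minimal Python sketch. This is not the console plug-in's code: `CertTracker`, `fingerprint`, and the idea of comparing file digests are hypothetical names and assumptions introduced only to show how a stale in-memory certificate could be detected.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class CertTracker:
    """Remembers which version of a certificate file a process loaded.

    Hypothetical helper for illustration; not part of Network Observability.
    """

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        # Digest of the certificate as it was when the process started.
        self.loaded = fingerprint(self.path)

    def is_stale(self) -> bool:
        """True when the file on disk differs from the loaded version."""
        return fingerprint(self.path) != self.loaded

    def reload(self) -> None:
        """Adopt the version currently on disk."""
        self.loaded = fingerprint(self.path)
```

A long-running client could call `is_stale()` before opening a connection and `reload()` when it returns `True`, instead of waiting for a restart. Whether this matches the actual cause of the delay in this issue is not established here.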

      A node may be rebooted intentionally. For example, applying a machine config change such as https://github.com/sustainable-computing-io/kepler/blob/main/manifests/config/cluster-prereqs/51-worker-kernel-devel.yaml causes a reboot.

      Attachments

        1. failed_to_verify_cert.png
          165 kB
          Steven Lee
        2. netobserv_health-flows_dropped.png
          146 kB
          Steven Lee
        3. noo-unable_to_get_flows-error.png
          159 kB
          Steven Lee

            People

              Assignee: Unassigned
              Reporter: Steven Lee (stlee@redhat.com)
              Votes: 0
              Watchers: 4
