[Upstream] Filesystem scraper not working in hostmetrics receiver.

    • Type: Bug
    • Resolution: Unresolved
    • OpenTelemetry
    • Quality / Stability / Reliability
    • Tracing Sprint # 285

      https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver 

      Summary:

      [hostmetrics receiver] Filesystem scraper produces no metrics.

      Component: OpenTelemetry Collector Contrib - hostmetricsreceiver

      Affects Version: 0.143.0, 0.144.0

      Description:

      The hostmetrics receiver filesystem scraper stopped producing metrics in containerized environments (Kubernetes/OpenShift) after upgrading from collector version 0.142.0 to 0.143.0+.

      Possible Root Cause Analysis:

      The regression might be caused by the gopsutil library upgrade from v4.25.11 to v4.25.12 introduced in collector version 0.143.0.

      Specifically, https://github.com/shirou/gopsutil/pull/1931 (which fixes https://github.com/shirou/gopsutil/issues/1284) introduced breaking changes in bind mount detection and filtering logic. This affects how /proc/1/mountinfo is parsed and how bind mounts (such as /hostfs) are detected in containerized environments.

        │ Collector Version │ gopsutil Version │ Filesystem Metrics │
        │ 0.142.0           │ v4.25.11         │ Working            │
        │ 0.143.0+          │ v4.25.12         │ Not collected      │
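
Since the suspected cause involves how the mount table is read, it may help to look at that table directly from inside the collector pod. The sketch below is not gopsutil's logic. It only prints the mount point, filesystem type, and mount source of each entry in a mountinfo file (field layout per proc(5)) and marks entries whose root field is not "/", which typically indicates a bind mount of a subdirectory. The function name list_mounts is hypothetical.

```shell
# Hypothetical helper: summarize a mountinfo file (default: /proc/1/mountinfo).
# Field layout per proc(5): field 4 is the root of the mount within its
# filesystem, field 5 is the mount point. Optional fields start at field 7 and
# end at a "-" separator, which is followed by filesystem type and mount source.
list_mounts() {
  awk '{
    sep = 0
    for (i = 7; i <= NF; i++) if ($i == "-") { sep = i; break }
    if (sep == 0) next                      # skip malformed lines
    kind = ($4 == "/") ? "whole-fs" : "subdir-bind"
    print $5, $(sep + 1), $(sep + 2), kind
  }' "${1:-/proc/1/mountinfo}"
}
```

Running list_mounts with no argument inside the collector container and looking for the /hostfs entries shows what the scraper has to work with on both collector versions.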

      The following filesystem metrics are affected:

        - system.filesystem.inodes.usage

        - system.filesystem.usage

        Environment:

        - Platform: OpenShift/Kubernetes

        - Deployment mode: DaemonSet

        - Configuration: root_path set to /hostfs, with the host filesystem bind-mounted at /hostfs
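
For readers without the test repository at hand, the configuration described above might look roughly like the following. This is a sketch under assumptions, not a copy of otel-hostmetricsreceiver.yaml: the collection interval, the debug exporter, and the helper name write_sketch_config are all illustrative.

```shell
# Hypothetical helper: write a minimal collector config to the given path.
# The values below are assumptions for illustration, not the test's real file.
write_sketch_config() {
  cat > "$1" <<'EOF'
receivers:
  hostmetrics:
    root_path: /hostfs          # host filesystem is expected at /hostfs
    collection_interval: 10s    # assumed interval
    scrapers:
      filesystem:
exporters:
  debug:
    verbosity: detailed         # assumed: detailed output for log checks
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [debug]
EOF
}
```

Only the filesystem scraper is enabled here so that a missing scope is easy to spot in the output.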

        Steps to Reproduce:

        1. Clone the distributed-tracing-qe test repository:

        git clone https://github.com/openshift/distributed-tracing-qe.git

        cd distributed-tracing-qe

        2. Ensure you have access to an OpenShift/Kubernetes cluster:

        export KUBECONFIG=/path/to/kubeconfig

        3. Uncomment the following metrics in the check_logs.sh script of the hostmetrics receiver test:

         "system.filesystem.inodes.usage"

         "system.filesystem.usage"

        4. Run the hostmetrics receiver test with collector version 0.144.0:

        chainsaw test --skip-delete --test-dir tests/e2e-otel/hostmetricsreceiver/

        5. Observe that system.filesystem.inodes.usage and system.filesystem.usage metrics are not collected.

        6. To verify the issue, check collector logs for filesystem-related metrics:

        kubectl -n chainsaw-hostmetrics logs -l app.kubernetes.io/component=opentelemetry-collector | grep "Name: system.filesystem"

        7. No output is returned (filesystem metrics are missing).

        8. Compare with collector version 0.142.0 (working):

          - Update collector image to 0.142.0 in otel-hostmetricsreceiver.yaml

          - Re-run the test; filesystem metrics are collected successfully.
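
The manual grep in step 6 can be turned into a pass/fail check. The following is a minimal sketch and not the repository's check_logs.sh: the function name missing_metrics is hypothetical, and it assumes the collector logs contain lines of the form "Name: <metric>", as the grep in step 6 does.

```shell
# Hypothetical helper: read collector logs on stdin and report which of the
# metric names given as arguments never appear as "Name: <metric>".
# Returns 0 if every metric was found, 1 otherwise.
missing_metrics() {
  logs=$(cat)
  status=0
  for metric in "$@"; do
    if ! printf '%s\n' "$logs" | grep -qF "Name: $metric"; then
      echo "missing: $metric"
      status=1
    fi
  done
  return "$status"
}

# Example use against the test namespace from the steps above:
#   kubectl -n chainsaw-hostmetrics logs -l app.kubernetes.io/component=opentelemetry-collector \
#     | missing_metrics system.filesystem.usage system.filesystem.inodes.usage
```

On an affected collector version this would be expected to list both metrics as missing and return a non-zero status.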

        Expected Behavior:

      The filesystem scraper should collect the system.filesystem.inodes.usage and system.filesystem.usage metrics when the host filesystem is bind-mounted at /hostfs and the receiver is configured with root_path: /hostfs.

      Actual Behavior:

      The filesystem scraper produces no metrics. The filesystemscraper InstrumentationScope is completely absent from the collector output, while all other scrapers (cpu, disk, memory, network, paging, processes, process) work correctly.
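
One way to confirm which scopes are present is to list them from the collector logs. The sketch below assumes the log output contains lines of the form "InstrumentationScope <name> <version>"; if the exporter in use formats scopes differently, the pattern needs adjusting. The function name list_scopes is hypothetical.

```shell
# Hypothetical helper: read collector logs on stdin and print the unique scope
# names, assuming lines of the form "InstrumentationScope <name> <version>".
list_scopes() {
  awk '$1 == "InstrumentationScope" { print $2 }' | sort -u
}

# Example use:
#   kubectl -n chainsaw-hostmetrics logs -l app.kubernetes.io/component=opentelemetry-collector \
#     | list_scopes
```

On an affected version, the output would be expected to include the other scrapers but no entry ending in filesystemscraper.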

      References:

        - gopsutil PR introducing the change: https://github.com/shirou/gopsutil/pull/1931

        - gopsutil issue fixed by the PR: https://github.com/shirou/gopsutil/issues/1284

        - Related OpenTelemetry Collector issue (previously fixed, different regression): https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35990

              agerstma@redhat.com Andreas Gerstmayr
              rhn-support-ikanse Ishwar Kanse