OpenShift Logging / LOG-3175

[release-5.5] Vector healthcheck fails when forwarding logs to Cloudwatch


Details

    • False
    • None
    • False
    • NEW
    • VERIFIED
    • Do Not Include (note: this means to exclude from release notes and errata)
    • Rejected
    • Log Collection - Sprint 226

    Description

      CLONED from v5.6 fix: https://issues.redhat.com/browse/LOG-3093

      -----------------------------------

      Version of components:

      Server Version: 4.11.0-0.nightly-2022-09-20-234850

      Kubernetes Version: v1.24.0+3882f8f

      cluster-logging.v5.6.0

      Description of the problem:

      When logs are forwarded to CloudWatch with Vector as the collector, Vector's healthcheck for the CloudWatch sink fails with the error below.

      2022-09-22T09:01:30.654268Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=DescribeLogGroups failed: InvalidParameterException: 1 validation error detected: Value '{{ group_name }}' at 'logGroupNamePrefix' failed to satisfy constraint: Member must satisfy regular expression pattern: [\.\-_/#A-Za-z0-9]+ component_kind="sink" component_type="aws_cloudwatch_logs" component_id=cw component_name=cw
      x Health check for "cw" failed 
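The error indicates that the literal string `{{ group_name }}` appears to have been sent to DescribeLogGroups as `logGroupNamePrefix`. A minimal sketch of why that value is rejected, assuming a POSIX translation of the pattern quoted in the error message; the function name `valid_prefix` and the sample values other than `{{ group_name }}` are hypothetical:

```shell
# Sketch only: POSIX ERE translation of the pattern from the error message,
# [\.\-_/#A-Za-z0-9]+  ->  [._/#A-Za-z0-9-]+ (the hyphen is placed last so it is literal).
valid_prefix() {
  # Succeeds only if the whole single-line value consists of allowed characters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[._/#A-Za-z0-9-]+'
}

for prefix in '{{ group_name }}' 'audit' 'my-cluster.application'; do
  if valid_prefix "$prefix"; then
    echo "ok: $prefix"
  else
    echo "rejected: $prefix"
  fi
done
# prints:
#   rejected: {{ group_name }}
#   ok: audit
#   ok: my-cluster.application
```

The braces and spaces in `{{ group_name }}` fall outside the allowed character set, which matches the InvalidParameterException above.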

      Steps to reproduce the issue:

      1 Deploy an OCP cluster on AWS.

      2 Create a secret for forwarding logs to CloudWatch.

      export REGION=us-east-2
      
      export ACCESS_KEY_ID=$(oc get secret aws-creds -n kube-system -o json | jq -r '.data.aws_access_key_id'|base64 -d)
      export SECRET_ACCESS_KEY=$(oc get secret  aws-creds -n kube-system -o json |jq -r '.data.aws_secret_access_key'|base64 -d)
      
      oc -n openshift-logging create secret generic cw-secret \
      --from-literal=aws_access_key_id="${ACCESS_KEY_ID}" \
      --from-literal=aws_secret_access_key="${SECRET_ACCESS_KEY}" 
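If the `aws-creds` secret is missing or `jq` returns nothing, the exported variables above end up empty and the secret is still created. A small guard, sketched here under the assumption that a POSIX shell is in use (the helper name `require_nonempty` is hypothetical), can catch that before `oc create secret` runs:

```shell
# Sketch only: fail early if any of the named variables is unset or empty.
require_nonempty() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing or empty: $name" >&2
      return 1
    fi
  done
}

# Intended use, before creating the secret:
#   require_nonempty ACCESS_KEY_ID SECRET_ACCESS_KEY || exit 1
```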

      3 Create a ClusterLogForwarder instance.

      apiVersion: "logging.openshift.io/v1"
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        outputs:
         - name: cw
           type: cloudwatch
           cloudwatch:
             groupBy: logType
             region: us-east-2
           secret:
              name: cw-secret
        pipelines:
          - name: all-logs
            inputRefs:
              - infrastructure
              - audit
              - application
            outputRefs:
              - cw 

      4 Create a ClusterLogging instance.

      apiVersion: "logging.openshift.io/v1"
      kind: "ClusterLogging"
      metadata:
        name: "instance" 
        namespace: "openshift-logging"
      spec:
        managementState: "Managed"  
        collection:
          logs:
            type: "vector"  
            vector: {} 

      5 Check that the collector pods are running and that logs are being sent to CloudWatch, then run vector validate from a collector pod.

      $ oc rsh collector-q5d8f
      Defaulted container "collector" out of: collector, logfilesmetricexporter
      sh-4.4# vector validate /etc/vector/vector.toml 
      Loaded with warnings ["/etc/vector/vector.toml"]
      ------------------------------------------------
      ~ Transform "route_container_logs._unmatched" has no consumers
      2022-09-22T09:21:13.890401Z  INFO vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="ip-10-0-131-185.us-east-2.compute.internal"
      2022-09-22T09:21:13.904520Z  INFO vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log", "/var/log/pods/*/*/*.gz", "/var/log/pods/*/*/*.tmp"]
      2022-09-22T09:21:13.953417Z  WARN aws_smithy_client::builder: Retries require a `sleep_impl`, but none was passed into the builder. No retries will occur with the current configuration. If this was intentional, you can suppress this message with `Client::set_sleep_impl(None). Otherwise, unless you have a good reason to use the low-level service client API, consider using the `aws-config` crate to load a shared config from the environment, and construct a fluent client from that. If you need to use the low-level service client API, then pass in a sleep implementation to make timeouts and retry work.
      √ Component configuration
      2022-09-22T09:21:13.954030Z  INFO send_operation{operation="DescribeLogGroups" service="cloudwatchlogs"}:provide_credentials{provider=default_chain}: aws_config::meta::credentials::chain: provider in chain did not provide credentials provider=Environment context=environment variable not set
      2022-09-22T09:21:13.954106Z  INFO send_operation{operation="DescribeLogGroups" service="cloudwatchlogs"}:provide_credentials{provider=default_chain}: aws_config::meta::credentials::chain: provider in chain did not provide credentials provider=Profile context=No profiles were defined
      2022-09-22T09:21:13.954650Z  INFO send_operation{operation="DescribeLogGroups" service="cloudwatchlogs"}:provide_credentials{provider=default_chain}:send_operation{operation="AssumeRoleWithWebIdentity" service="sts"}: aws_http::auth: provider returned CredentialsNotLoaded, ignoring
      2022-09-22T09:21:14.071679Z  INFO send_operation{operation="DescribeLogGroups" service="cloudwatchlogs"}:provide_credentials{provider=default_chain}: aws_config::meta::credentials::chain: loaded credentials provider=WebIdentityToken
      2022-09-22T09:21:14.163378Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=DescribeLogGroups failed: InvalidParameterException: 1 validation error detected: Value '{{ group_name }}' at 'logGroupNamePrefix' failed to satisfy constraint: Member must satisfy regular expression pattern: [\.\-_/#A-Za-z0-9]+ component_kind="sink" component_type="aws_cloudwatch_logs" component_id=cw component_name=cw
      x Health check for "cw" failed
      2022-09-22T09:21:14.163521Z  INFO vector::topology::builder: Healthcheck: Passed.
      √ Health check "prometheus_output"
      sh-4.4#  
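Because the failure is buried in INFO and WARN lines, it can help to filter the `vector validate` output down to the failed health checks. A minimal sketch, assuming the summary lines keep the `x Health check for "<id>" failed` form shown above (the helper name `failed_healthchecks` is hypothetical):

```shell
# Sketch only: read `vector validate` output on stdin and print the component id
# of every line of the form: x Health check for "<id>" failed
failed_healthchecks() {
  sed -n 's/^[[:space:]]*x Health check for "\([^"]*\)" failed.*$/\1/p'
}

# Intended use, from inside the collector pod:
#   vector validate /etc/vector/vector.toml 2>&1 | failed_healthchecks
```

For the output in this report, the filter would print only `cw`; the passing `prometheus_output` check is ignored.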

      Additional Notes:

      The healthcheck failure does not prevent logs from being delivered to CloudWatch.

       

          People

            cahartma@redhat.com Casey Hartman
            rhn-support-ikanse Ishwar Kanse
            Ishwar Kanse Ishwar Kanse