k8s-infra otelDeployment and otelAgent configuration

The k8s-infra chart runs two kind of otel collectors:

otelAgent: This collector is deployed as a daemonset and collects metrics/logs from the pods on the node. One agent is deployed per node.
otelDeployment: This collector is deployed as a deployment and collects metrics/logs of the cluster. One deployment is deployed per cluster.

otelDeployment

The otelDeployment collector is configured to collect cluster metrics and events by default. There are some use cases where you might want to collect metrics from components other than the ones that are being collected by default. For example, you might want to collect metrics from a redis server or a postgres database. This page describes how to override the default configuration of the otelDeployment collector to collect metrics from additional components. The override-values.yaml file contains the configurations that you can override.

Example 1: Configuring a redis receiver running inside a namespace "redis-ns" with the name "redis-server" and port 6379

The following configuration shows how to configure the otelDeployment collector to collect metrics from a redis server running inside a namespace "redis-ns" with the name "redis-server" and port 6379.

otelDeployment:
  config:
    receivers:
      redis:
        endpoint: redis-server.redis-ns.svc.cluster.local:6379
    service:
      pipelines:
        metrics/internal:
          receivers: [redis]
          processors: [batch]
          exporters: []

Example 2: Configuring a postgres receiver running inside a namespace "postgres-ns" with the name "postgres-server" and port 5432

otelDeployment:
  config:
    postgres:
      endpoint: postgres-server.postgres-ns.svc.cluster.local:5432
    service:
      pipelines:
        metrics/internal:
          receivers: [postgres]
          processors: [batch]
          exporters: []

Example 3: Configuring a redis receiver running inside a namespace "redis-ns" with the name "redis-server" and port 6379 and a postgres receiver running inside a namespace "postgres-ns" with the name "postgres-server" and port 5432

otelDeployment:
  config:
    receivers:
      redis:
        endpoint: redis-server.redis-ns.svc.cluster.local:6379
      postgres:
        endpoint: postgres-server.postgres-ns.svc.cluster.local:5432
    service:
      pipelines:
        metrics/internal:
          receivers: [redis, postgres]
          processors: [batch]
          exporters: []

The above examples are simple and show how to configure the otelDeployment collector to collect metrics from a single component. However, in a real-world scenario, you might have to configure the receiver with a username and password and other configurations.

otelAgent

The otelAgent collector is deployed as a daemonset, it results in duplicate metrics if the same component is running on multiple nodes and the metrics are collected from all the nodes. The otelDeployment collector solves this problem by collecting metrics from the cluster and not from the individual nodes.

Essential Presets

This guide explains how to configure the OpenTelemetry Collector presets in SigNoz. These presets control what telemetry data you collect and how you collect it from your Kubernetes cluster.

1. OTLP Exporter

presets:
  otlpExporter:
    enabled: true  # Must be enabled for data to reach SigNoz

What it does: Enables sending telemetry data to SigNoz backend. This is required for SigNoz to receive any data.

2. Host Metrics Collection

presets:
  hostMetrics:
    enabled: true
    collectionInterval: 30s  # Adjust based on your monitoring needs
    scrapers:
      cpu: {}        # Collects CPU usage, time, and utilization metrics
      load: {}       # Collects system load averages
      memory: {}     # Collects memory usage metrics
      disk:          # Collects disk I/O metrics
        exclude:
          devices:   # Regex patterns to exclude specific devices
            - ^ram\d+$
            - ^loop\d+$
            # Add custom device exclusions here
      filesystem:    # Collects filesystem metrics
        exclude_fs_types:
          fs_types:
            - tmpfs
            - squashfs
            # Add filesystem types to exclude
      network:       # Collects network metrics
        exclude:
          interfaces:
            - ^veth.*$    # Excludes virtual ethernet devices
            - ^docker.*$  # Excludes docker interfaces
            # Add network interfaces to exclude

Key points:

Adjust collectionInterval based on your monitoring granularity needs
Customize device exclusions to reduce noise from irrelevant storage devices
Configure network interface exclusions to focus on relevant network metrics

3. Container Log Collection

presets:
  logsCollection:
    enabled: true
    startAt: beginning      # Options: beginning or end
    includeFilePath: true   # Adds file path as metadata
    includeFileName: false  # Adds file name as metadata
    
    # Exclusion configuration
    blacklist:
      enabled: true
      signozLogs: true     # Excludes SigNoz's own logs
      namespaces:
        - kube-system      # Add namespaces to exclude
      pods:
        - hotrod          # Add pod names to exclude
        - locust
      containers: []       # Add container names to exclude
    
    # Inclusion configuration (overrides blacklist if enabled)
    whitelist:
      enabled: false
      signozLogs: true
      namespaces: []      # Only collect logs from these namespaces
      pods: []            # Only collect logs from these pods
      containers: []      # Only collect logs from these containers

Best practices:

Use blacklist for excluding noisy system pods
Enable whitelist when you need to monitor specific applications only
Consider disk usage implications when enabling file path/name inclusion

4. Kubernetes Metrics Collection

presets:
  kubeletMetrics:
    enabled: true
    collectionInterval: 30s
    authType: serviceAccount    # Authentication method
    endpoint: ${env:K8S_HOST_IP}:10250
    insecureSkipVerify: true   # Set to false in production
    
    # Metrics configuration
    metrics:
      k8s.pod.cpu_limit_utilization:
        enabled: true
      k8s.pod.memory_limit_utilization:
        enabled: true
      # Enable other metrics as needed

Important settings:

Set appropriate collectionInterval based on cluster size
Configure insecureSkipVerify: false in production environments
Enable specific metrics based on monitoring requirements

5. Kubernetes Metadata Enrichment

presets:
  kubernetesAttributes:
    enabled: true
    passthrough: false    # Set true to disable k8s API calls
    
    # Control which metadata to collect
    extractMetadatas:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.pod.name
      - k8s.node.name
      # Add or remove metadata fields
    
    # Pod association configuration
    podAssociation:
      - sources:
        - from: resource_attribute
          name: k8s.pod.ip

Configuration tips:

Enable only required metadata fields to optimize performance
Use passthrough: true in very large clusters to reduce API load
Configure pod association based on your networking setup

6. Cluster-level Metrics

presets:
  clusterMetrics:
    enabled: true
    collectionInterval: 30s
    
    # Node conditions to monitor
    nodeConditionsToReport:
      - Ready
      - MemoryPressure
      - DiskPressure
      # Add conditions based on monitoring needs
    
    # Resource types to monitor
    allocatableTypesToReport:
      - cpu
      - memory
      # - storage    # Uncomment if needed

When to use:

Enable for cluster health monitoring
Useful for capacity planning and resource optimization
Essential for multi-node cluster monitoring

7. Kubernetes Events

presets:
  k8sEvents:
    enabled: true
    authType: serviceAccount
    namespaces: []    # Empty for all namespaces, or specify list

Usage scenarios:

Enable for debugging and audit trails
Monitor specific namespaces by listing them
Useful for compliance and security monitoring

Common Deployment Scenarios

Production Cluster with Full Monitoring

presets:
  otlpExporter:
    enabled: true
  hostMetrics:
    enabled: true
    collectionInterval: 30s
  logsCollection:
    enabled: true
    blacklist:
      enabled: true
      namespaces: 
        - kube-system
  kubeletMetrics:
    enabled: true
    insecureSkipVerify: false
  kubernetesAttributes:
    enabled: true
  clusterMetrics:
    enabled: true
  k8sEvents:
    enabled: true

Resource-Constrained Environment

presets:
  otlpExporter:
    enabled: true
  hostMetrics:
    enabled: true
    collectionInterval: 60s
  logsCollection:
    enabled: true
    whitelist:
      enabled: true
      namespaces: 
        - production
  kubeletMetrics:
    enabled: true
    collectionInterval: 60s
  kubernetesAttributes:
    enabled: true
    passthrough: true

Development Environment

presets:
  otlpExporter:
    enabled: true
  hostMetrics:
    enabled: true
    collectionInterval: 30s
  logsCollection:
    enabled: true
  kubeletMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true