# Custom Metrics Scripts The easiest way to publish custom metrics without API calls or client libraries is to drop executable scripts into `/app/custom-metrics/`. Telegraf runs these scripts every 10 seconds and publishes their output directly to the Prometheus endpoint. ## How It Works 1. **Directory**: `/app/custom-metrics/` (inside the container) 2. **Interval**: Every 10 seconds, Telegraf executes all executable `*.sh` and `*.py` files 3. **Output**: Scripts print InfluxDB Line Protocol on stdout 4. **Result**: Metrics appear immediately on `:9273/metrics` (Prometheus format) and `/metrics/stream` (SSE) The directory is persistent (mounted as a Docker volume), so scripts survive container restarts. ## Script Requirements Each script must: 1. **Be executable**: `chmod +x your-script.sh` 2. **Print InfluxDB Line Protocol**: One metric per line on stdout 3. **Finish in <5 seconds**: Telegraf kills longer runs 4. **Produce clean output**: No debug prints, banners, or stderr 5. **Handle errors gracefully**: Non-zero exit codes do not crash Telegraf ## InfluxDB Line Protocol Format ``` measurement[,tag1=value1,tag2=value2] field1=value1[,field2=value2] [timestamp] ``` **Examples:** ```bash # Simple metric with no tags fan_speed rpm=2500i # Metric with tags cpu_temp,sensor=cpu0,location=socket1 temperature=65.5 1704067200000000000 # Multiple fields system_load,host=myhost load1=1.23,load5=1.45,load15=1.67 # Without timestamp (Telegraf auto-assigns) memory_usage,app=myapp used_mb=512i,total_mb=2048i ``` **Field types:** - Integer: append `i` (`count=42i`) - Float: no suffix (`temperature=65.5`) - String: wrap in quotes (`status="running"`) - Boolean: `t` or `f` (`enabled=t`) ## End-to-End Example: Fan RPM Metric ### Step 1: Start the Stack ```bash docker compose up -d ``` ### Step 2: Create the Script Create a shell script that reads fan RPM and outputs InfluxDB Line Protocol: ```bash docker exec metrics-manager sh -c 'cat > /app/custom-metrics/fan_rpm.sh << '"'"'EOF'"'"' #!/bin/sh # Read fan RPM from sysfs or use a simulation # Example: read from /sys/class/hwmon/hwmon0/fan1_input (replace with your path) rpm=$(awk "BEGIN{srand(); print int(2000+rand()*1000)}") echo "fan_speed,sensor=cpu_fan,location=main rpm=${rpm}i" EOF chmod +x /app/custom-metrics/fan_rpm.sh' ``` Or use a Python script: ```bash docker exec metrics-manager sh -c 'cat > /app/custom-metrics/fan_rpm.py << '"'"'EOF'"'"' #!/usr/bin/env python3 import random rpm = random.randint(2000, 3000) print(f"fan_speed,sensor=cpu_fan,location=main rpm={rpm}i") EOF chmod +x /app/custom-metrics/fan_rpm.py' ``` ### Step 3: Wait for the Next Telegraf Interval Telegraf executes scripts every 10 seconds. Wait ~10 seconds, then verify the metric appeared: ```bash curl -s http://localhost:9273/metrics | grep fan_speed # Output: fan_speed_rpm{location="main",sensor="cpu_fan",host="..."} 2374 ``` ### Step 4: Verify in SSE Stream The metric should appear in the live stream consumed by dashboards: ```bash curl -N -H "Accept: text/event-stream" http://localhost:9090/metrics/stream | grep fan_speed ``` ### Step 5: Persist the Script on the Host (Optional) Instead of inside the container's named volume, mount it from the host. Edit `compose.yaml`: ```yaml services: metrics-manager: volumes: - ./my-scripts:/app/custom-metrics # Replace default named volume ``` Then place your scripts in the local `./my-scripts/` directory and restart: ```bash docker compose down docker compose up -d ``` ## Example Scripts ### CPU Load Average (Shell) ```bash #!/bin/sh # Read 1-minute, 5-minute, and 15-minute load averages load=$(cat /proc/loadavg | awk '{print $1, $2, $3}') set -- $load echo "system_load,host=$(hostname) load1=$1,load5=$2,load15=$3" ``` ### Process Count (Shell) ```bash #!/bin/sh # Count running processes proc_count=$(ps aux | wc -l) echo "process_count,host=$(hostname) count=$((proc_count-1))i" ``` ### Custom Application Metric (Python) ```python #!/usr/bin/env python3 import subprocess import time # Example: measure disk I/O result = subprocess.run(['iostat', '-d', '1', '2'], capture_output=True, text=True) lines = result.stdout.strip().split('\n') last_line = lines[-1].split() # Extract I/O reads/writes per second reads_per_sec = float(last_line[1]) writes_per_sec = float(last_line[2]) print(f"disk_io,device=sda read_ops={reads_per_sec},write_ops={writes_per_sec}") ``` ### Temperature Sensor (Python) ```python #!/usr/bin/env python3 # Read CPU temperature from psutil library import psutil temps = psutil.sensors_temperatures() if 'coretemp' in temps: core_temp = temps['coretemp'][0].current print(f"cpu_temperature,sensor=coretemp temperature={core_temp}") else: print("cpu_temperature,sensor=fallback temperature=0") ``` ## Troubleshooting | Symptom | Cause | Solution | |---------|-------|----------| | Metric never appears on `:9273/metrics` | Script not executable, or stdout is not valid Influx Line Protocol | Run `docker exec metrics-manager /app/custom-metrics/your-script.sh` and inspect output. Add `set -x` to shell scripts for debug. | | Telegraf log contains `metric parse error` | Script printed a non-Influx line (banner, debug output, etc.) | Ensure ONLY metric lines are printed to stdout. Redirect debug output to `/dev/null` or stderr. | | Script appears to run only once | Misunderstanding of interval timing | The `[[inputs.exec]]` `interval = "10s"` runs every 10 seconds. Check logs: `docker logs metrics-manager \| grep telegraf` | | Permission denied | Script lacks execute permission | Run `docker exec metrics-manager chmod +x /app/custom-metrics/your-script.sh` | | Script times out after 5 seconds | Script takes too long | Optimize your script to finish faster, or increase the `timeout = "5s"` in `telegraf.conf` | ### Manual Testing Test your script manually inside the container: ```bash # List scripts in the custom-metrics directory docker exec metrics-manager ls -la /app/custom-metrics/ # Run a script manually docker exec metrics-manager /app/custom-metrics/fan_rpm.sh # Check Telegraf logs for errors docker logs metrics-manager | grep -i "metric\|exec\|telegraf" ``` --- ## When NOT to Use `/app/custom-metrics` Use the REST API instead when: - **The metric originates inside an existing application** — push from your code with `POST /api/v1/metrics/simple` - **You need sub-second granularity** — the `inputs.exec` interval is 10 seconds - **The metric source already speaks OTLP or Influx Line Protocol over HTTP** — use `POST /api/v1/metrics/otlp` or `POST /api/v1/metrics/influx` See [API Reference](../api-reference.md) for REST API options. ## Advanced: Extend supervisord with Custom Collectors If you need a persistent background process (not just periodic scripts), add it to supervisord: ```ini [program:my-collector] command=/usr/local/bin/my-collector autostart=true autorestart=true stdout_logfile=/dev/stdout stdout_logfile_maxbytes=0 stderr_logfile=/dev/stderr stderr_logfile_maxbytes=0 priority=40 ``` See [Environment Variables](./environment-variables.md) for details on extending supervisord. ## Supporting Resources - [Environment Variables](./environment-variables.md) - [API Reference](../api-reference.md) - [Get Started Guide](../get-started.md) - [Troubleshooting](../troubleshooting.md) ## License Copyright (C) 2025-2026 Intel Corporation SPDX-License-Identifier: Apache-2.0