Edge Node Platform Observability Agent#

Background#

This document provides high-level design and implementation guidelines. Refer to Platform Observability Agent in the Edge Node Agents GitHub* repository for implementation details.

Target Audience#

The target audience for this document includes:

  • Developers interested in contributing to the implementation of the Platform Observability Agent.

  • Administrators and System Architects interested in the architecture, design and functionality of the Platform Observability Agent.

Overview#

The Platform Observability Agent is part of the Open Edge Platform’s Edge Node Zero Touch Provisioning. It is installed, configured, and started automatically at provisioning time.

The Platform Observability Agent is a set of four observability agents deployed as individual systemd services alongside the other Edge Node Agents on the Edge Node. These services are:

  • platform-observability-logging, which is a FluentBit* service for scraping logs from all Edge Node Agents except health check logs.

  • platform-observability-health-check, which is a FluentBit service for scraping health check logs from Edge Node Agents.

  • platform-observability-metrics, which is a Telegraf* service for scraping metrics from the Edge Node Agents as well as from the Edge Node hardware.

  • platform-observability-collector, which is an OpenTelemetry* Collector service for gathering and forwarding all logs and hardware metrics to the Edge Orchestrator.
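As a minimal sketch, the four services listed above can be captured in a small lookup table and checked with `systemctl is-active`. The names `SERVICES`, `is_active_command`, and `check_all` are illustrative assumptions and are not part of the agent itself; only the service names and the tools behind them come from this document.

```python
import subprocess

# Service name -> underlying tool, as listed in this document.
SERVICES = {
    "platform-observability-logging": "FluentBit",
    "platform-observability-health-check": "FluentBit",
    "platform-observability-metrics": "Telegraf",
    "platform-observability-collector": "OpenTelemetry Collector",
}


def is_active_command(service: str) -> list[str]:
    """Build the systemctl command that reports whether a unit is active."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    return ["systemctl", "is-active", service]


def check_all() -> dict[str, str]:
    """Run `systemctl is-active` for every service and return its state.

    Hypothetical helper; only meaningful on a host that runs systemd.
    """
    states = {}
    for service in SERVICES:
        result = subprocess.run(
            is_active_command(service),
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit code simply means "not active"
        )
        states[service] = result.stdout.strip()
    return states
```

Calling `check_all()` on a provisioned Edge Node would return one state string per service. Building the command separately from running it keeps the lookup logic testable on machines without systemd.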

Architecture Diagram#

The Platform Observability Agent follows the architecture and design principles set out in the Edge Node Agents High-Level Architecture.

Figure 1: High-Level Architecture of Platform Observability Agent#

Key Components#

  1. The Platform Observability Agent is a system daemon packaged as a .deb or .rpm package, depending on the target operating system.

  2. The Platform Observability Agent requires a designated JWT token.

  3. FluentBit service with config at /etc/fluent-bit/fluent-bit.conf

  4. Health check service with config at /etc/health-check/health-check.conf

  5. Telegraf service with config at /etc/telegraf/telegraf.d/telegraf.conf

  6. OpenTelemetry service with config at /etc/otelcol/otelcol.yaml
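The configuration paths in the list above lend themselves to a simple presence check. The following is a sketch only: `CONFIG_FILES`, `missing_configs`, and the `root` parameter are hypothetical names introduced for illustration, and the paths are copied from the list as written.

```python
from pathlib import Path

# Configuration paths as given in the Key Components list.
CONFIG_FILES = {
    "FluentBit": "/etc/fluent-bit/fluent-bit.conf",
    "Health check": "/etc/health-check/health-check.conf",
    "Telegraf": "/etc/telegraf/telegraf.d/telegraf.conf",
    "OpenTelemetry": "/etc/otelcol/otelcol.yaml",
}


def missing_configs(root: str = "/") -> list[str]:
    """Return the components whose config file is absent under `root`.

    `root` is a hypothetical parameter that lets the check run against
    a mounted image or a test directory instead of the live filesystem.
    """
    base = Path(root)
    missing = []
    for component, path in CONFIG_FILES.items():
        # Strip the leading slash so the path resolves relative to root.
        if not (base / path.lstrip("/")).is_file():
            missing.append(component)
    return missing
```

An empty list from `missing_configs()` would indicate that every listed configuration file is present on the node.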

Data Flow#

The data flow of the Platform Observability Agent is broken down into the stages described in the Workflow Stages section.

Workflow Stages#

  1. Log Collection configuration:

       flowchart TD
       I1[KE service] -->|logs| Collector
       I2[Hardware Discovery Agent] -->|logs| Collector
       I3[Cluster Agent] -->|logs| Collector
       I4[Node Agent] -->|logs| Collector
       I5[Vault Agent] -->|logs| Collector
       I6[Platform Update Agent] -->|logs| Collector
       I7[INBC] -->|logs| Collector
       I8[RKE System Agent] -->|logs| Collector
       I9[RKE Server] -->|logs| Collector
       I10[Telegraf] -->|logs| Collector
       I11[Otel Collector] -->|logs| Collector
       I12[Telemetry Agent] -->|logs| Collector
       I13[AppArmor] -->|logs| Collector
       I14[Process] -->|logs| Collector
       I15[EN Users] -->|logs| Collector
       I16[Firewall] -->|logs| Collector
       I17[Host] -->|logs| Collector
       I18[OS] -->|logs| Collector
       Collector --> Routing
       Routing --> Orchestrator
        

Figure 2: Log Collection configuration

  2. Metrics Collection configuration:

       flowchart TD
       I1[Telegraf] -->|metrics| Collector
       I2[Node Agent] -->|metrics| Collector
       I3[Cluster Agent] -->|metrics| Collector
       I4[Hardware Agent] -->|metrics| Collector
       I5[Platform Update Agent] -->|metrics| Collector
       Collector --> Routing
       Routing --> Orchestrator
        

Figure 3: Metrics Collection configuration
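Both flowcharts share the same shape: many sources fan in to a Collector, which hands off to Routing and then to the Orchestrator. As an illustrative sketch (not the agent's actual implementation), the metrics flow of Figure 3 can be modelled as a mapping of edges and traversed. The names `METRICS_FLOW` and `path_to_sink` are assumptions made for this example.

```python
# Edges of the metrics flowchart (Figure 3), written as source -> target.
METRICS_FLOW = {
    "Telegraf": "Collector",
    "Node Agent": "Collector",
    "Cluster Agent": "Collector",
    "Hardware Agent": "Collector",
    "Platform Update Agent": "Collector",
    "Collector": "Routing",
    "Routing": "Orchestrator",
}


def path_to_sink(source: str, flow: dict[str, str]) -> list[str]:
    """Follow the edges from `source` until a node with no outgoing edge."""
    path = [source]
    seen = {source}
    while path[-1] in flow:
        nxt = flow[path[-1]]
        if nxt in seen:
            raise ValueError("cycle detected in flow definition")
        path.append(nxt)
        seen.add(nxt)
    return path


print(path_to_sink("Node Agent", METRICS_FLOW))
# ['Node Agent', 'Collector', 'Routing', 'Orchestrator']
```

Every source in the diagram reaches the Orchestrator through the same two hops, which is why a single Collector and Routing configuration can serve all of them.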

Extensibility#

The functionality of the Platform Observability Agent can be extended by modifying its source code.

Deployment#

The Platform Observability Agent is deployed as a set of system daemons, installed either from a .deb package during provisioning or from an .rpm package as part of the Edge Microvisor Toolkit.

When deployed onto the Edge Node, the Platform Observability Agent installs four services: platform-observability-logging, platform-observability-health-check, platform-observability-metrics, and platform-observability-collector.

Each service file is stored in the /lib/systemd/system/ folder as <service_name>.service.

The config file for the platform-observability-logging service is stored in /etc/fluent-bit/fluent-bit.conf.

The config file for the platform-observability-health-check service is stored in /etc/health-check/health-check.conf.

The config file for the platform-observability-metrics service is stored in /etc/telegraf/telegraf.conf.

The config file for the platform-observability-collector service is stored in /etc/otelcol/otelcol.yaml.

Logs for each service can be viewed using the journalctl tool.
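To make the unit file location and the journalctl usage concrete, here is a small sketch. The helper names (`unit_file_path`, `journalctl_command`, `show_logs`) and the default of 50 lines are assumptions for illustration; the unit directory and the use of journalctl come from the text above.

```python
import subprocess

# Directory that holds the installed unit files, per this document.
UNIT_DIR = "/lib/systemd/system"


def unit_file_path(service: str) -> str:
    """Return the expected location of a service's unit file."""
    return f"{UNIT_DIR}/{service}.service"


def journalctl_command(service: str, lines: int = 50) -> list[str]:
    """Build a journalctl call that shows the last `lines` log entries."""
    return ["journalctl", "-u", service, "-n", str(lines), "--no-pager"]


def show_logs(service: str, lines: int = 50) -> int:
    """Print recent logs for `service` and return journalctl's exit code.

    Hypothetical helper; requires a systemd host with journal access.
    """
    completed = subprocess.run(journalctl_command(service, lines), check=False)
    return completed.returncode
```

For example, `show_logs("platform-observability-collector")` would print the most recent collector log entries on a provisioned node.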

Technology Stack#

The sections below provide an overview of the Platform Observability Agent’s technology stack.

Implementation#

The Platform Observability Agent is implemented as a set of observability services configured for collection of desired logs and metrics.

System Diagram#

The Platform Observability Agent depends on the following Edge Orchestrator endpoints:

  • Edge Orchestrator central log collector service endpoint.

  • Edge Orchestrator central metrics collector service endpoint.

Platform Observability Agent external telemetry collectors:

Figure 4: Platform Observability Agent system diagram#

Integrations#

The Platform Observability Agent does not expose an API. Instead, it sends logs and metrics to the endpoints of the Edge Orchestrator.

The Platform Observability Agent integrates the following third-party telemetry collectors: FluentBit, Telegraf, and the OpenTelemetry Collector.

Security#

Security Policies#

The Platform Observability Agent adheres to the security design principle of the Edge Node Agents High-Level Architecture.

Auditing#

The Platform Observability Agent adheres to the observability design principle of the Edge Node Agents High-Level Architecture.

Upgrades#

The Platform Observability Agent adheres to the upgrade design principle of the Edge Node Agents High-Level Architecture.