Edge Node Platform Observability Agent#

Background#

This document provides high-level design and implementation guidelines. Refer to the Platform Observability Agent in the Edge Node Agents GitHub repository for implementation details.

Target Audience#

The target audience for this document is:

  • Developers interested in contributing to the implementation of the Platform Observability Agent.

  • Administrators and System Architects interested in the architecture, design and functionality of the Platform Observability Agent.

Overview#

Platform Observability Agent is part of the Open Edge Platform’s Edge Node Zero Touch Provisioning. It is installed, configured and automatically executed at Provisioning time.

The Platform Observability Agent (POA) is a set of four observability agents deployed as individual systemd services alongside the other Edge Node Agents on the Edge Node. These services are:

  • platform-observability-logging, which is a FluentBit* service for scraping logs from all Edge Node Agents except health check logs.

  • platform-observability-health-check, which is a FluentBit service for scraping health check logs from Edge Node Agents.

  • platform-observability-metrics, which is a Telegraf* service for scraping metrics from the Edge Node Agents as well as from the Edge Node hardware.

  • platform-observability-collector, which is an OpenTelemetry* Collector service for gathering and forwarding all logs and hardware metrics to the Edge Orchestrator.
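Because each of the four services runs as its own systemd unit, their state can be checked one by one. The following is a minimal sketch, not part of the agent itself: it assumes the unit names match the service names listed above with a `.service` suffix, and the helper names `is_active` and `poa_status` are hypothetical.

```python
import subprocess

# Service names as listed above.
POA_SERVICES = (
    "platform-observability-logging",
    "platform-observability-health-check",
    "platform-observability-metrics",
    "platform-observability-collector",
)


def is_active(service, run=subprocess.run):
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits with status 0 only when the unit is active.
    result = run(
        ["systemctl", "is-active", "--quiet", f"{service}.service"],
        check=False,
    )
    return result.returncode == 0


def poa_status(run=subprocess.run):
    """Map each POA service name to True (active) or False (not active)."""
    return {service: is_active(service, run) for service in POA_SERVICES}
```

Calling `poa_status()` on an Edge Node returns a dictionary with one entry per service. The `run` parameter exists only so that the check can be exercised without systemd, for example in a unit test.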

Architecture Diagram#

The Platform Observability Agent follows the architecture and design principles set out in the Edge Node Agents High-Level Architecture.

Figure 1: High-Level Architecture of the Platform Observability Agent

Key Components#

  1. The Platform Observability Agent is a system daemon packaged as a .deb or .rpm package (depending on target Operating System).

  2. The Platform Observability Agent requires a designated JWT token.

  3. FluentBit service with config at /etc/fluent-bit/fluent-bit.conf

  4. Health check service with config at /etc/health-check/health-check.conf

  5. Telegraf service with config at /etc/telegraf/telegraf.d/telegraf.conf

  6. OpenTelemetry service with config at /etc/otelcol/otelcol.yaml
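As a quick sanity check after installation, the presence of these configuration files can be verified. This is a minimal sketch: the paths are copied from the list above, while the `missing_configs` helper and its `root` parameter are hypothetical and only there to make the check easy to try against a test directory.

```python
from pathlib import Path

# Paths copied from the list above; adjust them if your installation differs.
POA_CONFIGS = {
    "FluentBit": "/etc/fluent-bit/fluent-bit.conf",
    "Health check": "/etc/health-check/health-check.conf",
    "Telegraf": "/etc/telegraf/telegraf.d/telegraf.conf",
    "OpenTelemetry": "/etc/otelcol/otelcol.yaml",
}


def missing_configs(configs=POA_CONFIGS, root="/"):
    """Return the component names whose config file is absent under root."""
    base = Path(root)
    return [
        name
        for name, path in configs.items()
        if not (base / path.lstrip("/")).is_file()
    ]
```

On a correctly provisioned node, `missing_configs()` is expected to return an empty list.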

Data Flow#

The data flow of the Platform Observability Agent can be broken down into the stages described in the Workflow Stages section.

Workflow Stages#

  1. Log Collection configuration:

            flowchart TD
       I1[KE service] -->|logs| Collector
       I2[Hardware Discovery Agent] -->|logs| Collector
       I3[Cluster Agent] -->|logs| Collector
       I4[Node Agent] -->|logs| Collector
       I5[Vault Agent] -->|logs| Collector
       I6[Platform Update Agent] -->|logs| Collector
       I7[INBC] -->|logs| Collector
       I8[RKE System Agent] -->|logs| Collector
       I9[RKE Server] -->|logs| Collector
       I10[Telegraf] -->|logs| Collector
       I11[Otel Collector] -->|logs| Collector
       I12[Telemetry Agent] -->|logs| Collector
       I13[AppArmor] -->|logs| Collector
       I14[Process] -->|logs| Collector
       I15[EN Users] -->|logs| Collector
       I16[Firewall] -->|logs| Collector
       I17[Host] -->|logs| Collector
       I18[OS] -->|logs| Collector
       Collector --> Routing
       Routing --> Orchestrator
        

Figure 2: Log Collection configuration

  2. Metrics Collection configuration:

            flowchart TD
       I1[Telegraf] -->|metrics| Collector
       I2[Node Agent] -->|metrics| Collector
       I3[Cluster Agent] -->|metrics| Collector
       I4[Hardware Agent] -->|metrics| Collector
       I5[Platform Update Agent] -->|metrics| Collector
       Collector --> Routing
       Routing --> Orchestrator
        

Figure 3: Metrics Collection configuration
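Both figures describe the same fan-in shape: many sources feed one collector, which routes the data to the Edge Orchestrator. The sketch below models that shape as plain data. It is illustrative only; the source lists are abbreviated from Figures 2 and 3, and the `path_for` helper is hypothetical.

```python
# Sources per signal type, abbreviated from Figures 2 and 3.
SOURCES = {
    "logs": ["Node Agent", "Cluster Agent", "Platform Update Agent", "Telegraf"],
    "metrics": [
        "Telegraf",
        "Node Agent",
        "Cluster Agent",
        "Hardware Agent",
        "Platform Update Agent",
    ],
}

# Every signal passes through the same downstream stages.
DOWNSTREAM = ["Collector", "Routing", "Orchestrator"]


def path_for(source, signal):
    """Return the stages a signal from `source` traverses, in order."""
    if source not in SOURCES.get(signal, []):
        raise ValueError(f"{source!r} is not a {signal} source in this sketch")
    return [source, *DOWNSTREAM]
```

For example, `path_for("Hardware Agent", "metrics")` returns `['Hardware Agent', 'Collector', 'Routing', 'Orchestrator']`, while asking for logs from a source that only appears in the metrics figure raises `ValueError`.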

Extensibility#

The Platform Observability Agent functionality can be extended by making source code changes.

Deployment#

The Platform Observability Agent is deployed as a set of system daemons, installed either from a .deb package during provisioning or from an .rpm package as part of the Edge Microvisor Toolkit.

When deployed onto the Edge Node, the POA installs four services: platform-observability-logging, platform-observability-health-check, platform-observability-metrics, and platform-observability-collector.

Each service file is stored in the /lib/systemd/system/ folder as <service_name>.service.
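The naming rule for the service files can be written down as a small helper. This is a sketch under the assumption that the rule above is applied literally; the `unit_file` function is hypothetical.

```python
from pathlib import PurePosixPath

# Directory named in the text above.
UNIT_DIR = PurePosixPath("/lib/systemd/system")


def unit_file(service_name):
    """Return the expected unit file path for a POA service."""
    return UNIT_DIR / f"{service_name}.service"
```

For example, `unit_file("platform-observability-collector")` yields `/lib/systemd/system/platform-observability-collector.service`.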

The config file for the platform-observability-logging service is stored in /etc/fluent-bit/fluent-bit.conf.

The config file for the platform-observability-health-check service is stored in /etc/health-check/health-check.conf.

The config file for the platform-observability-metrics service is stored in /etc/telegraf/telegraf.conf.

The config file for the platform-observability-collector service is stored in /etc/otelcol/otelcol.yaml.

Logs for each service can be viewed using the journalctl tool.
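The journalctl lookup can be scripted when logs from several services need to be gathered. The following is a minimal sketch, assuming the systemd unit names match the service names with a `.service` suffix; `journalctl_command` and `recent_logs` are hypothetical helper names.

```python
import subprocess


def journalctl_command(service, lines=50):
    """Build a journalctl command showing the latest entries for one unit."""
    # -u selects the systemd unit, -n limits the number of entries,
    # and --no-pager writes the output directly instead of opening a pager.
    return ["journalctl", "-u", f"{service}.service", "-n", str(lines), "--no-pager"]


def recent_logs(service, lines=50, run=subprocess.run):
    """Return the latest journal entries for a service as text."""
    result = run(
        journalctl_command(service, lines),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For example, `recent_logs("platform-observability-collector", lines=20)` returns the last 20 journal entries for the collector service. Reading the journal may require elevated privileges, depending on how the node is configured.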

Technology Stack#

The sections below provide an overview of various aspects of the Platform Observability Agent’s technology stack.

Implementation#

The Platform Observability Agent is implemented as a set of observability services configured for collection of desired logs and metrics.

System Diagram#

Platform Observability Agent depends on Edge Orchestrator endpoints:

  • Edge Orchestrator central log collector service endpoint.

  • Edge Orchestrator central metrics collector service endpoint.

Platform Observability Agent external telemetry collectors:

Figure 4: Platform Observability Agent system diagram

Integrations#

Platform Observability Agent does not expose an API. Instead, it sends metrics to the Edge Orchestrator endpoints.

Platform Observability Agent integrates three third-party collectors: FluentBit, Telegraf, and the OpenTelemetry Collector.

Security#

Security Policies#

Platform Observability Agent adheres to the security design principle of the Edge Node Agents High-Level Architecture.

Auditing#

Platform Observability Agent adheres to the observability design principle of the Edge Node Agents High-Level Architecture.

Upgrades#

Platform Observability Agent adheres to the upgrade design principle of the Edge Node Agents High-Level Architecture.