Alerting Monitor
================

The Alerting functionality:

* Introduces additional components (``Alerting Monitor System`` consisting of
  ``Alerting Monitor`` and ``Prometheus Alertmanager``).

* Adds extensions to existing components (``Grafana Mimir Ruler`` in ``Edge
  Node Observability Stack``).

* Integrates with other ``Edge Orchestrator Services`` (``Identity and Access
  Management System``, ``Infrastructure Management`` or ``Graphical User
  Interface``).

It depends on established flow of telemetry data from ``Edge Nodes`` to the
``Edge Node Observability Stack`` and the presence of external ``Email Server``
for delivering emails.

.. image:: ./images/diagram_ui_alerting.svg
   :alt: System Context diagram for UI-configurable Alerting
   :align: center
   :width: 100%

Overview
--------

To expose configurable Alerting capability, ``Alerting Monitor`` service
introduces a REST API that:

- is backed by its own configuration database
- allows reconfiguration of stored alert definitions
- relays alert and storage to dependent component (``Prometheus Alertmanager``)
- aggregates, transforms, and filters alert data obtained via proxying requests
  to ``Prometheus Alertmanager``

The ``Alerting Monitor`` service's responsibility is also maintenance of
coherent alerting configuration of dependent components via a set of internal
controllers referred to as ``External Services Controller``:

- ``Grafana Mimir Ruler``: part of ``Edge Node Observability Stack`` that is
  the main generator of rules-based alerts from ``Edge Nodes`` telemetry.

- ``Prometheus Alertmanager``: standalone component of ``Alerting Monitor
  System`` that is responsible for grouping, routing, and sending alerts
  through email channel.

Deployment
----------

``Alerting Monitor`` is deployed in HA mode and supports operation of multiple
replicas, but with just only one instance at a time actively controlling
(modifying) the configuration of dependent components. Coordination of tasks
between replicas is done via service's transactional database. A
reconfiguration task executed by a replica implements a timeout, so if this
replica fails, the task can be picked up by another replica. Alerting Monitor
uses Horizontal Pod Autoscaler (HPA) to dynamically scale the number of its
replicas.

The service exposes both ``liveness`` and ``readiness`` Kubernetes probes. The
service is considered alive and ready to accept API requests when:

- Service startup from supplied configuration is completed.
- Database initialization/migration procedures are complete.
- Connectivity to dependent services is verified.

The ``Prometheus Alertmanager`` dependent component is deployed in cluster mode
(all replicas are receiving alerts from ``Grafana Mimir Ruler``) and is
responsible for deduplication and grouping of all received alerts.

Internally, the service is composed of the following sub-components:

* ``alerting-monitor`` core service that exposes REST API for alert
  configuration management and reconfiguration of dependent services.

* ``open-policy-agent`` (OPA) policy engine for API access control.

* ``management`` exposes gRPC API to reload configuration for handling
  Multitenancy.

Refer to :doc:`../concepts/multitenancy` section for more details on how
multitenancy in observability is handled.

Configuration
-------------

The service is supplied with a ``YAML`` based configuration containing default
alert definitions that conform to `Prometheus alerting rules schema
<https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/>`_.
This configuration is additionally templated via Helm Chart to support
different evaluation and aggregation intervals.

This default alerts configuration is used to populate the database with initial
definitions to be applied and can be used to restore default actions.

The service is supplied with a ``YAML`` based configuration containing
configuration required for alert delivery channel. The only supported
``Receiver`` type is ``email`` and it needs to conform to `Prometheus
Alertmanager email_config schema
<https://prometheus.io/docs/alerting/latest/configuration/#email_config>`_.

This default alert delivery channel configuration is applied to all alerts
managed by the service.

The service supports graceful shutdown since configuration reloading may
require restart.

Refer to :doc:`../../tutorials/development/email-notifications` section for
detailed information on configuring the Alerting Monitor Email Notifications.

REST API (Northbound)
---------------------

The exposed Northbound API is proxied through Multi Tenancy API Gateway. Refer
to **API Guide** for detailed Alerts API reference.

REST API requests originating are handled:

- *Asynchronously* if they translate to requests that require changing the
  configuration of other services.

- *Synchronously* if they translate to only read requests on dependent
  services.

Asynchronous API call example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ./images/asynchronous_api_call_example_diagram.svg
   :alt: Asynchronous API call handling example
   :align: center
   :width: 100%

Synchronous API Call Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ./images/synchronous_api_call_example_diagram.svg
   :alt: Asynchronous API call handling example
   :align: center
   :width: 100%

User Interface (UI)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The User Interface (UI) provides support for configuring alerts under the Settings tab (gear icon in the top right corner of the UI).
The UI allows users to configure thresholds and durations for alerts, as well as the email addresses to which alerts should be sent.

.. image:: ./images/alerts-configuration.png
   :alt: Asynchronous API call handling example
   :align: center
   :width: 100%

It also provides a view of the alerts that have been reported by the system.

.. image:: ./images/reported-alerts.png
   :alt: Asynchronous API call handling example
   :align: center
   :width: 100%


.. toctree::
   :maxdepth: 3