# How It Works
This page describes the architecture of the Audio Analyzer microservice and
how an audio request flows through it.
## Architecture
At a high level, the Audio Analyzer is a FastAPI service that accepts an
audio upload, splits it into chunks with FFmpeg, runs each chunk through an
ASR backend, and (optionally) runs a sentiment model in parallel. Results
are aggregated per session and returned either as a single JSON response or
as an NDJSON event stream.
```mermaid
%%{init: {
'theme': 'base',
'themeVariables': {
'fontFamily': '"IntelOne Display", "Intel Clear", "Inter", "Segoe UI", Arial, sans-serif',
'fontSize': '14px',
'primaryColor': '#0068B5',
'primaryTextColor': '#FFFFFF',
'primaryBorderColor': '#00377C',
'lineColor': '#00377C',
'secondaryColor': '#EEF3F8',
'tertiaryColor': '#F7F8FA',
'background': '#FFFFFF',
'mainBkg': '#FFFFFF',
'clusterBkg': '#F7F8FA',
'clusterBorder': '#0068B5',
'edgeLabelBackground': '#FFFFFF',
'noteBkgColor': '#F7F8FA',
'noteTextColor': '#3A3A3A'
}
}}%%
flowchart LR
Client([Client])
subgraph Service["Audio Analyzer (FastAPI, :8010)"]
API["API Layer
(transcription / health / devices)"]
Pipeline["Pipeline Orchestrator
(pipeline.py)"]
Pre["Preprocessing
(FFmpeg: decode, chunk, denoise)"]
ASR["ASR Backend
(openai | openvino | whispercpp)"]
Sent["Sentiment Backend
(openvino | pytorch)"]
Session[("Session Store
storage/<session_id>/")]
end
Models[("Model Cache
models/")]
Device{{"Inference Device
CPU / GPU"}}
Client -- "POST /v1/audio/transcriptions{,/stream}" --> API
API --> Pipeline
Pipeline --> Pre
Pre --> ASR
Pre --> Sent
ASR --> Device
Sent --> Device
ASR --> Pipeline
Sent --> Pipeline
Pipeline <--> Session
ASR -. loads .-> Models
Sent -. loads .-> Models
Pipeline -- "JSON response / NDJSON events
X-Session-ID header" --> Client
classDef client fill:#FFFFFF,stroke:#0068B5,stroke-width:2px,color:#3A3A3A;
classDef core fill:#0068B5,stroke:#00377C,stroke-width:1.5px,color:#FFFFFF;
classDef backend fill:#00A3F4,stroke:#00377C,stroke-width:1.5px,color:#FFFFFF;
classDef store fill:#6C6C6C,stroke:#0068B5,stroke-width:1.5px,color:#FFFFFF;
classDef device fill:#00C7FD,stroke:#00377C,stroke-width:1.5px,color:#3A3A3A;
class Client client;
class API,Pipeline,Pre core;
class ASR,Sent backend;
class Session,Models store;
class Device device;
style Service fill:#F7F8FA,stroke:#0068B5,stroke-width:1.5px,color:#3A3A3A;
```
**Key planes:**
- **API layer** — request validation, session header handling, response
shaping (single JSON vs. streaming NDJSON).
- **Pipeline orchestrator** — drives preprocessing, ASR, and sentiment;
aggregates per-chunk results into a session-level summary.
- **Backends** — pluggable ASR and sentiment implementations selected via
config; each backend handles its own model loading and device placement.
- **Session store** — per-session directory holding chunk files and
metadata; enables multi-upload continuation via `session_id`.
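The pluggable-backend idea above can be sketched in Python as a small registry plus an orchestrator loop. Everything here is hypothetical. `ASRBackend`, `SentimentBackend`, `register_asr`, `make_asr`, `EchoASR`, `SessionSummary`, and `run_pipeline` are illustrative names, not the service's actual classes in `components/` or `pipeline.py`. For simplicity, the sketch runs sentiment sequentially after each transcription, whereas the service may run it in parallel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Protocol


class ASRBackend(Protocol):
    """Hypothetical interface: turn one audio chunk into text."""

    def transcribe(self, chunk_path: str) -> str: ...


class SentimentBackend(Protocol):
    """Hypothetical interface: assign a sentiment label to a piece of text."""

    def classify(self, text: str) -> str: ...


# Maps config names (e.g. "openai", "openvino") to factories taking a device string.
ASR_REGISTRY: Dict[str, Callable[[str], ASRBackend]] = {}


def register_asr(name: str):
    """Class decorator that records a backend under its config name."""
    def decorator(factory):
        ASR_REGISTRY[name] = factory
        return factory
    return decorator


@register_asr("echo")
class EchoASR:
    """Toy backend for illustration: 'transcribes' a chunk as its file name."""

    def __init__(self, device: str) -> None:
        self.device = device

    def transcribe(self, chunk_path: str) -> str:
        return f"text of {chunk_path}"


def make_asr(name: str, device: str = "CPU") -> ASRBackend:
    """Instantiate the backend named in config on the requested device."""
    try:
        factory = ASR_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown ASR backend: {name!r}") from None
    return factory(device)


@dataclass
class SessionSummary:
    transcripts: List[str] = field(default_factory=list)
    sentiments: Dict[str, int] = field(default_factory=dict)  # label -> count


def run_pipeline(
    chunks: Iterable[str],
    asr: ASRBackend,
    sentiment: Optional[SentimentBackend] = None,
) -> SessionSummary:
    """Transcribe each chunk, optionally label it, and aggregate per session."""
    summary = SessionSummary()
    for chunk in chunks:
        text = asr.transcribe(chunk)
        summary.transcripts.append(text)
        if sentiment is not None:
            label = sentiment.classify(text)
            summary.sentiments[label] = summary.sentiments.get(label, 0) + 1
    return summary
```

Keying the registry by the same strings that appear in config keeps backend selection a plain lookup, so adding a backend does not require touching the orchestrator.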
## Request Flow
1. **Upload** — A client sends an audio file to either
`POST /v1/audio/transcriptions` (single response) or
`POST /v1/audio/transcriptions/stream` (NDJSON event stream).
2. **Session resolution** — If `session_id` is supplied, the service reuses
the existing session directory under `storage/<session_id>/`. Otherwise, it
creates a new session and returns its id in the `X-Session-ID` response
header.
3. **Preprocessing** — FFmpeg decodes the upload and produces audio chunks
under the configured `audio_preprocessing.chunk_dir`. Chunk size, silence
detection, and optional denoising are controlled by the
`audio_preprocessing` config section.
4. **ASR inference** — Each chunk is transcribed by the configured ASR
backend (`openai`, `openvino`, or `whispercpp`) on the configured device
(typically `CPU`, optionally `GPU` for supported OpenVINO paths).
5. **Sentiment (optional)** — When `sentiment.enabled` is true, the
service runs the configured sentiment model (`openvino` or `pytorch`) and
aggregates a session-level summary.
6. **Response** — The non-streaming endpoint returns a single JSON response
once all chunks are processed; the streaming endpoint emits a
`transcription.chunk` event as each chunk completes, followed by a final
`transcription.completed` event.
7. **Cleanup** — If `pipeline.delete_chunks_after_use` is true, temporary
chunk files are removed after processing. Session metadata remains under
`storage/<session_id>/`.
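On the client side, the streaming endpoint delivers one JSON object per line. Below is a minimal consumer sketch. It assumes each event carries its name in a `type` field; that field name is an assumption, and only the event names `transcription.chunk` and `transcription.completed` come from this page. `consume_ndjson` is a hypothetical helper.

```python
import json
from typing import Iterable, List, Tuple, Union


def consume_ndjson(lines: Iterable[Union[str, bytes]]) -> Tuple[List[dict], dict]:
    """Collect chunk events until the final completed event arrives.

    Assumes the event name lives in a "type" field of each JSON object.
    """
    chunks: List[dict] = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        kind = event.get("type")
        if kind == "transcription.chunk":
            chunks.append(event)
        elif kind == "transcription.completed":
            return chunks, event
    raise ValueError("stream ended without a transcription.completed event")
```

With `requests`, for example, calling `response.iter_lines()` on a `stream=True` POST yields byte lines that this function accepts. The `response.headers["X-Session-ID"]` header then carries the id to send with a follow-up upload.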
## Components
- `api/` — FastAPI routers for transcription, health, and device listing.
- `pipeline.py` — Orchestrates preprocessing, ASR, and sentiment.
- `components/` — Backend implementations for ASR and sentiment providers.
- `utils/` — Audio utilities, config loading, and session helpers.
- `dto/` — Request and response data models.
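Step 2 of the request flow (session resolution) is the kind of logic the `utils/` session helpers cover. The sketch below shows one way it might work. `resolve_session` is a hypothetical name. The UUID-hex id format and the choice to reject unknown or path-like ids are assumptions, not documented behavior.

```python
import uuid
from pathlib import Path
from typing import Optional, Tuple


def resolve_session(storage_root: Path, session_id: Optional[str] = None) -> Tuple[str, Path]:
    """Reuse an existing session directory or create a new one.

    Returns the session id (for the X-Session-ID header) and its directory.
    """
    if session_id is not None:
        # Reject ids that are empty or could escape storage_root (e.g. "../x").
        if session_id in ("", ".", "..") or Path(session_id).name != session_id:
            raise ValueError(f"invalid session_id: {session_id!r}")
        session_dir = storage_root / session_id
        if not session_dir.is_dir():
            raise FileNotFoundError(f"unknown session: {session_id}")
        return session_id, session_dir
    # No id supplied: mint a fresh one and create its directory.
    new_id = uuid.uuid4().hex
    session_dir = storage_root / new_id
    session_dir.mkdir(parents=True, exist_ok=False)
    return new_id, session_dir
```

Validating the id before joining it to the storage root matters because the id arrives from the client.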
## Configuration Surface
All runtime behavior is driven by `config.yaml`, shared by both standalone
and container runs, with targeted overrides via `AUDIO_ANALYZER__...`
environment variables. See the [Configuration Guide](./get-started/configuration.md) for the
full list of fields.
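The environment-variable override mechanism can be sketched as a function that overlays prefixed variables onto the nested config loaded from `config.yaml`. The mapping rules here are assumptions: `__` separates nesting levels, names are matched lower-cased, and values are parsed as bool, int, float, or string. Libraries such as pydantic-settings provide similar behavior through `env_nested_delimiter`, but this page does not say which mechanism the service uses. `apply_env_overrides` is a hypothetical name.

```python
import copy
import os
from typing import Any, Dict, Mapping, Optional

# Assumed prefix; "__" is assumed to separate nesting levels after it.
PREFIX = "AUDIO_ANALYZER__"


def _parse_scalar(raw: str) -> Any:
    """Turn an env string into bool, int, or float, else keep it as str."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_env_overrides(
    config: Dict[str, Any], environ: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Return a copy of config with AUDIO_ANALYZER__SECTION__KEY overrides applied."""
    result = copy.deepcopy(config)
    env = os.environ if environ is None else environ
    for name, raw in env.items():
        if not name.startswith(PREFIX):
            continue
        path = [part.lower() for part in name[len(PREFIX):].split("__")]
        if not all(path):
            continue  # skip malformed names such as a bare "AUDIO_ANALYZER__"
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create missing sections on the fly
        node[path[-1]] = _parse_scalar(raw)
    return result
```

For example, under these assumptions `AUDIO_ANALYZER__SENTIMENT__ENABLED=true` would set `sentiment.enabled` to `True` without editing `config.yaml`.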