How It Works

This section provides a high-level view of how the application processes audio input and integrates with a modular backend architecture.

High-Level System Diagram

Inputs

Audio Files: You can upload audio recordings through the Web-based UI layer, which supports:

The uploaded audio is passed to the Backend API, which acts as the gateway to the backend service layer and provides similar capabilities.

Processing:

Audio Pre-processing: Cleans and formats audio data for downstream tasks.
ASR Component (Automatic Speech Recognition): Converts audio into text using integrated ASR providers:
- FunASR
- OpenVINO
- OpenAI
Summariser Component: Generates concise summaries of the transcribed text using LLM providers:
- iPexLLM
- OpenVINO
Metrics Collector: Monitors and collects:
- xPU utilisation for hardware performance
- LLM metrics for summarisation efficiency
Pipeline Service: Manages multiple DL Streamer-based pipelines:

A Media Server (MediaMTX) supports streaming and distribution of processed video feeds.
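
The processing stages described above (pre-processing, ASR, summarisation, metrics collection) can be pictured as a chain of swappable components. The following is a minimal sketch of that idea, not the application's actual code: the `Pipeline` class, `preprocess`, `fake_asr`, and `first_sentence` are hypothetical stand-ins, and no real FunASR, OpenVINO, or OpenAI call is made.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical provider signatures: any ASR or summariser backend
# could be plugged in as long as it matches these shapes.
AsrFn = Callable[[bytes], str]
SummariseFn = Callable[[str], str]


def preprocess(audio: bytes) -> bytes:
    """Stand-in for audio clean-up; here it only trims zero-byte padding."""
    return audio.strip(b"\x00")


@dataclass
class Pipeline:
    asr: AsrFn
    summariser: SummariseFn
    metrics: Dict[str, float] = field(default_factory=dict)

    def run(self, audio: bytes) -> Dict[str, str]:
        start = time.perf_counter()
        cleaned = preprocess(audio)            # Audio Pre-processing
        transcript = self.asr(cleaned)         # ASR Component
        summary = self.summariser(transcript)  # Summariser Component
        # Metrics Collector stand-in: record wall-clock time only.
        self.metrics["elapsed_s"] = time.perf_counter() - start
        return {"transcript": transcript, "summary": summary}


def fake_asr(audio: bytes) -> str:
    """Pretend ASR provider: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


def first_sentence(text: str) -> str:
    """Pretend LLM provider: keeps only the first sentence."""
    return text.split(".")[0].strip() + "."


pipeline = Pipeline(asr=fake_asr, summariser=first_sentence)
result = pipeline.run(b"\x00Lesson one covers fractions. Homework is due Friday.\x00")
print(result["summary"])
```

Passing the providers in as plain callables keeps the orchestration independent of any one backend, which is the property the modular architecture relies on.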

Outputs:

Transcriptions and summaries can be accessed from the Web-based UI and the file system. The file-system path is /<project-location>/<your-project-name>/, for example /storage/chapter-10/.
Performance metrics (e.g., utilisation, model efficiency) are displayed for monitoring.
Localisation ensures outputs are available in multiple languages (English/Chinese).
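
As a small illustration of the output path pattern above, the sketch below joins a project location and a project name into the directory where transcriptions and summaries would be found. The helper name `output_dir` is hypothetical, and the names of the files inside that directory are not specified here, so the sketch stops at the directory.

```python
from pathlib import PurePosixPath


def output_dir(project_location: str, project_name: str) -> PurePosixPath:
    """Build /<project-location>/<your-project-name> from its two parts."""
    # Strip stray slashes so "storage", "/storage" and "storage/" all behave alike.
    return PurePosixPath("/", project_location.strip("/"), project_name.strip("/"))


print(output_dir("storage", "chapter-10"))  # /storage/chapter-10
```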

System Requirements: Check the hardware and software requirements for deploying the application.
Get Started: Follow step-by-step instructions to set up the application.
Application Flow: Review the flow of the application.