How It Works
This section provides a high-level view of how the application processes audio input and integrates with a modular backend architecture.
Inputs
Audio Files: You can upload audio recordings through the Web-based UI layer, which supports:
Audio upload
Viewing transcription, summaries, and performance metrics
Localisation options (English/Chinese)
The uploaded audio is passed to the Backend API, which acts as the gateway to the backend service layer and provides capabilities similar to those of the UI layer.
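As a rough illustration of the upload step, the sketch below builds (but does not send) a multipart POST request carrying one audio file, using only the Python standard library. The URL, the route, and the form field name `file` are assumptions made for this example; this section does not specify the actual Backend API routes.

```python
import io
import uuid
import urllib.request

# Hypothetical endpoint: the real Backend API route is not given in this section.
UPLOAD_URL = "http://localhost:8000/api/upload"


def build_upload_request(
    filename: str, audio_bytes: bytes, url: str = UPLOAD_URL
) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Opening boundary and part headers; the field name "file" is an assumption.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    # Raw audio payload followed by the closing boundary.
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the request is only constructed here so that the sketch runs without a live backend.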
Processing
Audio Pre-processing: Cleans and formats audio data for downstream tasks.
ASR Component (Automatic Speech Recognition): Converts audio into text using integrated ASR providers:
FunASR
OpenVINO
OpenAI
Summariser Component: Generates concise summaries of transcribed text using LLM providers:
iPexLLM
OpenVINO
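One common way to keep ASR and summarisation providers interchangeable is a small name-to-function registry. The sketch below illustrates that idea only; it is not the application's actual code. The registries, the `register` and `get_provider` helpers, and both stub functions are hypothetical, and the stubs return placeholder strings instead of calling FunASR or OpenVINO.

```python
from typing import Callable, Dict

# Hypothetical registries keyed by lower-cased provider name.
ASR_PROVIDERS: Dict[str, Callable[[bytes], str]] = {}
SUMMARISER_PROVIDERS: Dict[str, Callable[[str], str]] = {}


def register(registry: Dict[str, Callable], name: str):
    """Decorator that stores a provider function under a case-insensitive name."""
    def decorator(fn: Callable) -> Callable:
        registry[name.lower()] = fn
        return fn
    return decorator


@register(ASR_PROVIDERS, "FunASR")
def funasr_stub(audio: bytes) -> str:
    # Placeholder: a real adapter would run the FunASR model on the audio.
    return f"[funasr transcript of {len(audio)} bytes]"


@register(SUMMARISER_PROVIDERS, "OpenVINO")
def openvino_summary_stub(text: str) -> str:
    # Placeholder: returns the first sentence instead of calling an LLM.
    return text.split(".")[0].strip() + "."


def get_provider(registry: Dict[str, Callable], name: str) -> Callable:
    """Look up a provider by name, failing with a readable error if unknown."""
    try:
        return registry[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown provider {name!r}; available: {sorted(registry)}"
        ) from None
```

With this layout, switching providers is a change to the name passed to `get_provider`, and adding one is a single decorated function.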
Metrics Collector: Monitors and collects the following:
xPU utilisation for hardware performance
LLM metrics for summarisation efficiency
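The processing stages above can be sketched as a two-step pipeline with per-stage timing. This is a minimal sketch, not the application's implementation: `MetricsCollector`, `timed`, `tokens_per_second`, and `run_pipeline` are hypothetical names, and tokens per second is assumed here as one plausible summarisation-efficiency metric. xPU utilisation sampling is left out because it depends on vendor-specific tooling.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class MetricsCollector:
    """Hypothetical collector that records wall-clock time per pipeline stage."""
    stage_seconds: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def timed(self, stage: str):
        # Record elapsed time even if the stage raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stage_seconds[stage] = time.perf_counter() - start

    def tokens_per_second(self, stage: str, token_count: int) -> float:
        # Assumed efficiency metric: generated tokens divided by stage duration.
        elapsed = self.stage_seconds[stage]
        return token_count / elapsed if elapsed > 0 else 0.0


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    summarise: Callable[[str], str],
    metrics: MetricsCollector,
) -> Tuple[str, str]:
    """Run ASR then summarisation, timing each stage separately."""
    with metrics.timed("asr"):
        transcript = transcribe(audio)
    with metrics.timed("summarise"):
        summary = summarise(transcript)
    return transcript, summary
```

Any callables with matching signatures can be passed as `transcribe` and `summarise`, which keeps the timing logic independent of the provider in use.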
Outputs
Transcriptions and summaries can be accessed from the Web-based UI and the file system. The file system path is /
/ . For example, /storage/chapter-10//
Performance metrics (e.g., utilisation, model efficiency) are displayed for monitoring.
Localisation ensures outputs are available in multiple languages (English/Chinese).
Learn More
System Requirements: Check the hardware and software requirements for deploying the application.
Get Started: Follow step-by-step instructions to set up the application.