Get Started#
The Audio Analyzer microservice enables developers to transcribe speech from video files. This section provides step-by-step instructions to:
Set up the microservice using a pre-built Docker image for quick deployment.
Run predefined tasks to explore its functionality.
Learn how to modify basic configurations to suit specific requirements.
Prerequisites#
Before you begin, ensure the following:
System Requirements: Verify that your system meets the minimum requirements.
Docker Installed: Install Docker. For installation instructions, see Get Docker.
This guide assumes basic familiarity with Docker commands and terminal usage. If you are new to Docker, see Docker Documentation for an introduction.
Environment Variables#
The following environment variables can be configured:
DEBUG: Enable debug mode (default: False)
UPLOAD_DIR: Directory for uploaded files (default: /tmp/audio-analyzer/uploads)
OUTPUT_DIR: Directory for transcription output (default: /tmp/audio-analyzer/transcripts)
ENABLED_WHISPER_MODELS: Comma-separated list of Whisper models to enable and download
DEFAULT_WHISPER_MODEL: Default Whisper model to use (default: tiny.en or first available model)
GGML_MODEL_DIR: Directory for downloading GGML models (for CPU inference)
OPENVINO_MODEL_DIR: Directory for storing OpenVINO optimized models (for GPU inference)
LANGUAGE: Language code for transcription (default: None, auto-detect)
MAX_FILE_SIZE: Maximum allowed file size in bytes (default: 100MB)
DEFAULT_DEVICE: Device to use for transcription - ‘cpu’, ‘gpu’, or ‘auto’ (default: cpu)
USE_FP16: Use half-precision (FP16) for GPU inference (default: True)
MinIO Configuration
STORAGE_BACKEND: Storage backend to use - ‘minio’ or ‘filesystem’ (default: minio)
MINIO_ENDPOINT: MinIO server endpoint (default: minio:9000 in Docker, localhost:9000 on host)
MINIO_ACCESS_KEY: MinIO access key used as login username (default for Docker setup: minioadmin)
MINIO_SECRET_KEY: MinIO secret key used as login password (default for Docker setup: minioadmin)
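As a rough illustration of how these variables and their documented defaults fit together, the following sketch reads a subset of them from a mapping. It is not the service's actual configuration code: the `Settings` class, the `load_settings` helper, and the rule for parsing boolean strings are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Treat common truthy strings as True (assumed rule, not the service's)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    debug: bool
    upload_dir: str
    output_dir: str
    language: Optional[str]
    default_device: str
    use_fp16: bool
    storage_backend: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables, falling back to the defaults listed above."""
    return Settings(
        debug=_as_bool(env.get("DEBUG", "False")),
        upload_dir=env.get("UPLOAD_DIR", "/tmp/audio-analyzer/uploads"),
        output_dir=env.get("OUTPUT_DIR", "/tmp/audio-analyzer/transcripts"),
        # An unset or empty LANGUAGE means auto-detect.
        language=env.get("LANGUAGE") or None,
        default_device=env.get("DEFAULT_DEVICE", "cpu"),
        use_fp16=_as_bool(env.get("USE_FP16", "True")),
        storage_backend=env.get("STORAGE_BACKEND", "minio"),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
print(load_settings({"DEBUG": "true", "DEFAULT_DEVICE": "gpu"}))
```

Calling `load_settings()` with no argument reads from the current process environment instead.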
Set Up the Storage Backends#
The service supports two storage backends for source video files and transcript output:
MinIO (default): Store transcripts in a MinIO bucket
Filesystem: Store transcripts on the local filesystem. The API service runs standalone, with no dependency on an external storage service.
You can configure the storage backend using the STORAGE_BACKEND environment variable:
For MinIO storage (default):
export STORAGE_BACKEND=minio
For local filesystem storage:
export STORAGE_BACKEND=local
MinIO integration#
The service now supports MinIO object storage integration for:
Video Source: Fetch videos from a MinIO bucket instead of direct uploads
Transcript Storage: Store transcription outputs (SRT/TXT) in a MinIO bucket
MinIO Configuration#
To use MinIO integration, you need to configure the following environment variables:
# MinIO server connection
export MINIO_ACCESS_KEY=<your-minio-username>
export MINIO_SECRET_KEY=<your-minio-password>
Models Selection#
Refer to supported models for the list of models that can be used for transcription. You can specify which models to enable through the ENABLED_WHISPER_MODELS environment variable.
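The sketch below shows one way a comma-separated ENABLED_WHISPER_MODELS value could be parsed and a default model chosen. The helper names are hypothetical, and the selection rule (use the requested model if it is enabled, otherwise tiny.en if enabled, otherwise the first enabled model) is only one reading of the documented default "tiny.en or first available model"; the service may resolve it differently.

```python
from typing import List, Optional


def parse_enabled_models(raw: str) -> List[str]:
    """Split a comma-separated list, trimming spaces and dropping empty entries."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def pick_default_model(enabled: List[str], requested: Optional[str] = None) -> str:
    """Choose a default model from the enabled list (assumed rule, see lead-in)."""
    if not enabled:
        raise ValueError("ENABLED_WHISPER_MODELS must list at least one model")
    if requested and requested in enabled:
        return requested
    if "tiny.en" in enabled:
        return "tiny.en"
    return enabled[0]


models = parse_enabled_models("small.en,tiny.en,medium.en")
print(models, pick_default_model(models))
```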
Quick Start with Docker#
You can either build the Docker images yourself or use the prebuilt images, as documented below.
To be documented
API Usage#
Below are examples of how to use the API with curl for both filesystem and MinIO storage setups.
Health Check#
curl -X GET "http://localhost:8000/api/v1/health"
Get Available Models#
curl -X GET "http://localhost:8000/api/v1/models"
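The same two GET requests can be issued from Python using only the standard library. This is a minimal sketch: the `endpoint` and `get_json` helpers are hypothetical names, and it assumes both endpoints answer with a JSON body, which this guide does not state explicitly.

```python
import json
import urllib.request
from typing import Any

# Base URL taken from the curl examples above.
BASE_URL = "http://localhost:8000/api/v1"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a relative path without doubling slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def get_json(path: str, timeout: float = 10.0) -> Any:
    """Send a GET request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(endpoint(path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `get_json("health")` and `get_json("models")` correspond to the two curl commands above.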
Filesystem Storage Examples#
Upload a Video File for Transcription (Filesystem)#
curl -X POST "http://localhost:8000/api/v1/transcriptions" \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/your/video.mp4" \
-F "include_timestamps=true" \
-F "device=cpu" \
-F "model_name=small.en"
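For clients that cannot shell out to curl, the upload can be approximated with the Python standard library. The sketch below builds the multipart/form-data body by hand, using the same form fields as the curl command; the helper names are hypothetical, and the whole file is held in memory, which is only reasonable for files within the configured MAX_FILE_SIZE.

```python
import mimetypes
import urllib.request
import uuid
from typing import Dict, Tuple

# URL taken from the curl example above.
TRANSCRIPTIONS_URL = "http://localhost:8000/api/v1/transcriptions"


def encode_multipart(fields: Dict[str, str], filename: str, content: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields, one part each.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    # The file part, sent under the form field name "file" as in the curl example.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    parts.append(header + content + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(filename: str, content: bytes, url: str = TRANSCRIPTIONS_URL) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for a video upload."""
    body, content_type = encode_multipart(
        {"include_timestamps": "true", "device": "cpu", "model_name": "small.en"},
        filename,
        content,
    )
    return urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
```

To send it, read the video with `pathlib.Path("/path/to/your/video.mp4").read_bytes()`, pass the bytes to `build_upload_request`, and hand the result to `urllib.request.urlopen`.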
MinIO Storage Examples#
Before using MinIO storage, make sure:
Your MinIO server is running
You have configured proper credentials
You have created the necessary buckets
curl -X POST "http://localhost:8000/api/v1/transcriptions" \
-H "Content-Type: application/json" \
-d '{
"minio_bucket": "videos",
"video_name": "example.mp4",
"video_id": "project1/raw",
"include_timestamps": true,
"device": "cpu",
"model_name": "medium.en"
}'
This API endpoint returns a job ID, transcription path and other details once the transcription is done.
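A Python equivalent of the JSON request above might look like the following sketch. The `build_minio_request` helper is a hypothetical name; the field names and values are copied from the curl example, and the response is assumed to be JSON.

```python
import json
import urllib.request

# URL taken from the curl example above.
TRANSCRIPTIONS_URL = "http://localhost:8000/api/v1/transcriptions"


def build_minio_request(
    bucket: str,
    video_name: str,
    video_id: str,
    model_name: str = "medium.en",
    device: str = "cpu",
    include_timestamps: bool = True,
    url: str = TRANSCRIPTIONS_URL,
) -> urllib.request.Request:
    """Prepare (but do not send) a transcription request for a video stored in MinIO."""
    payload = {
        "minio_bucket": bucket,
        "video_name": video_name,
        "video_id": video_id,
        "include_timestamps": include_timestamps,
        "device": device,
        "model_name": model_name,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_minio_request("videos", "example.mp4", "project1/raw")
print(request.full_url, request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` sends it; the call returns only after the service responds, so a generous timeout may be needed for long videos.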
Transcription Performance and Optimization on CPU#
The service uses pywhispercpp with the following optimizations for CPU transcription:
Multithreading: Automatically uses the optimal number of threads based on your CPU cores
Parallel Processing: Utilizes multiple CPU cores for audio processing
Greedy Decoding: Faster inference by using greedy decoding instead of beam search
OpenVINO IR Models: Can download and use OpenVINO IR models for even faster CPU inference
Running Tests#
The project uses pytest for testing. After installing and setting up the application on the host, run the tests as follows:
# Run all tests
poetry run pytest
# Run tests with verbose output
poetry run pytest -v
# Run tests by type (unit or api)
poetry run pytest -m unit
poetry run pytest -m api
# Run tests for a specific module (e.g., utils/hardware_utils.py)
poetry run pytest tests/test_utils/test_hardware_utils.py
Generate Test Coverage Reports#
To generate a coverage report:
# Run tests with coverage
poetry run pytest --cov=audio_analyzer
# Generate detailed HTML coverage report
poetry run pytest --cov=audio_analyzer --cov-report=html
# Open the HTML report
xdg-open htmlcov/index.html
Make sure xdg-open is installed on the host machine. The coverage report helps identify which parts of the codebase are well tested and which may need additional test coverage.
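On hosts without xdg-open, Python's standard `webbrowser` module is one possible alternative for opening the report. The sketch below assumes the report was written to the default htmlcov/index.html location shown above; the helper names are hypothetical.

```python
import webbrowser
from pathlib import Path


def report_uri(report: Path = Path("htmlcov/index.html")) -> str:
    """Convert the report path to an absolute file:// URI."""
    return report.resolve().as_uri()


def open_report(report: Path = Path("htmlcov/index.html")) -> bool:
    """Ask the default browser to open the report; returns False if no browser was found."""
    return webbrowser.open(report_uri(report))
```

Run `open_report()` from the project directory after generating the HTML coverage report.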
API Documentation#
When running the service, you can access the Swagger UI documentation at:
http://localhost:8000/docs
Manual Host Setup using Poetry#
Clone the repository and change directory to the audio-analyzer microservice:
git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries -b release-1.2.0
cd edge-ai-libraries/microservices/audio-analyzer
Install Poetry if not already installed.
pip install poetry==1.8.3
Configure Poetry to create a local virtual environment.
poetry config virtualenvs.create true
poetry config virtualenvs.in-project true
Install dependencies:
poetry lock --no-update
poetry install
Set the comma-separated list of Whisper models to enable:
export ENABLED_WHISPER_MODELS=small.en,tiny.en,medium.en
Set the directories on the host where models will be downloaded:
export GGML_MODEL_DIR=/tmp/audio_analyzer_model/ggml
export OPENVINO_MODEL_DIR=/tmp/audio_analyzer_model/openvino
Run the service:
DEBUG=True poetry run uvicorn audio_analyzer.main:app --host 0.0.0.0 --port 8000 --reload
(Optional) To run the service with the MinIO storage backend, make sure a MinIO server is running on localhost:9000. See Running a Local MinIO Server.
STORAGE_BACKEND=minio DEBUG=True poetry run uvicorn audio_analyzer.main:app --host 0.0.0.0 --port 8000 --reload
Advanced Setup Options#
Running a Local MinIO Server#
If you’re not using Docker Compose, you can run a local MinIO server using:
docker run -d -p 9000:9000 -p 9001:9001 --name minio \
-e MINIO_ROOT_USER=${MINIO_ACCESS_KEY} \
-e MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY} \
-v minio_data:/data \
minio/minio server /data --console-address ':9001'
You can then access the MinIO Console at http://localhost:9001 with these credentials:
Username: <MINIO_ACCESS_KEY>
Password: <MINIO_SECRET_KEY>
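Before starting the Audio Analyzer service against this MinIO instance, it can help to confirm that MinIO is accepting requests. The sketch below polls MinIO's unauthenticated liveness endpoint, `/minio/health/live`; the helper names and the plain-HTTP assumption (no TLS, as in the docker run command above) belong to this example.

```python
import urllib.request


def minio_live_url(endpoint: str = "localhost:9000") -> str:
    """Build the liveness URL for a MinIO endpoint given as host:port (plain HTTP assumed)."""
    return f"http://{endpoint}/minio/health/live"


def minio_is_live(endpoint: str = "localhost:9000", timeout: float = 3.0) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(minio_live_url(endpoint), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

A start-up script could call `minio_is_live()` in a short retry loop and launch the service only once it returns True.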
When to use Filesystem vs. MinIO backend#
Use Filesystem backend when:
Running in a simple, single-node deployment
No need for distributed/scalable storage
No integration with other services that might need to access transcripts
Running in resource-constrained environments
Use MinIO backend (default) when:
Running in a containerized/cloud environment
Need for scalable, distributed object storage
Integration with other services that need to access transcripts
Building a clustered/distributed system
Need for better data organization and retention policies
Next Steps#
Troubleshooting#
Docker Container Fails to Start:
Run docker logs <container-name> to identify the issue.
Check if the required port is available.
Cannot Access the Microservice:
Confirm the container is running:
docker ps
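To check whether a required port (for example 8000 for the API or 9000 for MinIO) is already taken on the host, a small socket probe such as the following sketch can be used. The `port_in_use` helper is a hypothetical name; it only detects listeners reachable at the given address.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


for candidate in (8000, 9000, 9001):
    state = "in use" if port_in_use("127.0.0.1", candidate) else "free"
    print(f"port {candidate}: {state}")
```

A port reported as in use before the container starts points to a conflict with another process; a port reported as free after the container starts suggests the service is not listening or the port is not published.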
Supporting Resources#
Overview