Get Started#

The Audio Analyzer microservice enables developers to generate speech transcriptions from video files. This section provides step-by-step instructions to:

Set up the microservice using a pre-built Docker image for quick deployment.
Run predefined tasks to explore its functionality.
Learn how to modify basic configurations to suit specific requirements.

Prerequisites#

Before you begin, ensure the following:

System Requirements: Verify that your system meets the minimum requirements.
Docker Installed: Install Docker. Make sure the docker command can be run without sudo. For installation instructions, see Get Docker.

This guide assumes basic familiarity with Docker commands and terminal usage. If you are new to Docker, see Docker Documentation for an introduction.

Configurations#

Environment Variables#

The following environment variables can be configured:

UPLOAD_DIR: Directory for uploaded files (default: /tmp/audio-analyzer/uploads)
OUTPUT_DIR: Directory for transcription output (default: /tmp/audio-analyzer/transcripts)
ENABLED_WHISPER_MODELS: Comma-separated list of Whisper models to enable and download
DEFAULT_WHISPER_MODEL: Default Whisper model to use if a model name is not provided explicitly (default: tiny.en or first model from ENABLED_WHISPER_MODELS list, if tiny.en is not available)
GGML_MODEL_DIR: Directory for downloading GGML models (for CPU inference)
OPENVINO_MODEL_DIR: Directory for storing OpenVINO optimized models (for GPU inference)
LANGUAGE: Language code for transcription (default: None, auto-detect)
MAX_FILE_SIZE: Maximum allowed file size in bytes (default: 100MB)
DEFAULT_DEVICE: Device to use for transcription - ‘cpu’, ‘gpu’, or ‘auto’ (default: cpu)
USE_FP16: Use half-precision (FP16) for GPU inference (default: True)
STORAGE_BACKEND: Storage backend to use - ‘minio’ or ‘filesystem’.

MinIO Configuration

MINIO_ENDPOINT: MinIO server endpoint (default: minio:9000 in Docker setup script)
MINIO_ACCESS_KEY: MinIO access key used as login username
MINIO_SECRET_KEY: MinIO secret key used as login password
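
As a rough illustration, the variables above could be exported in a shell session before starting the service. The values repeat the documented defaults where the list gives one. The choice of enabled models and the byte value for MAX_FILE_SIZE (100MB read as 100 * 1024 * 1024) are assumptions made for this sketch.

```shell
# Illustrative values only; adjust them to your environment.
export UPLOAD_DIR=/tmp/audio-analyzer/uploads
export OUTPUT_DIR=/tmp/audio-analyzer/transcripts
export ENABLED_WHISPER_MODELS=tiny.en,small.en
export DEFAULT_WHISPER_MODEL=tiny.en
export DEFAULT_DEVICE=cpu
# Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
export MAX_FILE_SIZE=$((100 * 1024 * 1024))
echo "MAX_FILE_SIZE=${MAX_FILE_SIZE}"
```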

Set Up the Storage Backends#

The service supports two storage backends for source video files and transcript output:

MinIO: Store transcripts in a MinIO bucket. (Default when the Docker setup script is used.)
Filesystem: Store transcripts on the local filesystem. The API service will not have any external storage dependency. (Default when the application runs in standalone mode.)

The Docker setup script setup_docker.sh uses minio as the default storage backend. You can override the default by setting the STORAGE_BACKEND environment variable:

For MinIO storage:

export STORAGE_BACKEND=minio

For local filesystem storage:

export STORAGE_BACKEND=local

The host setup script setup_host.sh, by contrast, supports only the local filesystem storage backend.

MinIO integration#

The service supports MinIO object storage integration for:

Video Source: Fetch videos from a MinIO bucket instead of direct uploads
Transcript Storage: Store transcription outputs (SRT/TXT) in a MinIO bucket

MinIO Configuration#

To use MinIO integration, you need to configure the following environment variables:

# MinIO server connection
export MINIO_ACCESS_KEY=<your-minio-username>
export MINIO_SECRET_KEY=<your-minio-password>
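
Before starting the service with the MinIO backend, it can help to confirm that both credentials are actually set. The function below is a minimal sketch; check_minio_env is a hypothetical helper written for this guide and is not part of the service.

```shell
# check_minio_env is a hypothetical helper, not part of the service.
# It returns 0 when both MinIO credentials are set and non-empty, 1 otherwise.
check_minio_env() {
  missing=0
  for name in MINIO_ACCESS_KEY MINIO_SECRET_KEY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_minio_env && echo "MinIO credentials are set"
```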

Models Selection#

Refer to supported models for the list of models that can be used for transcription. You can specify which models to enable through the ENABLED_WHISPER_MODELS environment variable.
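
The fallback rule described for DEFAULT_WHISPER_MODEL (tiny.en if it is enabled, otherwise the first entry of ENABLED_WHISPER_MODELS) can be sketched in shell as follows. resolve_default_model is a hypothetical helper that only mirrors the documented rule; it is not the service's own implementation.

```shell
# resolve_default_model is a hypothetical helper mirroring the documented rule:
# prefer tiny.en when it is enabled, otherwise use the first enabled model.
resolve_default_model() {
  enabled=$1
  case ",${enabled}," in
    *,tiny.en,*) echo "tiny.en" ;;
    *) echo "${enabled%%,*}" ;;   # keep the text before the first comma
  esac
}

resolve_default_model "small.en,tiny.en,medium.en"   # prints tiny.en
resolve_default_model "small.en,medium.en"           # prints small.en
```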

Quick Start#

You can start and use the application in any of the following ways:

Build the image and run it using the Docker script. The script builds images for the application and its required dependencies, then deploys the application. The default storage backend is minio, but it can be changed to the local storage backend.
Use the pre-built image for a standalone setup. The standalone setup has no external dependencies. Default and recommended storage backend: local.
Build and set up on the host using the setup script. Only storage backend available: local.
Build and set up on the host manually. The default storage backend is local, but it can be configured to use the minio storage backend.

Standalone Setup in Docker Container#

Set the registry and tag for the public image to be pulled.
```
export REGISTRY=intel/
export TAG=1.3.1
```

Pull the public image for the Audio Analyzer microservice:

docker pull ${REGISTRY}audio-analyzer:${TAG:-latest}

Set the required environment variables:

export ENABLED_WHISPER_MODELS=small.en,tiny.en,medium.en

Set and create the directory on the host filesystem where transcripts will be stored:

export AUDIO_ANALYZER_DIR=~/audio_analyzer_data
mkdir -p "$AUDIO_ANALYZER_DIR"

Stop the existing Audio Analyzer container, if one is running:
```
docker stop audioanalyzer
```

Run the Audio Analyzer microservice:

# Run the Audio Analyzer application container, exposed on a randomly assigned port
docker run --rm -d -P -v "$AUDIO_ANALYZER_DIR":/data -e http_proxy -e https_proxy -e ENABLED_WHISPER_MODELS -e DEFAULT_WHISPER_MODEL --name audioanalyzer ${REGISTRY}audio-analyzer:${TAG:-latest}

Access the Audio Analyzer API in a web browser at the URL printed by these commands:

host=$(ip route get 1 | awk '{print $7}')
port=$(docker port audioanalyzer 8000 | head -1 | cut -d ':' -f 2)
echo http://${host}:${port}/docs
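
The port lookup above takes the second colon-separated field of the first line printed by docker port. That works for an IPv4 mapping such as 0.0.0.0:32768, but not if the first line is an IPv6 mapping such as [::]:32768. A slightly more tolerant variant, sketched below, keeps whatever follows the last colon. extract_port is a hypothetical helper name.

```shell
# extract_port is a hypothetical helper: it keeps the text after the last colon,
# so it handles both "0.0.0.0:32768" and "[::]:32768" style lines.
extract_port() {
  line=$1
  echo "${line##*:}"
}

extract_port "0.0.0.0:32768"   # prints 32768
extract_port "[::]:32768"      # prints 32768

# Usage with a running container (requires Docker):
#   port=$(extract_port "$(docker port audioanalyzer 8000 | head -1)")
```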

API Usage#

The examples below show how to call the API from the command line with curl. They use the port shell variable set in the previous step.

Health Check#

curl "http://localhost:$port/api/v1/health"
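
Right after the container starts, the service may not answer immediately. One way to handle this is to poll the health endpoint until it responds, as in the sketch below. wait_for_health is a hypothetical helper, and it assumes the endpoint returns a successful HTTP status once the service is ready.

```shell
# wait_for_health is a hypothetical helper, not part of the service.
# Arguments: base URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_health() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -s silences progress output.
    if curl -fs --max-time 5 "${url}/api/v1/health" >/dev/null 2>&1; then
      echo "Service is healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Service did not become healthy after ${attempts} attempt(s)" >&2
  return 1
}

# Usage: wait_for_health "http://localhost:$port" 30 2
```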

Get Available Models#

curl "http://localhost:$port/api/v1/models"

Filesystem Storage Examples#

Upload a Video File for Transcription#

curl -X POST "http://localhost:$port/api/v1/transcriptions" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/your/video.mp4" \
  -F "include_timestamps=true" \
  -F "device=cpu" \
  -F "model_name=small.en"
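
Because the service limits upload size through MAX_FILE_SIZE, a client-side check before uploading can save a failed request. The following is a minimal sketch only: check_upload_size is a hypothetical helper, the service still enforces its own limit, and the default of 104857600 bytes assumes that 100MB means 100 * 1024 * 1024.

```shell
# check_upload_size is a hypothetical client-side helper.
# Arguments: file path, maximum size in bytes (default 104857600, an assumed
# reading of the documented 100MB default).
# Prints the file size and returns 0 when the file is within the limit.
check_upload_size() {
  file=$1
  max_bytes=${2:-104857600}
  if [ ! -f "$file" ]; then
    echo "File not found: $file" >&2
    return 2
  fi
  # wc -c may pad its output with spaces on some systems;
  # arithmetic expansion normalizes it to a plain number.
  size=$(($(wc -c < "$file") + 0))
  if [ "$size" -gt "$max_bytes" ]; then
    echo "File is ${size} bytes, above the ${max_bytes}-byte limit" >&2
    return 1
  fi
  echo "$size"
}

# Usage: check_upload_size /path/to/your/video.mp4 >/dev/null && echo "OK to upload"
```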

Get Transcripts from Local Filesystem#

Once transcription is complete, the transcript files are available in the directory set by the AUDIO_ANALYZER_DIR variable. List them as follows:

ls $AUDIO_ANALYZER_DIR/transcript

Transcription Performance and Optimization on CPU#

The service uses pywhispercpp with the following optimizations for CPU transcription:

Multithreading: Automatically uses the optimal number of threads based on your CPU cores
Parallel Processing: Utilizes multiple CPU cores for audio processing
Greedy Decoding: Faster inference by using greedy decoding instead of beam search
OpenVINO IR Models: Can download and use OpenVINO IR models for even faster CPU inference

Manual Host Setup using Poetry#

NOTE: This is an advanced setup, recommended only for development or contribution. For an alternative way to set up on the host, see setting up on host using setup script. On the host, the default storage backend is the local filesystem. Make sure STORAGE_BACKEND is not overridden to minio unless you explicitly want to use the MinIO backend.

Clone the repository and change directory to the audio-analyzer microservice:

# Clone the release branch
git clone https://github.com/open-edge-platform/edge-ai-libraries.git -b release-2025.2.0
# Access the code
cd edge-ai-libraries/microservices/audio-analyzer

Install Poetry if not already installed.
```
pip install poetry==1.8.3
```

Configure poetry to create a local virtual environment.

poetry config virtualenvs.create true
poetry config virtualenvs.in-project true

Install dependencies:
```
poetry lock --no-update
poetry install
```
Set the comma-separated list of Whisper models to enable:
```
export ENABLED_WHISPER_MODELS=small.en,tiny.en,medium.en
```

Set directories on host where models will be downloaded:

export GGML_MODEL_DIR=/tmp/audio_analyzer_model/ggml
export OPENVINO_MODEL_DIR=/tmp/audio_analyzer_model/openvino

Run the service:

DEBUG=True poetry run uvicorn audio_analyzer.main:app --host 0.0.0.0 --port 8000 --reload

(Optional) To run the service with the MinIO storage backend, make sure a MinIO server is running; see Manually Running a Local MinIO Server. You may need to update the MINIO_ENDPOINT environment variable depending on where the MinIO server is running (if not set, the default is localhost:9000).
```
export MINIO_ENDPOINT="<minio_host>:<minio_port>"
```
Run the Audio Analyzer application on the host:
```
STORAGE_BACKEND=minio DEBUG=True poetry run uvicorn audio_analyzer.main:app --host 0.0.0.0 --port 8000 --reload
```

Running Tests#

To run the unit tests and generate a coverage report, run the following commands in the application’s directory (microservices/audio-analyzer) in the cloned repo:

poetry lock --no-update
poetry install --with dev
# Required: set the model name (needed for compliance reasons)
export ENABLED_WHISPER_MODELS=tiny.en

# Run tests
poetry run coverage run -m pytest ./tests

# Generate Coverage report
poetry run coverage report -m

API Documentation#

When running the service, you can access the Swagger UI documentation at:

http://localhost:8000/docs

Advanced Setup Options#

Manually Running a Local MinIO Server#

If you’re not using the bundled Docker setup script setup_docker.sh and still want to use the application with MinIO storage, you can manually run a local MinIO server using:

docker run -d -p 9000:9000 -p 9001:9001 --name minio \
  -e MINIO_ROOT_USER=${MINIO_ACCESS_KEY} \
  -e MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY} \
  -v minio_data:/data \
  minio/minio server /data --console-address ':9001'

You can then access the MinIO Console at http://localhost:9001 with these credentials:

Username: <MINIO_ACCESS_KEY>
Password: <MINIO_SECRET_KEY>
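
To the best of our knowledge, a MinIO server refuses to start when the root user is shorter than 3 characters or the root password is shorter than 8 characters. Checking the values before running the container can make that failure easier to diagnose. The sketch below treats those length limits as an assumption about MinIO; check_minio_credentials is a hypothetical helper and is not part of MinIO or this service.

```shell
# check_minio_credentials is a hypothetical helper.
# Assumption: MinIO expects a root user of at least 3 characters
# and a root password of at least 8 characters.
check_minio_credentials() {
  user=$1
  password=$2
  if [ "${#user}" -lt 3 ]; then
    echo "Access key must be at least 3 characters" >&2
    return 1
  fi
  if [ "${#password}" -lt 8 ]; then
    echo "Secret key must be at least 8 characters" >&2
    return 1
  fi
  return 0
}

# Usage: check_minio_credentials "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
```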

When to use Filesystem vs. MinIO backend#

Use the Filesystem backend (default for standalone setup on host) when:

You are running a simple, single-node deployment
You do not need distributed or scalable storage
No other services need to access the transcripts
You are running in a resource-constrained environment

Use the MinIO backend (default for setup using the Docker script) when:

You are running in a containerized or cloud environment
You need scalable, distributed object storage
Other services need to access the transcripts
You are building a clustered or distributed system
You need better data organization and retention policies

Next Steps#

Troubleshooting#

Docker Container Fails to Start:
- Run docker logs <container-name> (for example, docker logs audioanalyzer) to identify the issue.
- Check if the required port is available.
Cannot Access the Microservice:
- Confirm the container is running:
```
docker ps
```
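
The same check can be scripted. The sketch below wraps docker ps in a small function that succeeds only when a container with exactly the given name is running. container_running is a hypothetical helper name; audioanalyzer is the container name used earlier in this guide.

```shell
# container_running is a hypothetical helper. It returns 0 when `docker ps`
# lists a running container whose name matches exactly, and non-zero otherwise.
container_running() {
  name=$1
  # The name filter matches substrings, so grep -Fx enforces an exact match.
  docker ps --filter "name=${name}" --format '{{.Names}}' | grep -Fqx "$name"
}

# Usage:
#   container_running audioanalyzer && echo "audioanalyzer is running"
```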