# Get Started

The **Audio Analyzer microservice** lets developers generate speech transcripts from video files. This section provides step-by-step instructions to:

- Set up the microservice using a pre-built Docker image for quick deployment.
- Run predefined tasks to explore its functionality.
- Learn how to modify basic configurations to suit specific requirements.

# Prerequisites

Before you begin, ensure the following:

- **System Requirements**: Verify that your system meets the [minimum requirements](./system-requirements.md).
- **Docker Installed**: Install Docker. Make sure the `docker` command can be run without `sudo`. For installation instructions, see [Get Docker](https://docs.docker.com/get-docker/).

This guide assumes basic familiarity with Docker commands and terminal usage. If you are new to Docker, see [Docker Documentation](https://docs.docker.com/) for an introduction.

# Configurations

## Environment Variables

The following environment variables can be configured:

- `UPLOAD_DIR`: Directory for uploaded files (default: `/tmp/audio-analyzer/uploads`)
- `OUTPUT_DIR`: Directory for transcription output (default: `/tmp/audio-analyzer/transcripts`)
- `ENABLED_WHISPER_MODELS`: Comma-separated list of Whisper models to enable and download
- `DEFAULT_WHISPER_MODEL`: Whisper model to use when no model name is provided explicitly (default: `tiny.en`, or the first model in the `ENABLED_WHISPER_MODELS` list if `tiny.en` is not available)
- `GGML_MODEL_DIR`: Directory for downloading GGML models (for CPU inference)
- `OPENVINO_MODEL_DIR`: Directory for storing OpenVINO optimized models (for GPU inference)
- `LANGUAGE`: Language code for transcription (default: None, auto-detect)
- `MAX_FILE_SIZE`: Maximum allowed file size in bytes (default: 100MB)
- `DEFAULT_DEVICE`: Device to use for transcription: `cpu`, `gpu`, or `auto` (default: `cpu`)
- `USE_FP16`: Use half-precision (FP16) for GPU inference (default: True)
- `STORAGE_BACKEND`: Storage backend to use: `minio` or `local` (local filesystem)

**MinIO Configuration**

- `MINIO_ENDPOINT`: MinIO server endpoint (default: `minio:9000` in the Docker setup script)
- `MINIO_ACCESS_KEY`: MinIO access key, used as the login username
- `MINIO_SECRET_KEY`: MinIO secret key, used as the login password

## Setup the Storage backends

The service supports two storage backends for source video files and transcript output:

- **MinIO**: Store transcripts in a MinIO bucket. This is the default when the Docker setup script is used.
- **Filesystem**: Store transcripts on the local filesystem. The API service then has no external storage dependency. This is the default when the application runs in [standalone mode](#standalone-setup-in-docker-container).

The Docker setup script `setup_docker.sh` uses **minio** as the default storage backend. You can override the default by setting the `STORAGE_BACKEND` environment variable.

For MinIO storage:

```bash
export STORAGE_BACKEND=minio
```

For local filesystem storage:

```bash
export STORAGE_BACKEND=local
```

The host setup script `setup_host.sh`, by contrast, supports only the **local** filesystem backend.

## MinIO integration

The service supports MinIO object storage integration for:

1. **Video Source**: Fetch videos from a MinIO bucket instead of direct uploads
2. **Transcript Storage**: Store transcription outputs (SRT/TXT) in a MinIO bucket

### MinIO Configuration

To use the MinIO integration, configure the following environment variables:

```bash
# MinIO server connection
export MINIO_ACCESS_KEY=
export MINIO_SECRET_KEY=
```

## Models Selection

Refer to [supported models](./Overview.md#models-supported) for the list of models that can be used for transcription. You can specify which models to enable through the `ENABLED_WHISPER_MODELS` environment variable.
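The interaction between `ENABLED_WHISPER_MODELS` and `DEFAULT_WHISPER_MODEL` can be sketched in shell. This is only an illustration of the fallback rule documented in the environment variable list, not the service's own code: the helper name `resolve_default_model` is hypothetical, and the sketch assumes the model list contains no spaces.

```shell
# Hypothetical helper (not part of the service): mirrors the documented
# fallback rule for the default Whisper model.
#   $1: comma-separated list of enabled models (as in ENABLED_WHISPER_MODELS)
#   $2: explicitly configured default (as in DEFAULT_WHISPER_MODEL), may be empty
resolve_default_model() {
  enabled="$1"
  explicit="$2"

  # An explicitly configured default always wins.
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
    return 0
  fi

  # Otherwise prefer tiny.en if it is enabled, else take the first entry.
  case ",$enabled," in
    *,tiny.en,*) printf '%s\n' "tiny.en" ;;
    *)           printf '%s\n' "${enabled%%,*}" ;;
  esac
}

# tiny.en is not enabled here, so the first model in the list is used.
resolve_default_model "small.en,medium.en" ""   # prints: small.en
```

A check like this can help confirm which model a request without `model_name` is expected to use before you start the container.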
# Quick Start

You can start and use the application in any of the following ways:

- [Build the image and run using Docker script](./how-to-build-from-source.md#build-and-run-using-docker-script). The Docker script builds images for the application and its required dependencies, then deploys the application. The default storage backend is `minio`, but it can be changed to `local`.
- [Use pre-built image for standalone setup](#standalone-setup-in-docker-container). The standalone setup has no external dependency. Default and recommended storage backend: `local`.
- [Build and setup on host using setup script](./how-to-build-from-source.md#setup-and-run-on-host-using-setup-script). Only storage backend available: `local`.
- [Build and setup on host manually](#manual-host-setup-using-poetry). The default storage backend is `local`, but it can be configured to use `minio`.

## Standalone Setup in Docker Container

1. Set the registry and tag for the public image to be pulled.

   ```bash
   export REGISTRY=intel/
   export TAG=1.3.1
   ```

2. Pull the public image for the Audio Analyzer microservice:

   ```bash
   docker pull ${REGISTRY}audio-analyzer:${TAG:-latest}
   ```

3. Set the required environment variables:

   ```bash
   export ENABLED_WHISPER_MODELS=small.en,tiny.en,medium.en
   ```

4. Set and create the directory on the filesystem where transcripts will be stored:

   ```bash
   export AUDIO_ANALYZER_DIR=~/audio_analyzer_data
   mkdir -p "$AUDIO_ANALYZER_DIR"
   ```

5. Stop the existing Audio Analyzer container, if one is running:

   ```bash
   docker stop audioanalyzer
   ```

6. Run the Audio Analyzer microservice, using the same image that was pulled in step 2:

   ```bash
   # Run Audio Analyzer application container exposed on a randomly assigned port
   docker run --rm -d -P \
     -v "$AUDIO_ANALYZER_DIR":/data \
     -e http_proxy -e https_proxy \
     -e ENABLED_WHISPER_MODELS -e DEFAULT_WHISPER_MODEL \
     --name audioanalyzer \
     ${REGISTRY}audio-analyzer:${TAG:-latest}
   ```

7. Access the Audio Analyzer API in a web browser at the URL printed by these commands:

   ```bash
   host=$(ip route get 1 | awk '{print $7}')
   port=$(docker port audioanalyzer 8000 | head -1 | cut -d ':' -f 2)
   echo http://${host}:${port}/docs
   ```

## API Usage

Below are examples of how to use the API on the command line with `curl`. They rely on the `port` variable set in the last step of the standalone setup.

### Health Check

```bash
curl "http://localhost:$port/api/v1/health"
```

### Get Available Models

```bash
curl "http://localhost:$port/api/v1/models"
```

### Filesystem Storage Examples

#### Upload a Video File for Transcription

```bash
curl -X POST "http://localhost:$port/api/v1/transcriptions" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/your/video.mp4" \
  -F "include_timestamps=true" \
  -F "device=cpu" \
  -F "model_name=small.en"
```

#### Get Transcripts from Local Filesystem

Once transcription completes, the transcript files are available in the directory set by the `AUDIO_ANALYZER_DIR` variable. You can list them as follows:

```bash
ls $AUDIO_ANALYZER_DIR/transcript
```

## Transcription Performance and Optimization on CPU

The service uses **pywhispercpp** with the following optimizations for CPU transcription:

- **Multithreading**: Automatically uses the optimal number of threads based on your CPU cores
- **Parallel Processing**: Utilizes multiple CPU cores for audio processing
- **Greedy Decoding**: Faster inference by using greedy decoding instead of beam search
- **OpenVINO IR Models**: Can download and use OpenVINO IR models for even faster CPU inference

# Manual Host Setup using Poetry

> **__NOTE:__** This is an advanced setup and is recommended for development/contribution only. For an alternative way to set up on the host, see [setting up on host using setup script](./how-to-build-from-source.md#setup-and-run-on-host-using-setup-script). When setting up on the host, the default storage backend is the local filesystem.
Please make sure `STORAGE_BACKEND` is not overridden to **minio**, unless you explicitly want to use the MinIO backend.

1. Clone the repository and change directory to the audio-analyzer microservice:

   ```bash
   # Clone the latest on mainline
   git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries

   # Alternatively, clone a specific release branch
   git clone https://github.com/open-edge-platform/edge-ai-libraries.git edge-ai-libraries -b

   # Access the code
   cd edge-ai-libraries/microservices/audio-analyzer
   ```

2. Install Poetry if it is not already installed.

   ```bash
   pip install poetry==1.8.3
   ```

3. Configure Poetry to create a local virtual environment.

   ```bash
   poetry config virtualenvs.create true
   poetry config virtualenvs.in-project true
   ```

4. Install dependencies:

   ```bash
   poetry lock --no-update
   poetry install
   ```

5. Set the comma-separated list of Whisper models to enable:

   ```bash
   export ENABLED_WHISPER_MODELS=small.en,tiny.en,medium.en
   ```

6. Set the directories on the host where models will be downloaded:

   ```bash
   export GGML_MODEL_DIR=/tmp/audio_analyzer_model/ggml
   export OPENVINO_MODEL_DIR=/tmp/audio_analyzer_model/openvino
   ```

7. Run the service:

   ```bash
   DEBUG=True poetry run uvicorn audio_analyzer.main:app --host 0.0.0.0 --port 8000 --reload
   ```

8. _(Optional):_ To run the service with the MinIO storage backend, make sure a MinIO server is running. See [Running a Local Minio Server](#manually-running-a-local-minio-server). You might need to update the `MINIO_ENDPOINT` environment variable depending on where the MinIO server is running (if it is not set, the default is `localhost:9000`).
   ```bash
   export MINIO_ENDPOINT=":"
   ```

   Run the Audio Analyzer application on the host:

   ```bash
   STORAGE_BACKEND=minio DEBUG=True poetry run uvicorn audio_analyzer.main:app --host 0.0.0.0 --port 8000 --reload
   ```

## Running Tests

To run the unit tests and generate a coverage report, run the following commands from the application's directory (`microservices/audio-analyzer`) in the cloned repository:

```bash
poetry lock --no-update
poetry install --with dev

# set a required env var to set model name : required due to compliance issue
export ENABLED_WHISPER_MODELS=tiny.en

# Run tests
poetry run coverage run -m pytest ./tests

# Generate Coverage report
poetry run coverage report -m
```

## API Documentation

While the service is running, you can access the Swagger UI documentation at:

```text
http://localhost:8000/docs
```

## Advanced Setup Options

### Manually Running a Local MinIO Server

If you are not using the bundled Docker setup script `setup_docker.sh` and still want to use the application with MinIO storage, you can manually run a local MinIO server:

```bash
docker run -d -p 9000:9000 -p 9001:9001 --name minio \
  -e MINIO_ROOT_USER=${MINIO_ACCESS_KEY} \
  -e MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY} \
  -v minio_data:/data \
  minio/minio server /data --console-address ':9001'
```

You can then access the MinIO Console at http://localhost:9001 with these credentials:

- **Username**: the value of `MINIO_ACCESS_KEY`
- **Password**: the value of `MINIO_SECRET_KEY`

### When to use Filesystem vs. MinIO backend

Use the **Filesystem** backend (the default for standalone setup on host) when:

- Running in a simple, single-node deployment
- No need for distributed/scalable storage
- No integration with other services that might need to access transcripts
- Running in resource-constrained environments

Use the **MinIO** backend (the default for setup using the Docker script) when:

- Running in a containerized/cloud environment
- Need for scalable, distributed object storage
- Integration with other services that need to access transcripts
- Building a clustered/distributed system
- Need for better data organization and retention policies

## Troubleshooting

1. **Docker Container Fails to Start**:
   - Run `docker logs <container-name>` to identify the issue. The standalone setup in this guide names the container `audioanalyzer`.
   - Check if the required port is available.

2. **Cannot Access the Microservice**:
   - Confirm the container is running:

     ```bash
     docker ps
     ```

## Supporting Resources

* [Overview](Overview.md)
* [API Reference](api-reference.md)
* [System Requirements](system-requirements.md)