System Design Document: Document Summarization Application#

Overview#

The Document Summarization Sample Application provides an end-to-end pipeline for summarizing documents using advanced AI models. It exposes a REST API and a user-friendly web UI, leveraging containerized microservices for scalability and maintainability.

Architecture Diagram#

The following figure shows the microservices required to implement the Document Summarization Sample Application.

Figure: Technical architecture diagram of the Document Summarization application.

Components#

1. NGINX Web Server-based Reverse Proxy#

  • Role: Routes external requests to the appropriate backend service (API or UI).

  • Configuration: nginx.conf

  • Port: 8101 (external)
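The routing described above might look like the following sketch of nginx.conf; the location paths and upstream service names are assumptions for illustration, not the shipped configuration.

```nginx
# Sketch only: actual nginx.conf paths and upstreams may differ.
events {}
http {
    server {
        listen 8101;

        # UI traffic goes to the Gradio container (port 7860 inside the network)
        location / {
            proxy_pass http://docsum-ui:7860;
        }

        # API traffic goes to the FastAPI backend (port 8000 inside the network)
        location /v1/ {
            proxy_pass http://docsum-api:8000;
        }
    }
}
```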

2. Gradio-based UI (docsum-ui)#

  • Role: Provides a web interface for users to upload documents and view summaries.

  • Implementation: ui/gradio_app.py

  • Port: 9998 (external, mapped to 7860 in container)

  • Depends on: docsum-api

3. DocSum Backend Platform (docsum-api)#

  • Role: Handles REST API requests for document summarization and file uploads, and orchestrates the summarization pipeline.

  • Implementation: app/server.py

  • Port: 8090 (external, mapped to 8000 in container)

  • Depends on: OVMS Model Server (for LLM inference)

4. LLM on OpenVINO™ Model Server (ovms-service)#

  • Role: Serves AI models (e.g., LLMs) for inference.

  • Configuration: Loads models from a mounted volume.

  • Ports: 9300 (gRPC), 8300 (REST)
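As a sketch of how the backend might address the model server over its REST port, the helper below builds an OpenAI-style chat-completion request. The `/v3/chat/completions` path, the model name, and the exact request shape are assumptions about the backend's client code, not confirmed details of this application.

```python
import json

# Hypothetical values: "ovms-service" resolves on the Docker network and 8300
# is the REST port from this document; the request path is an assumption.
OVMS_REST = "http://ovms-service:8300"

def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for an OpenAI-style chat request to OVMS."""
    url = f"{OVMS_REST}/v3/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body).encode("utf-8")
```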

Data Flow#

  1. User uploads a document through the UI.

  2. UI sends the document to the DocSum backend platform (docsum-api) through the REST API.

  3. Backend processes the document (e.g., chunking and pre-processing).

  4. Backend sends inference requests to the OpenVINO™ model server for summarization.

  5. The summary is returned to the backend platform, which then sends it to the UI for display.
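Steps 3–5 can be sketched as a minimal map-reduce summarization loop. This is an illustrative sketch, not the actual logic in app/server.py: the chunk sizes, overlap, and reduction strategy are assumptions.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping chunks (illustrative default sizes)."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def summarize(text: str, infer) -> str:
    """Summarize each chunk via the model server (here: any callable),
    then summarize the combined partial summaries."""
    partials = [infer(chunk) for chunk in chunk_text(text)]
    if len(partials) == 1:
        return partials[0]
    return infer("\n".join(partials))
```

In the real service, `infer` would be a call to the OpenVINO™ model server rather than a local function.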

The following figure shows the data flow:

Figure: API call sequence.

Deployment#

  • All services are containerized and orchestrated through docker-compose.yaml.

  • Services communicate over a shared Docker bridge network (my_network).

  • Environment variables are used for configuration and proxy settings.
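A condensed docker-compose.yaml consistent with the ports, dependencies, and network named above might look like the following; the image names, build contexts, model path, and healthcheck endpoint are assumptions, not the shipped file.

```yaml
# Sketch only: consult the repository's docker-compose.yaml for actual values.
services:
  nginx:
    image: nginx:alpine
    ports: ["8101:8101"]
    depends_on: [docsum-api, docsum-ui]
    networks: [my_network]
  docsum-ui:
    build: ./ui
    ports: ["9998:7860"]
    depends_on: [docsum-api]
    networks: [my_network]
  docsum-api:
    build: ./app
    ports: ["8090:8000"]
    depends_on: [ovms-service]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # endpoint is an assumption
      interval: 30s
    networks: [my_network]
  ovms-service:
    image: openvino/model_server:latest
    ports: ["9300:9300", "8300:8300"]
    volumes: ["./models:/models"]
    networks: [my_network]
networks:
  my_network:
    driver: bridge
```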

Key Files#

  • docker-compose.yaml: Service orchestration.

  • app/server.py: FastAPI backend.

  • ui/gradio_app.py: Gradio UI.

  • nginx.conf: NGINX web server configuration.

Extensibility#

  • Model Flexibility: The OpenVINO™ model server can serve different models by updating the mounted model volume and configuration.

  • UI Customization: The Gradio UI provides a rich set of capabilities for customizing the interface to user preferences.

  • API Expansion: You can extend the FastAPI backend with additional endpoints or pre- and post-processing logic.
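One way to keep pre- and post-processing extensible is a hook registry the pipeline runs before and after inference. This pattern is a suggestion; the hook names and the registry itself are hypothetical and not part of the current app/server.py.

```python
from typing import Callable

# Hypothetical registries; the real backend may organize extension points differently.
PRE_HOOKS: list[Callable[[str], str]] = []
POST_HOOKS: list[Callable[[str], str]] = []

def pre_process(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Register a hook that runs on the document before inference."""
    PRE_HOOKS.append(fn)
    return fn

def post_process(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Register a hook that runs on the summary after inference."""
    POST_HOOKS.append(fn)
    return fn

def run_pipeline(text: str, infer: Callable[[str], str]) -> str:
    for hook in PRE_HOOKS:
        text = hook(text)
    summary = infer(text)
    for hook in POST_HOOKS:
        summary = hook(summary)
    return summary

@pre_process
def strip_whitespace(text: str) -> str:
    return text.strip()

@post_process
def ensure_period(summary: str) -> str:
    return summary if summary.endswith(".") else summary + "."
```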

Security and Observability#

  • Security: You can configure the NGINX web server for SSL/TLS termination and client authentication.

  • Logging: Each service logs to stdout and stderr for Docker log aggregation.

  • Healthchecks: The OpenVINO™ model server and API services have healthchecks defined in the Docker Compose file.

References#

  • README.md

  • config.py (for environment/config management)