# Content Search Flow

The Content Search feature supports file upload and ingestion, multimodal search (text and
image queries), and Q&A over retrieved content. To enter the Content Search view, click the
**Content Search** button in the top navigation bar of the Smart Classroom main screen.

![Content Search Entry](./_assets/content-search-entry1.png)

The Content Search view page is split into two panels:

**Left Panel**

- **Upload** — Ingest files (videos, documents, images) into the vector database
- **Search & Q&A** — Query your uploaded content using text or image search, or ask natural-language questions
- **Results** — Display search results with type filtering, relevance scores, and content previews

**Right Panel**

- **Configuration Metrics** — Platform and software configuration for Content Search services
- **Resource Utilization** — Live monitoring of CPU, GPU, Memory, and Power utilization
- **Models** — Models used by Content Search, including VLM, Visual Embedding, Document Embedding, and Reranker

![Content Search Initial View](./_assets/content-search-entry2.png)


## Step 1: Upload Files

Click the upload area or drag-and-drop files into the drop zone.

![Upload Flow](./_assets/content-search-upload.png)

The annotated steps above:

1. **Drag & drop** files or click to browse
2. **Select files** using checkboxes to manage tags
3. **Add tags** to selected files (before upload)
4. **Video options** — toggle Summarize for MP4 files
5. **Search & Q&A tabs** — available after upload completes

Supported file formats:

| Type | Formats |
| :--- | :--- |
| Video | `.mp4` |
| Document | `.pdf`, `.docx`, `.doc`, `.pptx`, `.ppt`, `.xlsx`, `.xls`, `.txt`, `.html`, `.htm`, `.xml`, `.md` |
| Image | `.jpg`, `.jpeg`, `.png` |

### Tagging Files

Before uploading, you can add tags to organize your content:

1. Select one or more files using the checkboxes in the file table
2. Type a tag in the tag input field and press **Enter** or **comma** to add it
3. Tags appear as chips that can be removed by clicking **×**

> **Note:** Tags can only be added or removed while a file is in the **Staged** state (before upload). Once uploaded, tags are locked.

### Video Summarization Toggle

For `.mp4` files, a **Summarize** toggle appears next to the file name. When enabled, the system uses a Vision Language Model (VLM) to generate text summaries of video chunks, enabling richer text-based search over video content.

### Uploading

Click the **Upload Files** button at the bottom to start processing all staged files. Each file goes through the ingestion pipeline:

- **Documents** — Text extraction (with OCR for handwritten/scanned content), semantic chunking, and embedding
- **Images** — CLIP embedding for visual similarity search
- **Videos** — Time-based chunking, frame sampling, VLM summarization (if enabled), and both text and visual embedding

The status column shows the current state: Staged → Processing → Completed (or Failed).

![Files Uploaded Successfully](./_assets/content-search-metrics.png)

### File Manager

Once files have been uploaded, click **View Files** to open the File Manager, which shows all files currently stored on the server.

![View Files Button](./_assets/content-search-filemanager.png)

![File Manager — Uploaded File List](./_assets/content-search-filemanager2.png)

## Step 2: Search

After at least one file upload is complete, the **Search** tab becomes available.

### Text Search

1. Select the **Text Search** tab
2. Type your query in the text area (max 100 characters)
3. Select the content types to search across: **Documents**, **Images**, **Videos** (any combination)
4. Optionally filter results by tag using the **Filter by tags** dropdown
5. Set the number of **Top Results** to return (default: 10)
6. Click **Search**

Text queries search both the visual collection (CLIP embeddings) and the textual collection (BGE embeddings). Textual results are reranked by a cross-encoder, and results from both modalities are merged using Reciprocal Rank Fusion (RRF).

![Text Search with Results](./_assets/content-search_searchResult.png)

### Image Search

1. Select the **Image Search** tab
2. Drag-and-drop an image or click to browse (accepts `.jpg`, `.jpeg`, `.png`)
3. Select the content types to search: **Images**, **Videos** (document search is not available for image queries)
4. Optionally filter by tag
5. Click **Search**

Image queries search the visual collection by CLIP similarity, returning visually similar images and video frames.

![Image Search with Results](./_assets/content-search-image-search.png)

### Search Results

Results are displayed in a card layout with tabs for filtering by type: **All**, **Documents**, **Images**, **Videos**.

Each result card shows:

- **File name** and type icon
- **Relevance score** (percentage)
- **Page number** (for documents)
- **Timestamp** (for video results, showing the pin time in the video)
- **Raw text / Summary** — expandable text snippet or VLM-generated summary
- **Tags** — associated labels

Click **Reset** to clear all search inputs and results.

## Step 3: Q&A (RAG)

The **Q&A** tab provides a conversational interface for asking questions about your uploaded content.

1. Switch to the **Q&A** tab
2. Optionally select tags to narrow the context using the **Filter by label** selector
3. Type your question in the input area (max 500 characters)
4. Press **Enter** or click the send button

The system retrieves the most relevant chunks from your uploaded content, assembles them as context, and sends them to the VLM to generate a grounded answer. Each response includes:

- **Answer** — The AI-generated response based on your content
- **Sources** — Referenced files with type indicators and location (page number or video timestamp)

The conversation history is maintained within the session, allowing multi-turn follow-up questions. Click **Clear conversation** to reset the chat history.

![Q&A Conversation](./_assets/content-search-QaA.png)

## Step 4: Health Monitoring

The Content Search panel automatically checks the health of backend services on load. If any service is unreachable or unhealthy, an error message appears indicating:

- Backend unreachable — the Content Search API (port 9011) is not responding
- Upload/search failure — one or more downstream services (File Ingest, Video Preprocess, VLM Serving, ChromaDB) have issues

Upload and search functionality is affected until all services are healthy.

![Health Check Error](./_assets/content-search-healthy-check.png)

## Microservices

| Service | Port | Role |
| :--- | :---: | :--- |
| Content Search API | 9011 | Orchestrator and public API |
| File Ingest & Retrieve | 9990 | Embedding, indexing, and retrieval |
| Video Preprocess | 8001 | Video chunking and VLM summarization |
| VLM OpenVINO Serving | 9900 | Vision-language model inference |
| ChromaDB | 9090 | Vector database |

## Learn More

- [How It Works — Content Search Pipeline](./how-it-works.md#content-search-pipeline): Technical architecture and design details.
- [Application Flow](./application-flow.md): End-to-end application flow.