# GenAI Use Case

This guide walks you through the **Video Summarization VLM** predefined pipeline. It uses the `gvagenai` DL Streamer element together with a vision-language model (VLM) to generate concise, scene-level natural-language summaries from sampled frames of an input video.

Unlike the classic detection/classification pipelines, this pipeline produces *metadata-only* output (JSON Lines); there is no rendered output video.

## Step 1. Navigate to the predefined pipeline

1. Open the ViPPET UI and go to the **Pipelines** view from the left navigation.
2. Locate the **Video Summarization VLM** tile in the pipeline grid. It is identifiable by its **GenAi** tag badge shown on the card.
3. Click the tile (or one of its variant badges) to open it in the **Pipeline Builder**.

The pipeline ships with three variants (**CPU**, **GPU**, and **NPU**), all pre-configured with the same OpenVINO model. They differ only in the target inference device and in the corresponding pre-converted OpenVINO model directory path used for that variant. Select the variant matching the hardware you want to benchmark.

## Step 2. Configure the GVAGenAI element

In the Pipeline Builder, click the **gvagenai** node to open its configuration panel.

![GenAI Properties](../../_assets/ViPPET-UI-GenAI-Props-light.png)

The following parameters are exposed in the UI (defaults shown reflect the predefined pipeline):

| Parameter             | Default                                   | Description |
| --------------------- | ----------------------------------------- | ----------- |
| **model**             | `google/gemma-3-4b-it` (INT4)             | The vision-language model used for summarization. Only models on disk tagged for GenAI are listed. The on-disk path is resolved automatically from the selected variant (CPU/GPU/NPU). |
| **device**            | `CPU` / `GPU` / `NPU`                     | Target inference device. Set automatically by the selected variant; use the variant switcher to change devices rather than editing this field directly. |
| **prompt**            | `"Summarize this video in one sentence."` | Instruction sent to the VLM for each chunk of sampled frames. Edit it to control the style, length, or focus of the generated summaries (for example, *"List the main activities visible in the scene."*). |
| **generation-config** | `max_new_tokens=64`                       | Generation-config controls for the VLM, expressed as a comma-separated `key=value` list (e.g. `max_new_tokens=128,temperature=0.7`). Larger values produce longer summaries at the cost of latency. |
| **frame-rate**        | `1`                                       | Number of frames per second sampled from the decoded video and fed to the VLM. Lower values reduce compute; higher values capture more temporal detail. |
| **chunk-size**        | `4`                                       | Number of sampled frames grouped into a single VLM inference call. One summary entry is emitted per chunk. |
| **metrics**           | `false`                                   | When `true`, the element emits per-inference timing metrics alongside the summary metadata. |

The downstream `gvametapublish` and `gvafpscounter` nodes are responsible for writing the JSON Lines output and reporting FPS, respectively.

:::info[Note]
**Note:** The Video Summarization VLM pipeline is *metadata-only*: it terminates in an unnamed `fakesink` and does not produce a rendered output video. The **Save to file** and **Live stream** output modes therefore have no effect for this pipeline; only the JSON Lines metadata file is generated.
:::

## Step 3. Run the pipeline

1. Confirm that the input video is available under the shared `videos/input/` directory (the default pipeline uses `people.mp4`).
2. Click **Run**. ViPPET launches the pipeline as a job; you can follow progress in the **Jobs** view.
3. While the job runs, the selected device's utilization (CPU/GPU/NPU) should increase visibly in the **Dashboard**.

## Step 4. Interpret the results

When the job completes, two outputs are available:

- **Scene summaries (JSON Lines):** the VLM writes one record per processed chunk to `videos/output/summary.jsonl` in the shared volume. Each line is a JSON object whose `summary` (or equivalent) field contains the generated text for that chunk, together with frame/timestamp information.
- **Throughput (FPS):** the `gvafpscounter` element reports the steady-state processing rate after a short warm-up (the first 10 frames are skipped via `starting-frame=10`). The number is visible in the job logs and in the **Performance** results view.

![GenAI Results](../../_assets/ViPPET-UI-GenAI-Results-light.png)

To evaluate the pipeline across hardware, re-run it with a different variant selected and compare the reported FPS and generated summaries.
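
For a quick look at the scene summaries outside the UI, the JSON Lines file can be read with a few lines of Python. The following is a minimal sketch, not part of ViPPET: the helper names `load_records` and `summary_text` are hypothetical, and the candidate keys in `TEXT_KEYS` are assumptions, since this guide only states that the text lives in a `summary` (or equivalent) field. Inspect one real record and adjust the keys before relying on it.

```python
import json
from pathlib import Path

# Assumed candidate keys for the generated text. Only "summary" is mentioned
# in this guide ("or equivalent"); the others are guesses to adjust.
TEXT_KEYS = ("summary", "text", "result")


def load_records(path):
    """Read a JSON Lines file and return one parsed object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


def summary_text(record):
    """Return the first string found under one of TEXT_KEYS, or None."""
    if not isinstance(record, dict):
        return None
    for key in TEXT_KEYS:
        value = record.get(key)
        if isinstance(value, str):
            return value
    return None
```

Calling `load_records("videos/output/summary.jsonl")` from the shared volume and passing each record to `summary_text` yields one string (or `None`) per chunk, which makes it easier to compare the summaries produced by different variants side by side.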