Architecture

Video Intelligence Agent is a cloud-native agent deployed on Google Cloud Run and designed to handle many concurrent video analysis requests.


System Overview

graph TB
    User(["User / Client"])

    subgraph VideoIntelligenceAgent["Video Intelligence Agent  (Google Cloud Run)"]
        Server["HTTP/2 Server"]
        Protocol["A2A Protocol Layer"]
        TaskMgr["Task Manager"]
        Executor["BDD Agent Executor"]
        CoreAgent["BDD Generator Agent"]
    end

    GeminiAPI(["Google Gemini AI"])

    User -->|"Video + optional text context"| Server
    Server --> Protocol
    Protocol --> TaskMgr
    Protocol --> Executor
    TaskMgr -->|"Tracks task state"| Executor
    Executor --> CoreAgent
    CoreAgent -->|"Uploads video & requests BDD generation"| GeminiAPI
    GeminiAPI -->|"Structured BDD JSON"| CoreAgent
    CoreAgent -->|"Feature files + summary"| Executor
    Executor -->|"Real-time streaming updates"| User

Components

HTTP/2 Server

Video Intelligence Agent runs on an HTTP/2-native web server. HTTP/2 multiplexes multiple streams over a single connection, so the real-time SSE (Server-Sent Events) responses that carry status updates and generated feature files can be delivered concurrently without one stream blocking another.

A2A Protocol Layer

This layer implements the Agent-to-Agent (A2A) Protocol v1.0. It is responsible for:

  • Advertising the agent's identity and capabilities via the Agent Card
  • Accepting structured requests from any A2A-compatible client
  • Routing each request to the appropriate handler
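
As a rough illustration of the three responsibilities above, the sketch below models an Agent Card as a plain dictionary and dispatches requests by method name. The card fields, the method strings (`agent/card`, `message/stream`), and the handler functions are assumptions chosen for illustration; the actual schema and method names should be taken from the A2A specification.

```python
from typing import Any, Callable

# Illustrative Agent Card. Field names and values are assumptions,
# not the card this service actually publishes.
AGENT_CARD: dict[str, Any] = {
    "name": "Video Intelligence Agent",
    "description": "Generates BDD feature files from a video.",
    "capabilities": {"streaming": True},
    "skills": [{"id": "generate_bdd", "name": "BDD generation from video"}],
}

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def handle_get_card(params: dict[str, Any]) -> dict[str, Any]:
    """Advertise the agent's identity and capabilities."""
    return AGENT_CARD


def handle_send_message(params: dict[str, Any]) -> dict[str, Any]:
    """Accept a structured request; a real handler would start a task here."""
    return {"status": "submitted", "has_video": "video" in params}


# Hypothetical method names used only for this sketch.
ROUTES: dict[str, Handler] = {
    "agent/card": handle_get_card,
    "message/stream": handle_send_message,
}


def route(method: str, params: dict[str, Any]) -> dict[str, Any]:
    """Route a request to its handler, or report an unsupported method."""
    handler = ROUTES.get(method)
    if handler is None:
        return {"error": f"Unsupported method: {method}"}
    return handler(params)
```

A table-driven router keeps the protocol layer free of generation logic: adding a capability means registering one more handler.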

Task Manager

Every request is tracked as an independent task. The task manager maintains the lifecycle state of each request — from the moment it is received until a result is returned or an error is reported. Because each video analysis is completely self-contained, tasks are ephemeral and do not persist across requests.
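
The lifecycle described above can be pictured as a small state machine. The following is a minimal in-memory sketch, not the service's actual task manager: the state names, the `InMemoryTaskManager` class, and the choice to drop a task once it reaches a terminal state are all assumptions made to illustrate ephemeral tasks.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions; terminal states have no successors.
_ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


@dataclass
class Task:
    id: str
    state: TaskState = TaskState.SUBMITTED
    history: list = field(default_factory=lambda: [TaskState.SUBMITTED])


class InMemoryTaskManager:
    """Hypothetical tracker that keeps only in-flight tasks in memory."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def create(self) -> Task:
        task = Task(id=str(uuid.uuid4()))
        self._tasks[task.id] = task
        return task

    def transition(self, task_id: str, new_state: TaskState) -> Task:
        task = self._tasks[task_id]
        if new_state not in _ALLOWED[task.state]:
            raise ValueError(f"Cannot go from {task.state.value} to {new_state.value}")
        task.state = new_state
        task.history.append(new_state)
        if not _ALLOWED[new_state]:
            # Terminal state reached: forget the task, since nothing persists.
            del self._tasks[task_id]
        return task

    def active_count(self) -> int:
        return len(self._tasks)
```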

BDD Agent Executor

The executor bridges the A2A protocol to the actual generation logic. It:

  • Extracts the video and any optional text from the incoming request
  • Emits real-time status updates as the task progresses
  • Orchestrates the call to the core agent
  • Packages each generated feature file as a deliverable artifact
  • Catches failures and reports them to the caller as a failed task status (see Error Handling)
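
The executor's responsibilities listed above can be sketched as a generator that yields events in order. This is an assumption-laden outline rather than the real executor: the `Event` type, the request keys (`video`, `text`), and the `generate` callback signature are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Event:
    kind: str          # "status", "artifact", "completed", or "failed"
    name: str = ""     # artifact path, when kind == "artifact"
    payload: str = ""  # status text, artifact contents, or error message


# Hypothetical core-agent callback: (video, optional context) -> {path: contents}
GenerateFn = Callable[[bytes, Optional[str]], dict]


def execute(request: dict, generate: GenerateFn) -> Iterator[Event]:
    # Extract the video and any optional text from the incoming request.
    video = request.get("video")
    context = request.get("text")
    if not video:
        yield Event("failed", payload="No video found in the request.")
        return

    yield Event("status", payload="Analyzing video")
    try:
        # Orchestrate the call to the core agent.
        files = generate(video, context)
    except Exception as exc:
        # Report the failure as an event instead of letting it propagate.
        yield Event("failed", payload=f"Generation failed: {exc}")
        return

    yield Event("status", payload="Generating feature files")
    # Package each generated feature file as its own artifact.
    for path, contents in files.items():
        yield Event("artifact", name=path, payload=contents)
    yield Event("completed")
```

Yielding events one at a time lets the server forward each one to the client as soon as it is produced.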

BDD Generator Agent

This component is the core intelligence of Video Intelligence Agent. It:

  1. Uploads your video to Gemini's File API for processing
  2. Asks Gemini to analyze the video and produce structured BDD output
  3. Validates and parses the response
  4. Cleans up the uploaded video once processing is complete
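
The four steps above map naturally onto a `try`/`finally` block, so the uploaded file is removed even when generation or parsing fails. The sketch below deliberately avoids the real Gemini SDK: `VideoModelClient` is a hypothetical interface, and the expected `features` key in the response is an assumption about the output shape.

```python
import json
from typing import Optional, Protocol


class VideoModelClient(Protocol):
    """Hypothetical interface; the real Gemini SDK calls will differ."""

    def upload(self, video: bytes) -> str: ...
    def generate(self, file_id: str, prompt: str) -> str: ...
    def delete(self, file_id: str) -> None: ...


def generate_bdd(client: VideoModelClient, video: bytes,
                 context: Optional[str] = None) -> dict:
    prompt = "Analyze the video and return BDD features as JSON."
    if context:
        prompt += f"\nAdditional context: {context}"

    file_id = client.upload(video)              # 1. upload the video
    try:
        raw = client.generate(file_id, prompt)  # 2. request structured output
        data = json.loads(raw)                  # 3. parse the response ...
        # ... and validate the (assumed) top-level shape.
        if not isinstance(data, dict) or not isinstance(data.get("features"), list):
            raise ValueError("Response is missing a 'features' list")
        return data
    finally:
        client.delete(file_id)                  # 4. always clean up the upload
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can treat malformed JSON and a wrong shape as the same kind of unexpected response.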

Streaming Flow

Video Intelligence Agent streams results back to the caller in real time via Server-Sent Events (SSE). You receive updates progressively rather than waiting for the entire generation to complete.

sequenceDiagram
    participant User
    participant VideoIntelligenceAgent as Video Intelligence Agent
    participant Gemini as Gemini AI

    User->>VideoIntelligenceAgent: Request (video + optional context)
    VideoIntelligenceAgent-->>User: Task received
    VideoIntelligenceAgent-->>User: Status: Analyzing video…
    VideoIntelligenceAgent->>Gemini: Upload video
    VideoIntelligenceAgent->>Gemini: Generate BDD test cases
    Gemini-->>VideoIntelligenceAgent: Structured output
    VideoIntelligenceAgent-->>User: Status: Generating feature files…
    VideoIntelligenceAgent-->>User: Artifact: authentication/login.feature
    VideoIntelligenceAgent-->>User: Artifact: checkout/payment.feature
    VideoIntelligenceAgent-->>User: Artifact: summary.json
    VideoIntelligenceAgent-->>User: Completed
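
On the client side, a stream like the one in the diagram arrives as SSE text frames. The parser below is a minimal sketch of the standard `event:` / `data:` framing, where a blank line ends each event; the event names `status` and `artifact` in the usage example are assumptions, since the actual event names and payloads are defined by the A2A protocol.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {"event", "data"} dict per SSE event in a stream of lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if it carried any data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Usage with a hand-written stream (event names are illustrative).
stream = [
    "event: status\n", "data: Analyzing video\n", "\n",
    ": keep-alive\n",
    "event: artifact\n", "data: authentication/login.feature\n", "\n",
]
for evt in parse_sse(stream):
    print(evt["event"], evt["data"])
```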

Request Lifecycle

flowchart LR
    A(["Incoming Request"]) --> B["A2A Protocol Layer"]
    B --> C["BDD Agent Executor"]
    C --> D{"Video present?"}
    D -- No --> E(["Failed: No video found"])
    D -- Yes --> F["Upload to Gemini"]
    F --> G["Generate BDD content"]
    G --> H["Emit feature artifacts"]
    H --> I["Emit summary"]
    I --> J(["Completed"])

Error Handling

Video Intelligence Agent reports each of the following failure modes to the caller with a clear status message:

| Failure | User-facing Message |
| --- | --- |
| No video in request | Prompt to provide a video file |
| Gemini API error | Suggestion to check credentials or quota |
| Unexpected AI response | Suggestion to retry the request |
| Network / IO error | Suggestion to check connectivity and retry |
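
One way to keep these messages consistent is a single function that maps exceptions to user-facing text. The sketch below is illustrative only: `MissingVideoError` and `ModelAPIError` are hypothetical exception types, and the message wording is an assumption rather than the service's actual output.

```python
import json


class MissingVideoError(Exception):
    """Hypothetical: raised when the request contains no video."""


class ModelAPIError(Exception):
    """Hypothetical: raised when the Gemini API call fails."""


def user_message(exc: Exception) -> str:
    """Translate an internal exception into a user-facing status message."""
    if isinstance(exc, MissingVideoError):
        return "No video found. Please provide a video file."
    if isinstance(exc, ModelAPIError):
        return "The Gemini API returned an error. Check your credentials or quota."
    if isinstance(exc, (json.JSONDecodeError, KeyError)):
        return "The AI response was not in the expected format. Please retry."
    # ConnectionError and TimeoutError are subclasses of OSError.
    if isinstance(exc, OSError):
        return "A network or I/O error occurred. Check connectivity and retry."
    return "An unexpected error occurred. Please retry."
```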