x-dream media

x-dream AI Hub

Audio / Video Analysis, Search & Compositing AI

x-dream AI Hub is an AI software stack designed to analyse audio and video content at scale. It aggregates results into rich semantic content descriptions, enables powerful search across your entire media library, and creates new content from existing assets.

AI has been one of x-dream-Fabrik's core processing engines since 2024. x-dream AI Hub is the next step: a dedicated AI platform developed in-house, designed to run as a standalone hub that any platform (ours or a partner's) can call over a clean API.

With a built-in API, x-dream AI Hub integrates seamlessly as part of the x-dream-Fabrik platform, as a standalone component, or as a module embedded in any third-party software. Software development partners can leverage its comprehensive API to power their own intelligent media applications.

Every analysis run passes your media through multiple AI engines in a single pipeline: speech-to-text, speaker diarization and identification, face identification, and contextual scene analysis. The result is a rich, multi-dimensional metadata layer over your entire content archive.

About

Use Cases

News Production

Journalists describe the story they want to tell in natural language. x-dream AI Hub finds matching archive shots, generates a script with story beats for voiceover and editing, and finally produces an EDL including voiceover, ready for direct rendering or craft editing. The whole process takes minutes.

Archive Enrichment

Retroactively enrich legacy archives with semantic metadata. Every clip receives shot-level descriptions, transcripts, face IDs, and tags — making decades of footage instantly searchable.

MAM / PAM Integration

Connect your existing MAM or PAM system to the x-dream AI Hub API. Automatic metadata enrichment on ingest improves asset discoverability and reduces manual logging time by over 80%.

Sports & Events

Detect player faces, objects, and action moments across live recordings. Generate highlight clips and social media edits from event footage in near real-time using semantic search and story creation.

Multilingual Publishing

Language detection, transcription, and translation in a single pipeline. Produce localised captions, subtitles, and metadata in multiple languages from a single analysis run.

Compliance & Research

OCR extracts on-screen text; face and voice recognition cross-reference known persons. Used by broadcasters and research institutions for content compliance monitoring and corpus analysis.

Target Customers

Broadcasters

National TV
Regional TV
News channels
Special interest channels
Event channels (e.g. sports, news, entertainment)

Content Owners

Archive Ingest
Asset Aggregation
B2B content delivery

Post-Production Facilities

On-premises Editing
Distributed Production
Content Delivery

Media Groups

Archive Ingest
Event channels (e.g. sports, music, society)
Special interest channels

Localisation Agencies

Translation
Subtitling

Corporate & Public

Content Production
Cross Media Publishing
Archive Ingest
Business TV

Deployment

x-dream AI Hub is infrastructure-agnostic. Choose the deployment model that fits your security, latency, and cost requirements.

On-Premises

Full control and data sovereignty. Deploy x-dream AI Hub on your own hardware for maximum security and lowest latency on local storage.

Datacentre

Co-locate with your existing media infrastructure in a managed datacentre. Combine performance with reduced operational overhead.

Cloud

Scale elastically on AWS, Azure, or GCP, using pay-per-use GPU resources for burst workloads without capital expenditure on hardware.

Hybrid

Keep sensitive content on-premises while routing non-confidential workloads to cloud AI services via the built-in Cloud Connector.

Features

Core Capabilities

Audio & video analysis pipeline
Shot-level scene detection & timeline
Face & voice identification across assets
OCR & overlay text extraction
LLM-powered contextual enrichment
Semantic content description aggregation
Natural language semantic search
AI-assisted story & content creation
RESTful API for third-party integration
Flexible deployment (cloud, on-prem, hybrid)

Metadata ↔ Semantic Synergy

In x-dream-Fabrik, manually entered metadata and AI-derived semantic data work both ways: an existing title or description can be fed to x-dream AI Hub as a hint, guiding the analysis and reducing false detections. In return, the AI-generated shot summary can become the synopsis in your standard metadata set. Both layers stay fully optional and independently usable.
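A minimal sketch of this two-way flow, in Python. The page does not document AI Hub's data model or API, so the `AssetMetadata` class and both functions are hypothetical: manual fields are passed along only as optional hints, and the AI summary fills the synopsis only when no manual synopsis exists.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class AssetMetadata:
    """Hypothetical manual metadata record for one asset."""
    title: str = ""
    description: str = ""
    synopsis: Optional[str] = None


def build_analysis_hints(meta: AssetMetadata) -> Dict[str, str]:
    """Collect non-empty manual fields as optional hints for an analysis run."""
    hints = {}
    if meta.title.strip():
        hints["title"] = meta.title.strip()
    if meta.description.strip():
        hints["description"] = meta.description.strip()
    return hints


def backfill_synopsis(meta: AssetMetadata, ai_summary: str) -> AssetMetadata:
    """Return a copy whose synopsis is the AI summary, unless a manual synopsis exists."""
    if meta.synopsis or not ai_summary.strip():
        return meta  # manual synopsis wins; empty summaries are ignored
    return replace(meta, synopsis=ai_summary.strip())
```

Because both directions are plain optional steps, either layer can be used without the other, matching the "independently usable" point above.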

AI Engines

Language Detection

Automatically identifies the spoken or written language present in audio and video content, enabling correct routing to transcription and translation engines.

Transcription

Converts spoken audio into accurate, timestamped text transcripts. Works across languages and handles broadcast audio with background noise.

Translation

Translates transcribed text into target languages, enabling cross-language search and multilingual metadata generation from a single analysis run.

Diarization

Identifies and separates different speakers within an audio track, assigning speaker labels to transcript segments with timestamps for accurate attribution.
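Assigning speaker labels to transcript segments is essentially a timestamp-overlap problem. The sketch below assumes the diarization engine emits speaker turns with start and end times, and labels each transcript segment with the speaker who overlaps it longest. The class names, labels such as `SPEAKER_00`, and the max-overlap rule are illustrative assumptions, not AI Hub's documented behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float
    speaker: str  # label emitted by the diarizer, e.g. "SPEAKER_00"


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if they do not meet)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[float, float, Optional[str], str]]:
    """Label each segment with the speaker whose turns overlap it the longest."""
    labelled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Segments with no overlapping turn stay unattributed (None).
        speaker = max(totals, key=totals.get) if totals else None
        labelled.append((seg.start, seg.end, speaker, seg.text))
    return labelled
```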

Voice Detection

Detects and matches voiceprints across audio tracks, confirming a speaker's identity independent of face recognition — useful for off-camera narration, archive audio, or phone and radio sources.
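Voiceprint matching is commonly done by comparing fixed-length speaker embeddings with cosine similarity against enrolled reference voiceprints. This hypothetical sketch assumes the embeddings have already been extracted by some speaker-embedding model (not shown). The 3-dimensional toy vectors and the 0.75 threshold are placeholders; real embeddings are much longer and thresholds must be tuned on real data.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_voice(
    query: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.75,  # placeholder value; tune on real data
) -> Tuple[Optional[str], float]:
    """Return the best-matching enrolled speaker, or None if nobody clears the threshold."""
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

Since the match uses audio alone, it works for off-camera narration or radio sources where no face is available.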

Object Detection

Detects and classifies objects, persons, and entities within video frames, enriching each shot with structured visual metadata for downstream search.

Image Captioning

Generates natural language descriptions for individual video frames and shots, creating human-readable summaries of visual content at the scene level.

Image Classification

Categorises frames using a rich taxonomy of scene types, locations, activities, and visual attributes, enabling faceted filtering in search interfaces.

Face Recognition

Identifies and clusters faces across all analysed assets, enabling person-based search and tracking of known individuals throughout a media library.
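One simple way to cluster faces across assets is greedy single-pass clustering of face embeddings: each new face joins the most similar existing cluster centroid above a similarity threshold, otherwise it starts a new cluster. AI Hub's actual clustering method is not described on this page; this sketch assumes embeddings from an unspecified face-recognition model, and the threshold is a placeholder.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def cluster_faces(embeddings: List[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Greedy single-pass clustering; returns one cluster index per input face."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        # Find the most similar centroid that clears the threshold.
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = _cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            # No cluster is close enough: start a new one (a new, possibly unknown person).
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the centroid as a running mean of its members.
            counts[best] += 1
            k = counts[best]
            centroids[best] = [c + (e - c) / k for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels
```

A cluster can then be named once (for example, linked to a known person) and that name applies to every face in it across the library.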

Contextual Analysis

Uses a Large Language Model to synthesise outputs from all other engines into coherent semantic descriptions, actions, places, and canonical tags per shot.

Story Creation

Beats are the structure. The timeline is where you shape it.

A beat is not a sentence — it's a unit of narrative purpose: what it needs to achieve, what evidence supports it, and what visual material is available to tell it. x-dream AI Hub's Timeline Planner turns a one-line prompt into a structured, editable beat sequence, then keeps every beat connected to your archive as you shape the story.

Beat 1 · Capture attention

Purpose: Hook the viewer with the scale and excitement of the event.

Evidence: Event footage, on-screen text.

Visual needs: Opening aerial shot transitioning to a bustling hall, title card overlay.

Beat 2 · Introduce stakeholders

Purpose: Show who is present and why they matter.

Evidence: Live event clips, voice-over narration, interview soundbites.

Visual needs: Medium shots of professionals networking, product demo close-ups.

① Prompt → Beats

Type a one-line brief — "create a news report about the summit" — and the system proposes a full beat sequence in seconds, each one already matched to relevant source assets from your archive.

② Edit on the timeline

Reorder, merge, split, or delete beats directly. Set editorial intent — goal, audience, platform, tone, risk level, constraints — once, and every beat respects it.

③ Approve → Script

Once beats are approved, a full voiceover script and shot suggestions are generated automatically for each beat, ready for conversational refinement before a single clip is cut.
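The beat structure and the timeline operations can be modelled as plain data. In this sketch the `Beat` fields mirror the purpose, evidence, and visual needs shown in the beat examples, and `EditorialIntent` holds goal, audience, platform, tone, risk level, and constraints once for the whole timeline. All class and method names are hypothetical and do not describe the Timeline Planner's internals.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class EditorialIntent:
    """Set once per timeline; every beat and script line is generated against it."""
    goal: str
    audience: str
    platform: str
    tone: str
    risk_level: str
    constraints: Tuple[str, ...] = ()


@dataclass
class Beat:
    title: str
    purpose: str
    evidence: List[str] = field(default_factory=list)
    visual_needs: List[str] = field(default_factory=list)


@dataclass
class Timeline:
    intent: EditorialIntent
    beats: List[Beat] = field(default_factory=list)

    def move(self, src: int, dst: int) -> None:
        """Reorder: take the beat at `src` and reinsert it at `dst`."""
        self.beats.insert(dst, self.beats.pop(src))

    def delete(self, index: int) -> None:
        del self.beats[index]

    def merge(self, index: int) -> None:
        """Merge beat `index` with the beat that follows it."""
        a, b = self.beats[index], self.beats.pop(index + 1)
        self.beats[index] = Beat(
            title=f"{a.title} / {b.title}",
            purpose=f"{a.purpose} {b.purpose}",
            evidence=a.evidence + b.evidence,
            visual_needs=a.visual_needs + b.visual_needs,
        )

    def split(self, index: int, second_purpose: str) -> None:
        """Split off a new beat after `index`; it starts with only its own purpose."""
        original = self.beats[index]
        self.beats.insert(index + 1, Beat(f"{original.title} (cont.)", second_purpose))
```

Each operation touches only the beats involved, which is what keeps the other beats independent while the story is reshaped.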

Analysis Pipeline Starter

Configure and launch multi-engine analysis pipelines per asset: face, transcription, shot detection, captioning, metadata, and speaker services in one run.
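Because some engines consume other engines' output (language detection routes audio to transcription and translation, and contextual analysis synthesises the outputs of the others), a per-asset pipeline can be planned as a dependency graph. This sketch uses Python's standard `graphlib` module (Python 3.9+). The engine names and every dependency beyond those two stated relationships are assumptions for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map: engine -> engines whose output it needs first.
ENGINE_DEPENDENCIES: Dict[str, Set[str]] = {
    "language_detection": set(),
    "transcription": {"language_detection"},
    "translation": {"transcription"},
    "diarization": set(),
    "shot_detection": set(),
    "face_recognition": {"shot_detection"},
    "image_captioning": {"shot_detection"},
    "contextual_analysis": {"transcription", "diarization", "face_recognition", "image_captioning"},
}


def plan_pipeline(requested: Iterable[str]) -> List[str]:
    """Return the requested engines plus their prerequisites, in a valid execution order."""
    needed: Set[str] = set()
    stack = list(requested)
    while stack:  # walk the dependency graph to collect all prerequisites
        engine = stack.pop()
        if engine not in needed:
            needed.add(engine)
            stack.extend(ENGINE_DEPENDENCIES[engine])
    graph = {engine: ENGINE_DEPENDENCIES[engine] for engine in needed}
    # static_order() yields each engine only after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())
```

Independent engines (for example diarization and shot detection) have no ordering constraint between them, so a scheduler could run them in parallel.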

Job Monitor

Real-time monitoring of all analysis jobs with status, timestamps, and error reporting. Full traceability from ingest to completed semantic metadata.

Shots & Semantic Layer

Visual shot timeline with per-shot semantic metadata: faces, actions, places, objects, tags, weather, time of day, and transcript segments with speaker attribution.

Story Beats Planning

Describe the story you want to tell in natural language. Each beat carries its own purpose, supporting evidence, and visual needs — reorder, merge, or edit any beat directly on the timeline before a single clip is matched.

Script Generation

AI-generated scripts complete with voiceover text, scene directions, and shot descriptions. Refine conversationally — tell the chat "remove the second paragraph" and the script updates in place, no manual editing required.

Shot Assignment & EDL

Review, approve, and assign media clips to each script row. A built-in warning flags any asset already used elsewhere in the story, so nothing is duplicated by accident. Export the final Edit Decision List for direct import into your NLE or broadcast playout system.
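A minimal sketch of the duplicate check and the EDL export, aimed at the widely used CMX 3600 text layout with non-drop-frame timecode at an integer frame rate (25 fps here). The `Assignment` class, the fixed `AX` reel name, and a record timeline starting at 00:00:00:00 are simplifying assumptions; real NLE and playout importers may expect per-source reel names and other details.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Assignment:
    asset_id: str   # hypothetical asset identifier
    clip_name: str
    src_in: int     # source in point, in frames
    src_out: int    # source out point, in frames (exclusive)


def frames_to_tc(frames: int, fps: int = 25) -> str:
    """Non-drop-frame SMPTE timecode (HH:MM:SS:FF) for an integer frame rate."""
    h, rem = divmod(frames, 3600 * fps)
    m, rem = divmod(rem, 60 * fps)
    s, f = divmod(rem, fps)
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"


def build_edl(title: str, rows: List[Assignment], fps: int = 25,
              record_start: int = 0) -> Tuple[str, List[str]]:
    """Return (EDL text, warnings); clips are laid back-to-back as video cuts."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    warnings: List[str] = []
    first_use: Dict[str, int] = {}
    rec = record_start
    for num, row in enumerate(rows, start=1):
        # Duplicate check: warn when an asset already appears in an earlier event.
        if row.asset_id in first_use:
            warnings.append(f"Event {num:03d}: asset {row.asset_id} "
                            f"already used in event {first_use[row.asset_id]:03d}")
        else:
            first_use[row.asset_id] = num
        duration = row.src_out - row.src_in
        lines.append(
            f"{num:03d}  {'AX':<8} {'V':<5} C        "
            f"{frames_to_tc(row.src_in, fps)} {frames_to_tc(row.src_out, fps)} "
            f"{frames_to_tc(rec, fps)} {frames_to_tc(rec + duration, fps)}"
        )
        lines.append(f"* FROM CLIP NAME: {row.clip_name}")
        rec += duration
    return "\n".join(lines) + "\n", warnings
```

The warning is returned rather than raised, so an editor can still keep a deliberate repeat.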

Output Versioning

Every approved output is saved as a version. When new information comes in, update the story and generate a new version — the previous one stays intact and accessible, so breaking news never costs you your last edit.
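The versioning behaviour described here amounts to an append-only history: saving a new version never modifies earlier ones. A minimal sketch, with hypothetical class names and an in-memory list standing in for whatever storage AI Hub actually uses:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Version:
    number: int
    created_at: datetime
    content: str  # e.g. the approved script or EDL text


class OutputHistory:
    """Append-only version list: saving never alters earlier versions."""

    def __init__(self) -> None:
        self._versions: List[Version] = []

    def save(self, content: str) -> Version:
        version = Version(len(self._versions) + 1, datetime.now(timezone.utc), content)
        self._versions.append(version)
        return version

    def get(self, number: int) -> Version:
        if not 1 <= number <= len(self._versions):
            raise KeyError(f"no version {number}")
        return self._versions[number - 1]

    def latest(self) -> Version:
        return self._versions[-1]

    def __len__(self) -> int:
        return len(self._versions)
```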

The full editorial timeline — from brief to EDL

Every beat sits on the timeline in sequence. Each one is independent — edit one without disturbing the others. The approved timeline becomes the script row order, the shot assignment list, and the EDL in one step.

Each beat is independent

Edit, reorder, merge, or delete any beat without touching the others. The timeline adjusts automatically.

Editorial intent travels with the beats

Set goal, audience, platform, tone, and risk level once — every beat and every generated script line respects it automatically.

Clips matched per beat, not per story

Source assets are suggested at beat level from the semantic index — so every section of the story draws from the most relevant material, not a single broad search.

Approve → script → EDL in one chain

Once beats are approved, the full script generates automatically. Assign shots, refine conversationally, export the EDL — no rebuilding at any stage.

Semantic Search

Type a scene description in plain language. The LLM parses your intent into entities, locations, dates, and tags — then matches against the full semantic layer across your entire media library.

LLM Query Parsing

Natural language queries are parsed by an LLM to extract named entities, temporal references, locations, and semantic concepts before hitting the vector search index.
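As a sketch of what the parsing step might feed into search: assume the LLM returns JSON (read with `json.loads`) containing a free-text concept plus lists of persons, places, and years. The structured parts can then become a payload filter in Qdrant's filter syntax (`must` conditions using `match` / `any` and `range`), while the concept is embedded for the vector search. The JSON schema and the payload keys (`faces`, `places`, `year`) are assumptions, not AI Hub's documented index layout.

```python
from typing import Any, Dict, List


def parsed_query_to_filter(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Turn structured fields from a (hypothetical) LLM parse into a Qdrant-style filter.

    The free-text "concept" is deliberately ignored here: it is meant to be
    embedded and matched by the vector search, not filtered on.
    """
    must: List[Dict[str, Any]] = []
    if parsed.get("persons"):
        must.append({"key": "faces", "match": {"any": parsed["persons"]}})
    if parsed.get("places"):
        must.append({"key": "places", "match": {"any": parsed["places"]}})
    years = parsed.get("years")
    if years:
        must.append({"key": "year", "range": {"gte": min(years), "lte": max(years)}})
    return {"must": must} if must else {}
```

Example parse: `{"concept": "minister at a podium in the rain", "persons": ["Jane Doe"], "places": ["Berlin"], "years": [2022, 2023]}` yields three `must` conditions, and the search then ranks only shots that satisfy them.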

Hybrid Vector Search

The Qdrant vector database combines dense semantic embeddings with sparse keyword matching for both high precision and high recall. Results carry a hybrid relevance score per shot.
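Reciprocal rank fusion (RRF) is one common way to merge a dense (semantic) ranking and a sparse (keyword) ranking into a single list, and Qdrant's Query API offers it as a built-in fusion mode. Whether AI Hub uses RRF or another scoring scheme is not stated; this pure-Python sketch only illustrates the idea, using the conventional constant k = 60 and hypothetical shot IDs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists of shot IDs; each list adds 1 / (k + rank), with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, shot_id in enumerate(results, start=1):
            scores[shot_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

RRF uses only rank positions, so dense similarity scores and sparse keyword scores never need to be put on a common scale; a shot that ranks well in both lists rises to the top.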

Shot-Level Precision

Search results are returned at the individual shot level — not just asset level — with exact in/out timecodes, making results immediately usable in editorial and NLE workflows.

Specifications

x-dream AI Hub is a containerised microservices stack. Each AI service is independently deployable and scalable. Below are reference hardware configurations and supported capabilities.