Audio / Video Analysis, Search & Compositing AI
x-dream AI Hub is an AI software stack designed to analyse audio and video content at scale. It aggregates results into rich semantic content descriptions, enables powerful search across your entire media library, and creates new content from existing assets.
AI has been one of x-dream-Fabrik's core processing engines since 2024. x-dream AI Hub is the next step: our own dedicated AI development, designed to run as a standalone hub that any platform — ours or a partner's — can call over a clean API.
With a built-in API, x-dream AI Hub integrates seamlessly as part of the x-dream-Fabrik platform, as a standalone component, or as a module embedded in any third-party software. Software development partners can leverage its comprehensive API to power their own intelligent media applications.
Every analysis run processes your media through multiple AI engines in a single pipeline, from speech-to-text, speaker diarization, and speaker identification to face identification and contextual scene analysis. The result is a rich, multi-dimensional metadata layer over your entire content archive.
News Production
Journalists describe the story they want to tell in natural language. x-dream AI Hub identifies matching archive shots, generates a text and an editing script with beats, and finally produces an EDL, including voiceover, for direct rendering or craft editing, all in minutes.
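As a rough illustration of what an EDL for selected archive shots can look like, the following minimal sketch turns a list of shots into CMX3600-style event lines. The EDL flavour, the fixed 25 fps non-drop timecode, and the names `frames_to_timecode` and `build_edl` are assumptions made for this example; they do not describe the format or API that x-dream AI Hub actually produces.

```python
from typing import List, Tuple


def frames_to_timecode(frames: int, fps: int = 25) -> str:
    """Convert a frame count to HH:MM:SS:FF non-drop timecode."""
    ff = frames % fps
    total_seconds = frames // fps
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def build_edl(title: str, shots: List[Tuple[str, int, int]], fps: int = 25) -> List[str]:
    """Build CMX3600-style lines from (reel, source_in, source_out) frame tuples.

    Shots are laid back to back on the record side, starting at frame 0
    (an assumption; many workflows start the record timeline at 01:00:00:00).
    """
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME"]
    record_pos = 0
    for number, (reel, src_in, src_out) in enumerate(shots, start=1):
        duration = src_out - src_in
        rec_in, rec_out = record_pos, record_pos + duration
        lines.append(
            f"{number:03d}  {reel:<8} V     C        "
            f"{frames_to_timecode(src_in, fps)} {frames_to_timecode(src_out, fps)} "
            f"{frames_to_timecode(rec_in, fps)} {frames_to_timecode(rec_out, fps)}"
        )
        record_pos = rec_out
    return lines
```

Calling `build_edl("DEMO", [("CLIP001", 250, 375), ("CLIP002", 0, 100)])` yields a title line, a frame-count-mode line, and one video cut event per shot.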
Archive Enrichment
Retroactively enrich legacy archives with semantic metadata. Every clip receives shot-level descriptions, transcripts, face IDs, and tags — making decades of footage instantly searchable.
MAM / PAM Integration
Expose the x-dream AI Hub API to your existing MAM or PAM system. Automatic metadata enrichment on ingest improves asset discoverability and reduces manual logging time by over 80%.
Sports & Events
Detect player faces, objects, and action moments across live recordings. Generate highlight clips and social media edits from event footage in near real time using semantic search and story creation.
Multilingual Publishing
Language detection, transcription, and translation in a single pipeline. Produce localised captions, subtitles, and metadata in multiple languages from a single analysis run.
Compliance & Research
OCR extracts on-screen text; face and voice recognition cross-reference known persons. Used by broadcasters and research institutions for content compliance monitoring and corpus analysis.
x-dream AI Hub is infrastructure-agnostic. Choose the deployment model that fits your security, latency, and cost requirements.
Full control and data sovereignty. Deploy x-dream AI Hub on your own hardware for maximum security and lowest latency on local storage.
Co-locate with your existing media infrastructure in a managed datacentre. Combine performance with reduced operational overhead.
Scale elastically on AWS, Azure, or GCP. Pay-per-use GPU resources for burst workloads without capital expenditure on hardware.
Keep sensitive content on-premises while routing non-confidential workloads to cloud AI services via the built-in Cloud Connector.
In x-dream-Fabrik, manually entered metadata and AI-derived semantic data work both ways: an existing title or description can be fed to x-dream AI Hub as a hint, guiding the analysis and reducing false detections. In return, the AI-generated shot summary can become the synopsis in your standard metadata set. Both layers stay fully optional and independently usable.
Automatically identifies the spoken or written language present in audio and video content, enabling correct routing to transcription and translation engines.
Converts spoken audio to accurate, timestamped text transcripts. Works across languages and handles broadcast audio with background noise.
Translates transcribed text into target languages, enabling cross-language search and multilingual metadata generation from a single analysis run.
Identifies and separates different speakers within an audio track, assigning speaker labels to transcript segments with timestamps for accurate attribution.
Detects and matches voiceprints across audio tracks, confirming a speaker's identity independent of face recognition — useful for off-camera narration, archive audio, or phone and radio sources.
Detects and classifies objects, persons, and entities within video frames, enriching each shot with structured visual metadata for downstream search.
Generates natural language descriptions for individual video frames and shots, creating human-readable summaries of visual content at the scene level.
Categorises frames using a rich taxonomy of scene types, locations, activities, and visual attributes, enabling faceted filtering in search interfaces.
Identifies and clusters faces across all analysed assets, enabling person-based search and tracking of known individuals throughout a media library.
Uses a Large Language Model to synthesise outputs from all other engines into coherent semantic descriptions, actions, places, and canonical tags per shot.
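The speaker diarization step described above assigns speaker labels to timestamped transcript segments. One common way to do this is to give each segment the speaker whose turn overlaps it the longest. The sketch below shows that idea only; the `Segment` and `SpeakerTurn` types and the `label_segments` function are hypothetical names for this example, and nothing here claims to be how x-dream AI Hub implements the step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class SpeakerTurn:
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(
    segments: List[Segment], turns: List[SpeakerTurn]
) -> List[Tuple[Optional[str], Segment]]:
    """Attach to each transcript segment the speaker with the largest time overlap.

    Segments that overlap no speaker turn are labelled None.
    """
    labelled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labelled.append((best_speaker, seg))
    return labelled
```

The nested loop is quadratic in the worst case, which is fine for a single clip; for long recordings, sorting both lists by start time and sweeping through them once would be the natural refinement.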
x-dream AI Hub is a containerised microservices stack. Each AI service is independently deployable and scalable. Below are reference hardware configurations and supported capabilities.
Containers
Video Codecs
Audio Codecs
Images
+ 50 further languages via the Cloud Connector
All x-dream AI Hub related documentation: