VelaIris-DM

Descriptive metadata at archive depth.

VelaIris-DM produces descriptive metadata at the depth archive and search systems actually need. Four modalities — computer vision, ASR, OCR and structured reasoning — converge on a single coherent descriptive record per asset, structured for ingestion into MAM, DAM or proprietary archives, on the customer’s own taxonomy.

Who it's for

For the people who make the archive findable.

Archive / Library Lead

Searchable narrative, library-wide.

Narrative metadata across the whole library — find any segment by describing it in plain language.

MAM / DAM Owner

Records that ingest as-is.

Records that drop into existing systems on the existing taxonomy — no re-platforming.

Content Discovery / Monetization

Archive as inventory.

Turn an archive from cost center into discoverable, monetizable inventory.

What it is

Four modalities, one record per asset.

VelaIris-DM transforms broadcast and streaming video into rich, structured and narrative metadata through a multi-modal content-understanding pipeline — fusing computer vision, ASR, OCR and neural-network narrative synthesis into coherent, flowing descriptions rather than disconnected keyword lists. The result is a fully searchable library: find any segment by describing it in plain language.

Visual understanding covers scene segmentation, location, activity, objects and on-screen graphics; audio adds speech recognition, speaker diarization and word-level transcript alignment; narrative synthesis writes segment-level descriptions and program summaries with cross-segment continuity. It runs entirely on-premises — content never leaves the facility — and output schemas map to the customer’s own taxonomy.

Four-modality fusion: Vision + ASR + OCR + narrative synthesis
Narrative, not keywords: Coherent prose with cross-segment continuity
On-prem, zero egress: Content never leaves the facility
Customer taxonomy: Schemas map to existing MAM / DAM

Capabilities

Understanding across every modality.

Visual, audio, on-screen text and narrative synthesis, fused into one searchable record per asset.

01

Visual understanding

Scene segmentation and classification, location and activity recognition, object detection and graphics/logo identification.

Scenes · Objects · Graphics

02

Audio understanding

High-accuracy speech recognition, speaker diarization and word-level transcript-to-video alignment, with language detection.

ASR · Diarization · Alignment

03

Narrative synthesis

Segment-level descriptions and program summaries with cross-segment continuity and configurable verbosity — prose, not keyword lists.

Segment prose · Summaries

04

On-screen text (OCR)

Graphics, lower-thirds, chyrons, signage, tickers and crawls captured and indexed.

Chyrons · Tickers

05

Semantic search

Natural-language search across narrative, tags and transcripts — archive exploration by topic, date, entity or program.

Natural-language · Low-latency

06

Rights & accessibility

Rights-management support, plus accessibility-compliance metadata based on CEA-608/708 caption support.

Rights flags · CEA-608/708

Output is one searchable narrative record per asset: continuous segment descriptions and program summaries, hierarchical topic and entity tags, temporal markers, content classification, and rights and compliance flags. Records export as JSON, XML or a custom schema.
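
As a rough illustration of what such a record could look like, the sketch below builds one in Python and serializes it to JSON. Every field name here (`asset_id`, `segments`, `topic_tags` and so on) is a hypothetical placeholder, not the VelaIris-DM schema; the actual export schema is configured to the customer's taxonomy.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical record layout. Field names are illustrative assumptions,
# not the actual VelaIris-DM export schema.
@dataclass
class Segment:
    start_s: float      # temporal marker: segment start, in seconds
    end_s: float        # temporal marker: segment end, in seconds
    description: str    # segment-level narrative prose
    confidence: float   # confidence score in [0, 1]

@dataclass
class AssetRecord:
    asset_id: str
    summary: str                                     # program-level summary
    segments: list = field(default_factory=list)
    topic_tags: list = field(default_factory=list)   # hierarchical, e.g. "news/weather"
    entity_tags: list = field(default_factory=list)
    classification: str = ""
    rights_flags: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Segment dataclasses.
        return json.dumps(asdict(self), indent=2)

record = AssetRecord(
    asset_id="example-0001",
    summary="Evening newscast with a weather segment.",
    segments=[Segment(0.0, 42.5, "Anchor introduces the forecast.", 0.93)],
    topic_tags=["news/weather"],
    entity_tags=["anchor"],
    classification="news",
    rights_flags=["cleared"],
)
exported = record.to_json()
```

Keeping narrative prose, tags and flags in one object per asset mirrors the "one record" framing above; a real deployment would rename and reshape these fields to match the target MAM or DAM.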

What it produces

Four modalities, one record.

Visual, audio, on-screen text and structured reasoning fuse into a single descriptive record — segment narrative, summaries and structured metadata.

4 modalities fused per record
91.5% composite accuracy
2–4× faster than real-time
~17 ms semantic-search latency

Visual
  • Scene segmentation
  • Location & activity
  • Object detection
  • Graphics / logo ID
Audio
  • Speech recognition
  • Speaker diarization
  • Word-level alignment
  • Language detection
Narrative
  • Segment descriptions
  • Program summaries
  • Cross-segment continuity
  • Configurable verbosity
Structure
  • Hierarchical topic tags
  • Entity tags
  • Rights & compliance flags
  • Confidence scores
Input

Any format, batch or real-time.

Live, file-based or streaming — SD and HD, common codecs, multi-language audio.

Live: Standard transport
File: Network storage · Archive · Watch folder
Streaming: HLS · DASH · OTT · FAST
Export: JSON · XML · Custom

The operator interface

A web-based descriptive workspace.

Named surfaces for description, search and integration.

Live

Multiviewer

Split-screen live video with synchronized descriptive-metadata cards.

Search

Semantic Search

Natural-language search across the whole library.

Read

Narrative Viewer

Segment descriptions synced to playback.

Browse

Metadata Browser

Structured tags, entities and classifications per asset.

Explore

Archive Explorer

Exploration by topic, date, entity or program.

Integrate

Export & Integration

Schema-mapped delivery into MAM / DAM.

Deployment & integrations

On-prem appliance, zero content egress.

Deployment

A Vela-supplied GPU appliance — all processing in-facility, content never leaves.

On-prem GPU appliance — supplied to spec; all processing in-facility.
Zero content egress — content never leaves the facility.
Platform-as-a-Service — appliance, updates and support; no perpetual or upgrade fees.
Batch or real-time — SD & HD, common codecs, multi-language audio.

Integrations

Documented interfaces into existing archive systems.

REST API + WebSocket — to MAM, DAM and newsroom systems.
Configurable export schemas — JSON, XML or custom.
Watch folder — automated batch processing.
Accessibility — CEA-608/708 caption support.
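
To make "configurable export schemas" concrete, here is a minimal sketch of how a record might be renamed onto a customer taxonomy and emitted as JSON or XML. The field names, the `FIELD_MAP` table and both helper functions are assumptions made for illustration; they do not describe the product's actual configuration format or API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from internal field names to a customer's taxonomy.
FIELD_MAP = {
    "asset_id": "HouseNumber",
    "summary": "Synopsis",
    "topic_tags": "Subjects",
}

def map_record(record: dict, field_map: dict) -> dict:
    """Rename mapped fields; drop fields the customer schema does not define."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}

def to_xml(mapped: dict, root_tag: str = "Asset") -> str:
    """Serialize a flat record to XML; list values become repeated <Item> children."""
    root = ET.Element(root_tag)
    for key, value in mapped.items():
        node = ET.SubElement(root, key)
        if isinstance(value, list):
            for item in value:
                ET.SubElement(node, "Item").text = str(item)
        else:
            node.text = str(value)
    return ET.tostring(root, encoding="unicode")

record = {
    "asset_id": "example-0001",
    "summary": "Evening newscast.",
    "topic_tags": ["news/weather"],
    "confidence": 0.93,  # not in FIELD_MAP, so it is omitted from the export
}

mapped = map_record(record, FIELD_MAP)
as_json = json.dumps(mapped)
as_xml = to_xml(mapped)
```

Driving both serializers from one mapping table means a taxonomy change touches only the table, not the export code.
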
How it works

Understand. Describe. Make findable.

VelaIris-DM turns raw assets into a searchable narrative archive — one coherent record at a time.

Understand

Four modalities

Vision, audio, on-screen text, reasoning.

Describe

One narrative record

Coherent prose plus structured metadata.

Discover

Semantic search

Find any segment in plain language.

Proven on real content

Validated on real broadcast content.

Accuracy and speed across visual, audio and on-screen text — on production broadcast material, not synthetic clips.

Accuracy

Composite accuracy of 91.5% (ASR 94%, OCR 89%, visual 92%, narrative 91%), at 2–4× faster than real-time.
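
The page does not say how the composite figure is derived. One reading that reproduces it exactly is an unweighted mean of the four per-modality scores, sketched below; treat the formula as an assumption, not a documented method.

```python
# Per-modality accuracy figures quoted above, in percent.
scores = {"asr": 94, "ocr": 89, "visual": 92, "narrative": 91}

# Assumption: the composite is an unweighted mean. The source does not
# state the formula; this simply reproduces the published figure.
composite = sum(scores.values()) / len(scores)
print(composite)  # 91.5
```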

Worked example

On a regional broadcast network’s ~87,000-hour archive, metadata fill rose from 30% to 85% within 90 days, and clip research time fell from 25 minutes to 4 minutes.
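
For scale, the same figures can be restated as relative changes. The snippet below is plain arithmetic on the numbers quoted above; the variable names are illustrative and nothing here adds new measurements.

```python
fill_before, fill_after = 30, 85           # metadata fill, percent
research_before, research_after = 25, 4    # clip research time, minutes

# Gain in metadata fill, in percentage points.
fill_gain_points = fill_after - fill_before

# Share of clip research time saved, as a whole percent.
time_saved_pct = round(100 * (research_before - research_after) / research_before)

# How many times faster a clip search becomes.
speedup = research_before / research_after

print(fill_gain_points, time_saved_pct, speedup)  # 55 84 6.25
```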

Bring us a feed

See VelaIris-DM on your archive.

Engagements start with a signal. Bring a representative slice of the archive and we run it through the description pipeline — you see exactly what VelaIris-DM produces, before any commercial conversation.

01

Archive scoping

02

Description run

03

Librarian review

04

Deployment scoping

Start with a signal →

info@vela.com · (727) 507-5300