The Challenge

A manufacturing subsidiary within a large holding company had built a vast internal library of product demonstration and training videos showcasing its machinery in action.
Yet that valuable content was largely underused.

In the field, veteran salespeople often remembered that a video proved a specific machine capability — but couldn’t locate the right clip during a client meeting. The moment passed, and with it, the chance to close the deal.

At the same time, new hires faced the opposite problem. Training videos were dense, lengthy, and lacked context. It could take weeks of passive viewing before a salesperson understood a machine well enough to speak confidently about its features.

The core issue: the company had knowledge — but no way to access or interact with it efficiently.

What We Built

We developed a two-part AI solution built on a shared intelligent video backbone, designed to empower both experienced salespeople and new team members.

Part 1: The Finder – In-the-Moment Sales Tool

The first application transformed the company’s existing video archive into a real-time, searchable sales asset.

  • Powered by the TwelveLabs API, the system indexed visual, audio, and textual elements of every video, enabling true contextual search.
  • Sales reps could type natural language queries like “show the press handling corrugated steel” or “demonstrate the safety interlock sequence.”
  • The system instantly surfaced timestamped clips — with previews — that could be shown during meetings or shared as direct links with clients.
  • Each result combined speech-to-text, object recognition, and action detection, turning hours of footage into seconds of precision retrieval.

This gave field teams instant access to verified proof, allowing them to respond confidently in the moment — with visuals, not hypotheticals.
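The retrieval flow above can be sketched in a few lines. The helper names, request fields, and response shape below are simplified illustrations, not the actual TwelveLabs schema; consult the current API reference for the real endpoint and parameters.

```python
# Sketch of the natural-language clip search flow.
# Field names ("index_id", "search_options", "data", etc.) are assumptions
# modeled loosely on a multimodal search API, not verbatim TwelveLabs fields.

def build_search_payload(index_id: str, query: str) -> dict:
    """Assemble a search request spanning visual, spoken, and on-screen text."""
    return {
        "index_id": index_id,
        "query": query,
        # Search across what is shown, what is said, and what appears on screen.
        "search_options": ["visual", "conversation", "text_in_video"],
    }

def format_results(response: dict) -> list:
    """Turn raw hits into timestamped, shareable clip references."""
    clips = []
    for hit in response.get("data", []):
        clips.append(f"{hit['video_id']} @ {hit['start']:.0f}s-{hit['end']:.0f}s "
                     f"(score {hit['score']:.2f})")
    return clips

# Example with a mocked response in the assumed shape:
mock = {"data": [{"video_id": "vid_123", "start": 84.0, "end": 97.0, "score": 0.91}]}
print(format_results(mock))
```

Each formatted entry maps directly to a timestamped preview a rep can open or share as a link mid-meeting.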

Part 2: The Expert – Sales Team Accelerator

The second use case extended the same technology into a training companion for onboarding.

  • New salespeople could “chat” directly with training videos through an interactive interface powered by GPT-5 Vision and Gemini 2.5 Flash.
  • They could pause any clip and ask natural questions:

    “What’s happening at 3:24?”
    “What pressure reading appears before he adjusts the lever?”
    “Describe the safety check he performs before startup.”
  • The AI interpreted both the video and transcript context to deliver immediate, detailed explanations.
  • The result: trainees no longer skimmed endless footage — they engaged with it dynamically, deepening understanding and confidence in a fraction of the time.
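The question-answering step can be sketched as prompt assembly: gather the transcript segments near the paused timestamp and combine them with the trainee's question before sending both (plus a grabbed frame) to the multimodal model. The function name, the ±15-second window, and the transcript format here are illustrative assumptions.

```python
def build_video_question(timestamp_s: float, question: str, transcript: list) -> str:
    """Combine the paused timestamp with nearby narration into one text prompt.

    In the full system, a frame grabbed at `timestamp_s` would also be attached
    as an image for the multimodal model; this sketch assembles only the text.
    """
    # Keep transcript segments within an assumed ±15 s window of the paused moment.
    window = [seg["text"] for seg in transcript
              if abs(seg["start"] - timestamp_s) <= 15]
    return (f"Video paused at {timestamp_s:.0f}s.\n"
            f"Nearby narration: {' '.join(window)}\n"
            f"Question: {question}")

# Hypothetical transcript segments with start times in seconds:
transcript = [
    {"start": 190.0, "text": "Now we set the pressure to 40 bar."},
    {"start": 204.0, "text": "Then he adjusts the feed lever."},
    {"start": 260.0, "text": "Finally, the safety interlock engages."},
]
prompt = build_video_question(
    204, "What pressure reading appears before he adjusts the lever?", transcript)
print(prompt)
```

Grounding the model in only the nearby narration keeps answers tied to the exact moment the trainee paused, rather than the whole video.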

By combining TwelveLabs’ multimodal video search with large-language model reasoning, we created a unified tool that served both sales enablement and onboarding — two critical but previously disconnected needs.

The Challenges We Faced

  1. Visual comprehension fidelity.

    Most video-AI tools can transcribe speech but fail to interpret on-screen action. We addressed this by combining frame-level embeddings from TwelveLabs with object/action tagging, allowing the system to “see” what was happening — not just what was said.
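One way to combine the two signals is to score frames by embedding similarity and then boost frames whose detected object/action tags overlap the query's terms. This is a minimal sketch of that idea; the function, the tag-overlap boost, and its weight are assumptions, not the production ranking logic.

```python
import numpy as np

def rank_frames(query_vec, frame_vecs, frame_tags, query_tags, tag_boost=0.1):
    """Rank frames by cosine similarity plus a small bonus for tag overlap."""
    # Cosine similarity between the query embedding and each frame embedding.
    q = query_vec / np.linalg.norm(query_vec)
    f = frame_vecs / np.linalg.norm(frame_vecs, axis=1, keepdims=True)
    sims = f @ q
    # Boost frames whose detected object/action tags match the query's terms,
    # so "what is happening" counts alongside "what was said".
    boosts = np.array([tag_boost * len(tags & query_tags) for tags in frame_tags])
    return np.argsort(-(sims + boosts))

# Toy example: two frames, the second both visually similar and tagged "lever".
query = np.array([1.0, 0.0])
frames = np.array([[0.0, 1.0], [1.0, 0.1]])
order = rank_frames(query, frames, [{"press"}, {"lever"}], {"lever"})
print(order)
```

The boost term is what lets a query like "adjusts the lever" surface a frame where the action is visible even if the narration never mentions it.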

  2. Latency in the field.

    Sales situations require immediacy. We optimized by pre-indexing high-value videos and caching vector queries on the edge, reducing response time from ~8 seconds to under 2.
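The caching step can be sketched with Python's `functools.lru_cache`. The backend function here is a stand-in for the real embed-and-search round trip, and the normalization rule (lowercase, collapsed whitespace) is an assumption about how near-identical queries were deduplicated.

```python
from functools import lru_cache

BACKEND_CALLS = {"count": 0}  # tracks how often the expensive path actually runs

@lru_cache(maxsize=512)
def _search_backend(normalized_query: str) -> tuple:
    """Stand-in for embedding the query and hitting the vector index."""
    BACKEND_CALLS["count"] += 1
    return (f"results-for:{normalized_query}",)

def search(query: str) -> tuple:
    # Normalize casing and whitespace so trivially different phrasings
    # of the same question share a single cache entry.
    return _search_backend(" ".join(query.lower().split()))

search("Show the press handling corrugated steel")
search("show the press   handling corrugated steel")
# Both phrasings resolve to one cache entry, so the backend ran only once.
```

Keeping a cache like this close to the user (on the edge) is what turns a repeat lookup from a full round trip into a near-instant hit.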

  3. User experience simplicity.

    Both veteran reps and trainees needed something intuitive. We designed a single search bar interface that accepted natural questions and returned contextual results with timestamped previews — familiar to use, powerful underneath.

The Outcome

  • Instant Proof in the Field: Sales reps could locate and show relevant clips within seconds, strengthening credibility and shortening sales cycles.
  • Accelerated Onboarding: New hires reduced training time from several weeks to a few days by learning interactively instead of passively.
  • Knowledge Activation: Years of underutilized video assets became searchable, explainable, and reusable across teams.
  • Unified Architecture: The same engine now supports marketing, support, and internal documentation retrieval.

Before: Finding one relevant clip could take 10–15 minutes — often too late to use effectively.
After: Timestamped video answers surfaced in under 5 seconds, directly supporting live conversations or training moments.

The company effectively turned its static video library into an on-demand intelligence system — one that informs, trains, and sells simultaneously.