Company
Runway
Title
Multimodal Feature Stores and Research-Engineering Collaboration
Industry
Media & Entertainment
Year
2024
Summary (short)
Runway, a leader in generative AI for creative tools, developed a novel approach to managing multimodal training data through what they call a "multimodal feature store". This system enables efficient storage and retrieval of diverse data types (video, images, text) along with their computed features and embeddings, facilitating large-scale distributed training while maintaining researcher productivity. The solution addresses challenges in data management, feature computation, and the research-to-production pipeline, while fostering better collaboration between researchers and engineers.
# Runway's Approach to Multimodal AI and MLOps

## Company Background

Runway is a pioneer in generative AI for creative tools, known particularly for its work in video generation and editing. As one of the creators of Stable Diffusion, the company has expanded its focus to building powerful creative tools that run entirely in the browser, offering capabilities such as text-to-video generation, image animation, and sophisticated AI-powered video editing.

## The Challenge of Multimodal Data Management

- Traditional feature stores focus on tabular data and low-latency serving
- Multimodal AI introduces new challenges around diverse data types (video, images, text)
- Data size and complexity make traditional approaches impractical

## The Multimodal Feature Store Solution

### Key Components

- Centralized storage for diverse data types
- Support for pre-computed features and embeddings
- Vector search capabilities for semantic queries (a minimal lookup sketch appears at the end of this case study)
- Efficient batch access for distributed training
- Integration with existing ML infrastructure

### Feature Engineering and Storage

- Pre-computation of expensive features
- Storage of both raw data and derived features
- Flexible schema to accommodate different feature types
- Efficient retrieval of specific columns and rows

### Training Integration

- Support for distributed training workflows
- Efficient batch-wise data access (see the data-loading sketch at the end of this case study)
- Parallel data loading across multiple machines
- Caching mechanisms to improve performance
- Integration with existing training pipelines

## Research-Engineering Collaboration

### Shared Codebase Approach

- Single codebase shared across training and inference
- Benefits: shared code quality standards and no script-based handoffs between teams
- Challenges: accommodating both researcher flexibility and production requirements

### Infrastructure Tools

- Docker-based deployment
- Simple CLI tools for cloud training
- Python-based configuration rather than YAML (see the configuration sketch at the end of this case study)
- Caching mechanisms for faster iteration
- Shared libraries and utilities

### Team Structure

- ML Acceleration team bridges researchers and backend engineers
- Focus on shared infrastructure, tooling, and the research-to-production pipeline

## Best Practices and Lessons Learned

### Data Management

- Prioritize searchability and accessibility
- Enable semantic queries over large datasets
- Support data versioning and lineage
- Provide efficient storage and retrieval mechanisms

### Development Workflow

- Avoid script-based handoffs between teams
- Use type hints and validation for configurations
- Maintain shared code quality standards
- Enable fast iteration cycles

### Infrastructure Design

- Keep the development experience close to the local workflow
- Abstract away the complexity of distributed systems
- Focus on caching and performance optimization
- Build reusable components and libraries

## Results and Impact

- Improved researcher productivity through better tools
- More efficient use of computational resources
- Better collaboration between research and engineering
- Faster iteration cycles for model development
- More maintainable production systems

## Future Directions

- Continued improvement of caching mechanisms
- Enhanced support for new modalities
- Better tools for data quality assessment
- Improved mechanisms for experiment tracking
- Further optimization of GPU utilization

The case study demonstrates how thoughtful infrastructure design and team organization can support complex AI development workflows. By building systems that accommodate both researcher flexibility and production requirements, Runway has created an environment that enables rapid innovation while maintaining operational efficiency.
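
## Illustrative Sketches

The following sketches make the patterns described above more concrete. They are not Runway's actual implementation, which is not public; all class, function, field, and path names are hypothetical.

The first sketch captures the core idea of a multimodal feature store: rows that pair references to raw media with precomputed embeddings and derived features, supporting semantic (vector) queries and selective column retrieval. A production system would back this with object storage and a dedicated vector index rather than in-memory NumPy arrays; the sketch only shows the access patterns.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MediaRecord:
    """One row: a pointer to raw media plus derived features and an embedding."""
    media_id: str
    media_uri: str                                 # e.g. object-store path to the video or image
    caption: str                                   # text modality stored alongside the media
    embedding: np.ndarray                          # precomputed semantic embedding
    features: dict = field(default_factory=dict)   # other derived features (duration, fps, ...)


class MultimodalFeatureStore:
    """Toy in-memory store that answers semantic queries and column lookups."""

    def __init__(self) -> None:
        self._records: list[MediaRecord] = []

    def add(self, record: MediaRecord) -> None:
        self._records.append(record)

    def semantic_search(self, query_embedding: np.ndarray, k: int = 5) -> list[MediaRecord]:
        """Return the k records whose embeddings are closest to the query (cosine similarity)."""
        matrix = np.stack([r.embedding for r in self._records])
        sims = matrix @ query_embedding / (
            np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_embedding) + 1e-8
        )
        top = np.argsort(-sims)[:k]
        return [self._records[i] for i in top]

    def select_columns(self, media_ids: list[str], columns: list[str]) -> list[dict]:
        """Fetch only the requested feature columns for specific rows."""
        wanted = {r.media_id: r for r in self._records if r.media_id in set(media_ids)}
        return [
            {col: wanted[m].features.get(col) for col in columns}
            for m in media_ids
            if m in wanted
        ]


# Usage: register a clip with its precomputed embedding, then query semantically.
store = MultimodalFeatureStore()
store.add(MediaRecord("clip-001", "s3://bucket/clip-001.mp4", "a drone shot of a coastline",
                      embedding=np.random.rand(512).astype("float32"),
                      features={"duration_s": 4.2, "fps": 24}))
hits = store.semantic_search(np.random.rand(512).astype("float32"), k=1)
```

In a real deployment, `semantic_search` would be served by a vector index and `select_columns` by a columnar storage format, but the separation between raw media references and precomputed features is the point being illustrated.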
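The second sketch illustrates batch-wise, parallel data access for distributed training with a local on-disk cache, assuming a PyTorch-style pipeline. The shard URIs, cache directory, and the dummy "download" step are placeholders.

```python
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset, DistributedSampler


class PrecomputedFeatureDataset(Dataset):
    """Reads shards of features computed offline, caching them on local disk."""

    def __init__(self, shard_uris: list[str], cache_dir: str = "/tmp/feature_cache"):
        self.shard_uris = shard_uris
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def __len__(self) -> int:
        return len(self.shard_uris)

    def _fetch(self, uri: str) -> np.ndarray:
        """Load a shard from the local cache, downloading it on first access."""
        cached = self.cache_dir / (Path(uri).stem + ".npy")
        if not cached.exists():
            # Stand-in for an object-store download of the precomputed shard.
            np.save(cached, np.random.rand(16, 512).astype("float32"))
        return np.load(cached)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.from_numpy(self._fetch(self.shard_uris[idx]))


def make_loader(shard_uris: list[str], rank: int, world_size: int,
                batch_size: int = 4) -> DataLoader:
    """Each machine gets a disjoint slice of the shards and loads it with parallel workers."""
    dataset = PrecomputedFeatureDataset(shard_uris)
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler, num_workers=2)


if __name__ == "__main__":
    # Example: machine 0 of a 4-machine job iterates over its share of 1,000 shards.
    uris = [f"s3://bucket/features/shard-{i:05d}.npy" for i in range(1000)]
    for batch in make_loader(uris, rank=0, world_size=4):
        print(batch.shape)  # (batch_size, 16, 512)
        break
```

Each machine constructs its loader with its own `rank`, so the cluster reads disjoint slices of the precomputed features, while the cache avoids repeated downloads across epochs.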
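The third sketch shows Python-based configuration with type hints and validation, in the spirit of the "Python-based configuration rather than YAML" point above. Field names and defaults are illustrative only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    """Experiment configuration expressed as typed Python instead of YAML."""
    model_name: str
    batch_size: int = 32
    learning_rate: float = 1e-4
    num_gpus: int = 8
    feature_columns: tuple[str, ...] = ("clip_embedding", "caption")

    def __post_init__(self) -> None:
        # Validation runs at construction time, so a bad config fails locally
        # before any cloud training job is launched.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate looks out of range")
        if self.num_gpus < 1:
            raise ValueError("need at least one GPU")


# Configs are ordinary Python objects: they can be composed, diffed, and
# checked by a type checker, unlike free-form YAML.
base = TrainingConfig(model_name="baseline")
large = replace(base, model_name="large", batch_size=64, num_gpus=32)
```

Because the configuration is plain Python, derived experiments are built with `replace` and revalidated automatically, which is one way to keep iteration fast without sacrificing correctness checks.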
