Figma implemented AI-powered search features to help users find designs and components across their organization using text descriptions or visual references. The solution leverages the CLIP multimodal embedding model, with infrastructure built to handle billions of embeddings while keeping costs down. The system combines traditional lexical search with vector similarity search, using AWS services including SageMaker, OpenSearch, and DynamoDB to process and index designs at scale. Key optimizations included vector quantization, software rendering, and cluster autoscaling to manage computational and storage costs.
Figma's implementation of AI-powered search capabilities represents a significant case study in deploying LLMs and embedding models in production at scale. The company faced the challenge of helping users locate specific designs and components across large organizations with complex design systems, leading them to develop an AI-powered search solution that works with both visual and textual inputs.
The core of their implementation centers on the deployment of the CLIP multimodal embedding model, which processes both images and text to generate embeddings in a shared vector space. This choice is particularly interesting because it enables cross-modal search: users can find designs using either text descriptions or visual references (screenshots or selections). The decision to use CLIP, an open-source model, rather than developing a custom model is noteworthy, though they did fine-tune it specifically for UI components using public Community files to maintain data privacy.
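Figma fine-tuned their own CLIP variant, but the cross-modal behavior can be illustrated with an off-the-shelf checkpoint. The sketch below is illustrative only; the model name, normalization step, and similarity computation are assumptions rather than details from the case study.

```python
# Minimal sketch of cross-modal embedding with a public CLIP checkpoint.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(query: str) -> torch.Tensor:
    """Embed a text query into the shared CLIP vector space."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)  # L2-normalize

def embed_image(path: str) -> torch.Tensor:
    """Embed a design thumbnail into the same vector space."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)

# Cosine similarity between a text query and a thumbnail embedding:
score = embed_text("blue primary button") @ embed_image("thumbnail.png").T
```

Because both modalities land in the same vector space, the same index can serve text-to-design and design-to-design queries.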
The technical infrastructure is built on several AWS services, showcasing a practical approach to scaling LLM operations (a rough sketch of how these pieces might fit together follows the list):
* SageMaker for model deployment and inference
* DynamoDB for metadata and embedding storage
* OpenSearch for vector search functionality
* S3 for thumbnail storage
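The following sketch shows one plausible way these services could be wired together for indexing a single design. The endpoint name, table name, index name, and payload schema are hypothetical; the case study does not publish these details.

```python
# Hedged sketch: embed a design thumbnail via SageMaker, persist the embedding
# in DynamoDB, and index it in OpenSearch. Names and schemas are assumptions.
import json
import boto3
from opensearchpy import OpenSearch

sagemaker = boto3.client("sagemaker-runtime")
table = boto3.resource("dynamodb").Table("design-embeddings")  # hypothetical table
client = OpenSearch(hosts=[{"host": "search.internal", "port": 443}], use_ssl=True)

def index_design(design_id: str, thumbnail_s3_uri: str) -> None:
    # 1. Get an embedding from the model hosted on a SageMaker endpoint.
    response = sagemaker.invoke_endpoint(
        EndpointName="clip-embedding-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"image_uri": thumbnail_s3_uri}),
    )
    embedding = json.loads(response["Body"].read())["embedding"]

    # 2. Persist metadata and the raw embedding in DynamoDB.
    table.put_item(Item={"design_id": design_id, "embedding": json.dumps(embedding)})

    # 3. Index the vector in OpenSearch for similarity search.
    client.index(
        index="design-search",
        id=design_id,
        body={"design_id": design_id, "embedding": embedding},
    )
```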
The pipeline for populating their search index demonstrates several important LLMOps considerations:
* They split the pipeline into discrete jobs for better control over batching and retry behavior (a sketch of this job structure follows the list):
  * Identification of searchable designs
  * Thumbnail generation
  * Embedding generation
  * Search index population
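A simplified sketch of that job structure is shown below. The stage function, batch sizes, and retry policy are illustrative assumptions, not Figma's actual implementation; the point is that each stage owns its own batching and failures retry per batch rather than restarting the whole pipeline.

```python
# Each pipeline stage runs batch by batch, retrying transient failures per batch.
from typing import Callable, Iterable, List

def batched(items: Iterable[str], size: int) -> Iterable[List[str]]:
    """Yield fixed-size batches so each stage controls its own batching."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_stage(stage: Callable[[List[str]], None], items: Iterable[str],
              batch_size: int, max_attempts: int = 3) -> None:
    """Run one discrete job; a failed batch does not affect other batches or stages."""
    for batch in batched(items, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                stage(batch)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # surface the failure for this batch only

# Hypothetical stage: thumbnail generation (the other stages would follow the same shape).
def generate_thumbnails(design_ids: List[str]) -> None:
    print(f"rendering {len(design_ids)} thumbnails")

run_stage(generate_thumbnails, [f"design-{i}" for i in range(10)], batch_size=4)
```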
The scale of the operation presented significant challenges, requiring the team to process billions of entries. Their cost optimization strategies offer valuable insights for similar large-scale LLM deployments:
* Moving from Ruby to C++ for file processing
* Switching to CPU-based rendering with llvmpipe instead of GPU-based rendering
* Implementing debounced indexing (4-hour intervals) to reduce processing volume by 88%
* Using cluster autoscaling based on usage patterns
* Applying vector quantization to reduce embedding storage requirements (sketched after this list)
* Careful filtering of indexable content to reduce index size
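The case study does not describe the exact quantization scheme, so the sketch below uses a simple per-vector symmetric int8 scalar quantization as an assumption; it shows the roughly 4x storage reduction that motivates the technique.

```python
# Illustrative scalar (int8) quantization of float32 embeddings.
import numpy as np
from typing import Tuple

def quantize(embedding: np.ndarray) -> Tuple[np.ndarray, float]:
    """Map a float32 embedding to int8 plus a per-vector scale factor."""
    scale = np.abs(embedding).max() / 127.0
    return np.round(embedding / scale).astype(np.int8), float(scale)

def dequantize(quantized: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original vector."""
    return quantized.astype(np.float32) * scale

embedding = np.random.randn(512).astype(np.float32)  # e.g. a 512-d CLIP embedding
q, scale = quantize(embedding)
print(embedding.nbytes, "->", q.nbytes)              # 2048 bytes -> 512 bytes
print(np.abs(embedding - dequantize(q, scale)).max())  # small reconstruction error
```

At billions of embeddings, shaving three bytes per dimension translates directly into index and storage cost savings, at the price of a small loss in similarity precision.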
The system architecture shows thoughtful consideration of production requirements:
* Parallel processing for both image downloading and preprocessing (sketched, together with batched inference, after this list)
* Careful batch size optimization for inference
* Integration of traditional lexical search with vector search
* Score normalization and result interleaving for hybrid search results
* Handling of replica consistency issues in OpenSearch
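A hedged sketch of the download-then-batch pattern follows; the worker count, batch size, and the placeholder `embed_batch` function are assumptions. The idea is to overlap I/O-bound downloads with threads, then feed the model fixed-size batches large enough to keep it busy without exhausting memory.

```python
# Parallel thumbnail downloads followed by batched embedding inference.
from concurrent.futures import ThreadPoolExecutor
from typing import List
import urllib.request

def download(url: str) -> bytes:
    """Fetch one thumbnail (I/O-bound, so threads overlap the latency)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def embed_batch(images: List[bytes]) -> List[List[float]]:
    # Placeholder: in practice this would call the CLIP model or a SageMaker endpoint.
    return [[0.0] * 512 for _ in images]

def process(urls: List[str], batch_size: int = 32) -> List[List[float]]:
    with ThreadPoolExecutor(max_workers=16) as pool:
        images = list(pool.map(download, urls))
    embeddings: List[List[float]] = []
    for start in range(0, len(images), batch_size):
        embeddings.extend(embed_batch(images[start:start + batch_size]))
    return embeddings
```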
Their approach to gradual rollout and testing deserves attention. The team discovered that even testing with a small percentage of users required indexing nearly all teams' data due to the interconnected nature of their user base. This highlights the importance of considering deployment strategies and costs even during the testing phase of LLM-powered features.
The case study also reveals interesting technical challenges they encountered with OpenSearch in production:
* Non-deterministic results between primary and replica nodes
* Complications with document updates when optimizing _source field storage (see the mapping sketch after this list)
* The need to maintain separate stores for embeddings and search indexes
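Two common mitigations suggested by these issues are sketched below; the index name, field names, and dimensions are hypothetical. Excluding large vectors from `_source` saves storage, but partial updates and reindexing can then silently drop the embedding, which is one reason to keep a separate embedding store (such as DynamoDB) as the source of truth. Pinning a `preference` string routes a given user's queries to the same shard copies, reducing score flip-flopping between primary and replica nodes.

```python
# Hedged OpenSearch sketch: vector mapping with _source excludes, plus a
# consistent preference string at query time. Names are assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "search.internal", "port": 443}], use_ssl=True)

client.indices.create(
    index="design-search",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "_source": {"excludes": ["embedding"]},  # avoid storing the raw vector twice
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 512},
                "name": {"type": "text"},
            },
        },
    },
)

results = client.search(
    index="design-search",
    body={"query": {"match": {"name": "primary button"}}},
    preference="user-123",  # keep one user's results on consistent shard copies
)
```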
Their solution for component search demonstrates a practical hybrid approach, combining traditional lexical search with embedding-based semantic search. This provides both exact matches and semantically relevant results, with a scoring system that appropriately weights different match types.
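Figma's actual scoring formula and weights are not published; the sketch below assumes min-max normalization and a fixed lexical weight purely to illustrate how scores from two different systems (BM25-style lexical scores and cosine similarities) can be combined onto one ranking.

```python
# Illustrative hybrid ranking: normalize each score source, then blend.
from typing import Dict, List, Tuple

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Bring lexical (BM25) and vector (cosine) scores onto a comparable 0-1 scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(lexical: Dict[str, float], semantic: Dict[str, float],
                lexical_weight: float = 0.6) -> List[Tuple[str, float]]:
    """Weight exact lexical matches above purely semantic neighbors (weights assumed)."""
    lex, sem = min_max_normalize(lexical), min_max_normalize(semantic)
    combined = {
        doc: lexical_weight * lex.get(doc, 0.0) + (1 - lexical_weight) * sem.get(doc, 0.0)
        for doc in set(lex) | set(sem)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

print(hybrid_rank({"btn-primary": 12.4, "btn-ghost": 3.1},
                  {"btn-primary": 0.91, "icon-arrow": 0.74}))
```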
The infrastructure includes several reliability and performance optimizations:
* Separation of concerns between metadata storage (DynamoDB) and search functionality (OpenSearch)
* Careful batch size optimization for model inference
* Efficient handling of updates and changes to indexed content
* Implementation of debouncing strategies to manage update frequency (sketched below)
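The 4-hour window comes from the case study, but the coalescing mechanism shown here (reindex a file at most once per window, collapsing repeated edits) is an assumption used to illustrate how debouncing cuts processing volume.

```python
# Illustrative debounced/coalesced indexing with a 4-hour window.
import time
from typing import Dict, List, Optional, Set

DEBOUNCE_SECONDS = 4 * 60 * 60  # the 4-hour interval cited in the case study

_last_indexed: Dict[str, float] = {}
_pending: Set[str] = set()

def on_file_edited(file_id: str) -> None:
    """Record an edit; repeated edits inside the window collapse into one reindex."""
    _pending.add(file_id)

def flush_due(now: Optional[float] = None) -> List[str]:
    """Return files whose window has elapsed and mark them as reindexed."""
    now = time.time() if now is None else now
    due = [f for f in _pending if now - _last_indexed.get(f, 0.0) >= DEBOUNCE_SECONDS]
    for f in due:
        _last_indexed[f] = now
        _pending.discard(f)
    return due

# A periodic job would call flush_due() and reindex only the returned files,
# which is how frequent edits to the same file avoid repeated processing.
```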
Future considerations mentioned in the case study include:
* Continued rollout to more users
* Gathering user feedback for feature refinement
* Potential improvements to search accuracy and performance
This case study provides valuable insights into the real-world challenges and solutions involved in deploying LLM-powered features at scale, particularly in the context of visual and multimodal search applications. It demonstrates the importance of carefully considering infrastructure choices, cost optimization strategies, and gradual rollout approaches when deploying LLMs in production environments.