ZenML

Building and Deploying AI-Powered Visual and Semantic Search in Design Tools

Figma 2024

Figma tackled the challenge of designers spending excessive time searching for existing designs by implementing AI-powered search capabilities. They developed both visual search (using screenshots or sketches) and semantic search features, using RAG and custom embedding systems. The team focused on solving real user workflows, developing systematic quality evaluations, and scaling the infrastructure to handle billions of embeddings while managing costs. The project evolved from an initial autocomplete prototype to a full-featured search system that helps designers find and reuse existing work more efficiently.

Industry

Tech

Overview

Figma, the collaborative design platform, developed AI-powered search capabilities to address a common pain point: designers spending excessive time finding existing designs. The company observed that their own designers were losing time trying to track down source files when they only had a screenshot, with hundreds of messages appearing in Slack channels where designers asked teammates for help. This led to the development of two AI-powered search features launched at Config 2024: visual search (allowing searches using screenshots, selected frames, or quick sketches) and semantic search (understanding the context behind text-based queries even when users don’t know exact terms for component names or descriptions).

The case study provides valuable insights into how a product-focused tech company approaches building AI features that genuinely add value rather than being built purely for hype. The journey from initial hackathon prototype to production feature took over a year and involved significant pivots based on user research.

Initial Approach and Pivot

The project originated from a three-day AI hackathon in June 2023, which produced 20 completed projects, including a working prototype for "design autocomplete": an AI assistant that suggests components as a designer works (e.g., proposing a "Get started" button for an onboarding flow). The team initially added this to the product roadmap and began building it.

However, as they shared the working prototype with internal teams and conducted user research sessions, consistent patterns emerged. The team discovered that designers don’t just start from scratch—they constantly riff on existing work, revisiting past explorations and using them to push their own work forward. In fact, 75% of all objects added to the Figma canvas come from other files. This insight led to a critical pivot: rather than focusing on autocomplete, the team recognized that improving search and navigation was the more pressing need.

This pivot demonstrates a mature approach to AI product development—being willing to adjust direction based on user feedback rather than pushing forward with a technically impressive feature that may not address core user needs.

Technical Architecture and RAG Foundation

The team explicitly mentions building on Retrieval Augmented Generation (RAG) principles. They understood that they could improve AI outputs by providing examples from search; if design autocomplete could find designs similar to what a designer was working on, it could better suggest the next component. This insight meant that even if they shipped search first, the underlying infrastructure would support future AI features like autocomplete.
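The article does not show the implementation, but the retrieval step it describes can be sketched minimally: embed the designer's current work, find the nearest existing designs in an embedding index, and hand those back as context for a downstream suggestion step. The embeddings, ids, and `retrieve_similar` helper below are illustrative assumptions, not Figma's actual API.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_similar(query_emb: np.ndarray,
                     index: dict[str, np.ndarray],
                     k: int = 3) -> list[str]:
    """Return the ids of the k indexed designs most similar to the query."""
    ranked = sorted(index, key=lambda d: cosine_sim(query_emb, index[d]),
                    reverse=True)
    return ranked[:k]

# Toy index: design id -> embedding (in practice produced by a trained model).
index = {
    "onboarding_flow": np.array([0.9, 0.1, 0.0]),
    "settings_page":   np.array([0.1, 0.9, 0.0]),
    "signup_button":   np.array([0.8, 0.2, 0.1]),
}
query = np.array([0.85, 0.15, 0.05])  # embedding of the design in progress
context = retrieve_similar(query, index, k=2)
# The retrieved designs would then be supplied as context to a generation
# step (e.g., autocomplete), which is the RAG pattern the team describes.
```

The same retrieval path serves search directly, which is why shipping search first still laid the groundwork for autocomplete.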

The technical approach involved generating and indexing embeddings to power visual and semantic search. While the article references a separate companion piece on the infrastructure specifics, it provides insights into the indexing challenges unique to their domain.
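At billions of embeddings, index size drives cost, so deduplication and normalization matter. The in-memory class below is a hypothetical illustration of that layer (a production system would use a vector database); the class name and its dedup-by-hash strategy are assumptions, not details from the article.

```python
import hashlib
import numpy as np

class EmbeddingIndex:
    """Toy in-memory embedding index with exact-duplicate suppression."""

    def __init__(self):
        self.ids: list[str] = []
        self.vectors: list[np.ndarray] = []
        self._seen: set[str] = set()

    def add(self, doc_id: str, embedding: np.ndarray) -> bool:
        # Skip byte-identical embeddings to keep index size (and cost) down.
        key = hashlib.sha256(embedding.tobytes()).hexdigest()
        if key in self._seen:
            return False
        self._seen.add(key)
        self.ids.append(doc_id)
        # Store unit vectors so dot product equals cosine similarity.
        self.vectors.append(embedding / np.linalg.norm(embedding))
        return True

    def search(self, query: np.ndarray, k: int = 5) -> list[str]:
        q = query / np.linalg.norm(query)
        sims = np.array([v @ q for v in self.vectors])
        top = np.argsort(-sims)[:k]
        return [self.ids[i] for i in top]
```

Approximate-nearest-neighbor structures replace the linear scan at scale, but the add/search contract stays the same.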

Challenges of Indexing an Infinite Canvas

One of the most interesting LLMOps challenges described is deciding what to index. In traditional document search the unit of indexing is obvious (the document itself), but Figma's infinite canvas offers no such natural unit: a single file can contain many pages, top-level frames, nested components, and half-finished explorations, and the team had to decide which of these should become searchable entries.

These indexing decisions significantly impact both the quality of search results and infrastructure costs—a classic LLMOps tradeoff.
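One plausible way to frame the tradeoff is a heuristic that walks the node tree and indexes only frames large enough to be meaningful results, descending into smaller groups rather than indexing them wholesale. The `Node` structure and area threshold below are invented for illustration; the article does not disclose Figma's actual selection rules.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Simplified stand-in for a canvas node (frame, group, component)."""
    name: str
    width: float
    height: float
    children: list["Node"] = field(default_factory=list)

def index_units(root: Node, min_area: float = 10_000) -> list[Node]:
    """Select nodes worth indexing as standalone search results.

    Heuristic: a child large enough to read as a 'design' becomes one
    index unit; smaller containers are descended into instead, so a
    sprawling page is broken into frame-sized entries rather than
    indexed as a single blob.
    """
    units: list[Node] = []
    for child in root.children:
        if child.width * child.height >= min_area:
            units.append(child)
        else:
            units.extend(index_units(child, min_area))
    return units
```

Raising the threshold cuts embedding volume (and cost) at the expense of recall for small components, which is exactly the quality/cost tension the article highlights.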

Evaluation Strategy

The case study provides detailed insight into their evaluation methodology: the team built custom, domain-specific tools to measure search quality rather than relying on off-the-shelf benchmarks.

This approach to building domain-specific evaluation tools is a best practice in LLMOps. Rather than relying solely on generic metrics, the team created evaluation mechanisms that matched their specific product context and could be used by domain experts (designers) rather than just ML engineers.
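The core of such a tool can be small: a designer-curated set of queries with expected results, scored by a standard metric like recall@k. The sketch below assumes that setup; `stub_search`, the query set, and the metric choice are illustrative, not Figma's actual harness.

```python
def recall_at_k(results: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(results[:k]) & relevant)
    return hits / len(relevant)

def evaluate(search_fn, labeled_queries: dict[str, set[str]], k: int = 10) -> float:
    """Average recall@k over a curated set of (query -> expected ids) pairs.

    Because the labels are just 'which designs should this query find',
    domain experts (designers) can author them without ML expertise.
    """
    scores = [recall_at_k(search_fn(query), expected, k)
              for query, expected in labeled_queries.items()]
    return sum(scores) / len(scores)
```

Running this after every indexing or embedding change turns "does search feel better?" into a number the team can track.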

Iterative Development and Internal Beta

The team followed an iterative development approach, validating the features through an internal beta before the public launch.

Design Considerations for AI Features

An interesting aspect of this case study is the attention paid to UX design for AI-powered features, treating interaction design as seriously as model quality.

Key Principles and Takeaways

The team articulated four guiding principles for shipping AI features.

Honest Assessment

While this case study provides valuable insights into Figma's approach, the information provided has some limitations: the article stops short of disclosing model choices, infrastructure details, and quantitative results.

That said, the case study offers genuine value for LLMOps practitioners, particularly in its discussion of domain-specific indexing challenges, custom evaluation tooling, and the product thinking that guided technical decisions. The willingness to pivot from autocomplete to search based on user research is a particularly noteworthy example of user-centered AI development.
