Canva implemented LLMs as a feature extraction method for two key use cases: search query categorization and content page categorization. By replacing traditional ML classifiers with LLM-based approaches, they achieved higher accuracy, reduced development time from weeks to days, and lowered operational costs from $100/month to under $5/month for query categorization. For content categorization, LLM embeddings outperformed traditional methods in terms of balance, completion, and coherence metrics while simplifying the feature extraction process.
# LLM Implementation for Feature Extraction at Canva
## Company Background
Canva is an online design platform that aims to enable everyone to design everything. The platform handles various types of content, from design templates and media assets to informational articles, requiring sophisticated content organization and search capabilities.
## Problem Statement
Canva faced two major challenges:
- Need for efficient categorization of user search queries to understand user interests
- Requirement to organize and group content pages based on semantic similarity and relevance
## Use Case 1: Search Query Categorization
### Traditional Approach vs LLM Solution
- Traditional Approach: custom ML classifiers that took weeks to develop and cost roughly $100/month to operate
- LLM API Approach: few-shot prompting against a hosted LLM, shipped in days, at under $5/month and with higher accuracy
### Implementation Details
- Used few-shot learning without fine-tuning
- Implemented a standardized completion format (JSON) for reliable parsing
- Added error mitigation: validating outputs and falling back to a default label on malformed completions (see the sketch below)
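A minimal sketch of what this could look like; the model choice, category labels, and fallback value are illustrative assumptions, not Canva's actual configuration:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_prompt(query: str) -> str:
    # Few-shot examples plus a standardized JSON completion format make the
    # output easy to parse and validate downstream.
    return (
        'Classify the search query into one category.\n'
        'Respond with JSON only.\n\n'
        'Query: "birthday card for mum" -> {"category": "greeting-cards"}\n'
        'Query: "instagram story template" -> {"category": "social-media"}\n\n'
        f'Query: "{query}" ->'
    )

def categorize_query(query: str, fallback: str = "unknown") -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": build_prompt(query)}],
        temperature=0,
    )
    try:
        return json.loads(response.choices[0].message.content)["category"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Error mitigation: a malformed or unexpected completion falls back
        # to a default label instead of breaking the pipeline.
        return fallback
```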
### Fine-tuning Insights
- Small training datasets (50-100 data points per class) proved sufficient
- Fine-tuning was considered when few-shot prompting alone could not meet accuracy or cost requirements (a data-preparation sketch follows)
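As a hedged sketch, here is what a small per-class dataset could look like in OpenAI's chat fine-tuning JSONL format; the labeled queries are placeholders, not Canva's data:

```python
import json

# Illustrative labeled queries; per the case study, 50-100 examples
# per class proved sufficient for fine-tuning.
labeled_queries = [
    ("birthday card for mum", "greeting-cards"),
    ("instagram story template", "social-media"),
]

with open("train.jsonl", "w") as f:
    for query, category in labeled_queries:
        record = {
            "messages": [
                {"role": "system", "content": "Categorize the search query."},
                {"role": "user", "content": query},
                {"role": "assistant", "content": json.dumps({"category": category})},
            ]
        }
        f.write(json.dumps(record) + "\n")
```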
## Use Case 2: Content Page Categorization
### Challenge Overview
- Diverse content types: design templates, media assets, and informational articles
- Need for semantic grouping and classification
### Traditional Methods
- Traditionally required combining multiple feature extraction techniques to cover the diverse content types and formats
### LLM Solution Performance
- Metrics defined: balance, completion, and coherence of the resulting groupings (one plausible formulation is sketched below)
- Results with LLM embeddings: outperformed traditional feature extraction methods on all three metrics
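The case study names these metrics without giving formulas, so the definitions below are one plausible formulation, purely an assumption rather than Canva's actual implementation:

```python
import numpy as np

def balance(labels: np.ndarray) -> float:
    """Normalized entropy of cluster sizes (1.0 = perfectly even clusters)."""
    _, counts = np.unique(labels[labels >= 0], return_counts=True)
    if len(counts) < 2:
        return 1.0
    probs = counts / counts.sum()
    return float(-(probs * np.log(probs)).sum() / np.log(len(counts)))

def completion(labels: np.ndarray) -> float:
    """Fraction of pages assigned to any cluster (-1 marks unassigned pages)."""
    return float((labels >= 0).mean())

def coherence(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean cosine similarity between cluster members and their centroid."""
    sims = []
    for k in np.unique(labels[labels >= 0]):
        members = embeddings[labels == k]
        centroid = members.mean(axis=0)
        sims.extend(
            members @ centroid
            / (np.linalg.norm(members, axis=1) * np.linalg.norm(centroid))
        )
    return float(np.mean(sims))
```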
### Technical Implementation Details
- Used OpenAI API with text-embedding-ada-002 model
- Embedded raw text directly, with no preprocessing pipeline
- Accounted for non-deterministic behavior in LLM-based feature extraction
- Semantic understanding held regardless of content format or length (see the sketch below)
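A minimal sketch of the embedding step: raw page text goes straight to text-embedding-ada-002 and the vectors feed a downstream grouping step. The clustering algorithm and sample pages are assumptions; the section only confirms the model and the no-preprocessing approach:

```python
from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_pages(pages: list[str]) -> list[list[float]]:
    # Raw page text goes straight to the embedding model: no tokenization,
    # stemming, or per-format handling is needed.
    response = client.embeddings.create(
        model="text-embedding-ada-002", input=pages
    )
    return [item.embedding for item in response.data]

pages = [
    "How to design a professional resume ...",
    "Free printable birthday card templates ...",
]
vectors = embed_pages(pages)
# Downstream grouping step; KMeans is an illustrative choice here, not
# necessarily what Canva used.
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(vectors)
```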
## Production Considerations and Best Practices
### When to Use LLM APIs
- Effective for straightforward text-based tasks
- Ideal for rapid prototyping
- Cost-effective at moderate scales
- Suitable when custom logic can be handled via prompt engineering
### Operational Optimization
- Regular scheduled jobs processing hundreds of thousands of inputs
- API rate limit management
- Fallback mechanism implementation
- Cost monitoring and optimization (a batch-job sketch follows)
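A minimal sketch of how such a scheduled batch job might handle rate limits and fallbacks, reusing the hypothetical `categorize_query` helper from the earlier sketch; the retry count and backoff policy are illustrative assumptions:

```python
import time
import openai

def process_batch(queries: list[str], max_retries: int = 3) -> list[str]:
    """Categorize a batch of queries with retries and a fallback label."""
    results = []
    for query in queries:
        for attempt in range(max_retries):
            try:
                results.append(categorize_query(query))  # from the earlier sketch
                break
            except openai.RateLimitError:
                # Exponential backoff keeps the job under the API rate limit.
                time.sleep(2 ** attempt)
        else:
            # Fallback after exhausting retries keeps the scheduled job
            # running instead of failing the whole batch.
            results.append("unknown")
    return results
```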
### Future Considerations
- Evaluation of open-source LLMs for cost reduction as usage scales
- Potential for in-house API development
## Key Learnings
### Technical Benefits
- Simplified development process, with less effort and a much shorter timeline (weeks to days)
- Lower operational costs for both training and inference
- Better performance metrics
### Implementation Considerations
- Text feature extraction can be non-deterministic
- Need for standardized output formats
- Importance of error handling and fallback solutions
- Scale considerations for cost optimization
### Architecture Decisions
- LLM as middle layer between upstream and downstream tasks
- Integration with existing technologies
- Balance between API usage and custom implementation
- Consideration of open-source alternatives for scale
## Results and Impact
- Significant reduction in development time
- Improved accuracy in categorization tasks
- Cost savings in operational expenses
- More flexible and maintainable solution
- Better semantic understanding of content
- Simplified feature extraction pipeline