LinkedIn developed a comprehensive LLM-based system for extracting and mapping skills from diverse content sources across its platform to power its Skills Graph. The system uses a multi-step AI pipeline built on BERT-based models for semantic understanding, with knowledge distillation applied for production deployment. Implemented at scale under strict latency requirements, the system delivered significant improvements in job recommendations and skills matching, while an 80% reduction in model size preserved performance.
LinkedIn has built a sophisticated AI-powered skill extraction and mapping system to fuel their Skills Graph, which serves as foundational technology for member-job matching, learning recommendations, and skills-first hiring initiatives across the platform. The case study provides a comprehensive look at how large-scale language models and NLP systems are operationalized in production at LinkedIn, addressing challenges around latency, scale, model complexity, and continuous improvement.
The core problem LinkedIn faced was that skills are mentioned throughout diverse content types—member profiles, job postings, resumes, LinkedIn Learning courses, and feed posts—but not always in structured, easily extractable formats. Skills may be listed explicitly in dedicated sections, embedded in free-text descriptions, or only implied through context. LinkedIn needed a robust system to identify, extract, normalize, and map these skills to their canonical Skills Graph (containing over 41,000 skills) while operating at massive scale with strict latency requirements.
The skill extraction pipeline consists of several interconnected stages, each addressing a specific aspect of the extraction and mapping challenge.
Before any skill extraction occurs, the system parses raw input content into well-formed structures. For job postings, this means identifying sections like “company description,” “responsibilities,” “benefits,” and “qualifications.” For resumes, it identifies skills sections and past experiences. This segmentation is crucial because the location of a skill mention provides important signal about its relevance—a skill mentioned in qualifications is typically more important than one mentioned in company descriptions.
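A minimal sketch of this segmentation step, assuming a fixed list of header strings (the production parser and its section taxonomy are not described in detail, so the header names and the regex approach here are illustrative):

```python
import re

# Hypothetical section segmentation for a job posting: split the raw text on
# recognized section headers so downstream tagging knows where each skill
# mention lives (e.g., qualifications vs. company description).
SECTION_HEADERS = ["company description", "responsibilities",
                   "qualifications", "benefits"]
_pattern = re.compile(r"(?im)^({}):?\s*$".format("|".join(SECTION_HEADERS)))

def segment(posting):
    """Return {section_name: body_text} for recognized headers."""
    # re.split with a capture group yields [preamble, header1, body1, ...]
    parts = _pattern.split(posting)
    sections = {}
    for header, body in zip(parts[1::2], parts[2::2]):
        sections[header.lower()] = body.strip()
    return sections

doc = """Company Description
We build networking tools.
Qualifications
Experience with Java and SQL.
"""
print(segment(doc))
```

A real parser would also handle unlabeled sections and resume layouts, but the output shape, a mapping from section to text, is what the later stages consume.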
LinkedIn employs a dual approach to skill tagging that balances speed and semantic understanding:
Trie-Based Token Matching: This approach encodes skill names from the taxonomy into a trie structure and performs token-based lookups on raw text input. The advantage is exceptional speed and scalability for high-volume text processing. The limitation is dependency on the skills taxonomy to capture every variation of how skills are expressed in natural language.
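The trie lookup can be sketched as follows (a toy illustration; LinkedIn's production trie and taxonomy are not public). Skill names are tokenized and inserted into a nested-dict trie, and tagging scans the text greedily for the longest token match:

```python
def build_trie(skill_names):
    """Build a nested-dict trie keyed by lowercase tokens."""
    trie = {}
    for name in skill_names:
        node = trie
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node["$"] = name  # end-of-skill marker stores the canonical name
    return trie

def tag_skills(text, trie):
    """Return canonical skills found in text via longest-match lookup."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        node, j, last = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if "$" in node:
                last = (node["$"], j)  # remember the longest match so far
        if last:
            found.append(last[0])
            i = last[1]  # skip past the matched span
        else:
            i += 1
    return found

trie = build_trie(["machine learning", "java", "mobile development"])
print(tag_skills("Experienced in Machine Learning and Java", trie))
# → ['machine learning', 'java']
```

Because each token is visited a bounded number of times, this stays fast at high volume, but it only finds surface forms already present in the taxonomy.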
Semantic Two-Tower Model: To complement the token-based approach, LinkedIn developed a semantic tagger using a two-tower architecture with Multilingual BERT as the text encoder. This model builds contextual embeddings for both the source text and skill names; the two-tower structure decouples the generation of sentence and skill embeddings while keeping them comparable via a similarity function. This lets the system infer skills from contextual descriptions: for example, “experience with design of iOS application” maps to “Mobile Development” even though the skill is never explicitly mentioned.
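The structural idea can be shown with a runnable toy (the real towers are Multilingual-BERT encoders; here a deterministic hashed bag-of-words stands in so the sketch runs without model weights, and it cannot capture real semantics):

```python
import math

DIM = 64

def embed(tokens):
    # Stand-in encoder: hash each token into a fixed-size count vector.
    vec = [0.0] * DIM
    for tok in tokens:
        vec[sum(ord(c) for c in tok) % DIM] += 1.0
    return vec

def text_tower(sentence):     # tower 1: encodes source text at request time
    return embed(sentence.lower().split())

def skill_tower(skill_name):  # tower 2: encodes skill names, precomputable
    return embed(skill_name.lower().split())

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_skills(sentence, skill_names):
    """Score every skill against the sentence in the shared vector space."""
    q = text_tower(sentence)
    return sorted(((cosine(q, skill_tower(s)), s) for s in skill_names),
                  reverse=True)

print(rank_skills("java backend experience", ["java", "graphic design"]))
```

The decoupling is the operational payoff: skill-name embeddings for the whole taxonomy can be computed offline once, so only the text tower runs per request.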
Once initial skills are tagged, the system leverages the Skills Graph’s structure to expand the skill set. This includes querying for related skills in the same skill group and skills with structural relationships such as parent skills, children skills, and sibling skills. This expansion increases the chances of relevant skill matches.
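A small sketch of this expansion, assuming a parent-pointer representation of a taxonomy fragment (the actual Skills Graph schema is not public, so the skill names and relations here are hypothetical):

```python
# Hypothetical taxonomy fragment: child skill -> parent skill.
PARENT = {
    "ios development": "mobile development",
    "android development": "mobile development",
    "mobile development": "software development",
}

def children(skill):
    return {s for s, p in PARENT.items() if p == skill}

def siblings(skill):
    p = PARENT.get(skill)
    return children(p) - {skill} if p else set()

def expand(tagged):
    """Grow the tagged set with parent, child, and sibling skills."""
    out = set(tagged)
    for s in tagged:
        if s in PARENT:
            out.add(PARENT[s])  # parent skill
        out |= children(s)      # child skills
        out |= siblings(s)      # sibling skills
    return out

print(sorted(expand({"ios development"})))
# → ['android development', 'ios development', 'mobile development']
```

Expansion deliberately over-generates candidates; the scoring stage that follows is responsible for filtering them down.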
The final scoring stage uses a multitask model architecture with shared and domain-specific components:
Shared Module: Contains a Contextual Text Encoder (using Transformer architecture) that incorporates text information from skill mentions, surrounding context, job titles, and member profiles. A Contextual Entity Encoder utilizes pre-calculated embeddings for skills, titles, industries, and geographic entities, plus manual features like co-occurrence rates between entities.
Domain-Specific Module: Multiple dedicated model towers for each vertical (job postings, member profiles, feeds, etc.) that share the contextual information from the shared module but are developed independently. This architecture allows each vertical to maintain flexibility for their specific nuances while benefiting from shared representations.
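The shared-plus-domain-tower structure can be sketched as follows (dimensions, verticals, and the use of plain linear layers are all illustrative stand-ins for the Transformer-based production model):

```python
import random

random.seed(0)  # deterministic toy weights
DIM = 8

def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained layer."""
    return [[random.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]

def apply(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# Shared module: one encoder whose output every vertical consumes.
shared_encoder = linear(DIM, DIM)

# Domain-specific module: an independent scoring head per vertical.
heads = {v: linear(DIM, 1) for v in ("jobs", "profiles", "feed")}

def score(vertical, features):
    h = apply(shared_encoder, features)  # shared contextual representation
    return apply(heads[vertical], h)[0]  # vertical-specific relevance score
```

The point of the shape is organizational as much as architectural: each vertical's team can retrain its own head against its own labels without touching the shared representation the other verticals depend on.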
The case study reveals significant LLMOps challenges around serving BERT-based models at scale. LinkedIn processes approximately 200 global profile edits per second, with each message needing to be processed in under 100 milliseconds. Serving a full 12-layer BERT model while maintaining these latency standards is described as “a daunting task even for industry leaders” due to BERT’s large parameter count and computational demands.
LinkedIn’s key innovation for production serving was applying Knowledge Distillation to transfer knowledge from the larger “teacher” BERT network to a smaller “student” network. This approach reduced model size by 80% without compromising performance, enabling deployment within the existing Samza-BEAM CPU serving constraints.
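The core of the distillation objective is small enough to show directly (the temperature value below is illustrative; LinkedIn does not publish their training hyperparameters):

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The temperature flattens the teacher's outputs so the student also
    learns from the relative weight the teacher puts on wrong classes.
    (The conventional objective additionally scales this term by T**2.)
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Minimizing this loss pushes the smaller student network's output distribution toward the 12-layer teacher's, which is what makes an 80% size reduction possible without a matching drop in quality.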
The team struck a balance between performance and model complexity, informed by the research finding that large models often underutilize their capacity, which makes compression possible without significant performance degradation.
For the full data reprocessing challenge, LinkedIn developed a hybrid solution combining offline and nearline processing.
This hybrid approach optimized cost-to-serve while meeting the stringent requirements for both online and offline/nearline scenarios within inference time SLAs.
A critical aspect of the LLMOps implementation is the integration of product-driven feedback loops for model improvement:
Recruiter Skill Feedback: When recruiters manually post jobs, they’re shown AI-generated skill suggestions that they can edit. This provides direct feedback on skill extraction quality from the hiring perspective.
Job Seeker Skill Feedback: Job seekers see how many of their skills overlap with job requirements and can flag irrelevant skills. This captures the candidate’s perspective on skill-job relevance for model training.
Skill Assessments: LinkedIn Skill Assessments allow members to validate their skills through adaptive quizzes. Members scoring at 70th percentile or higher earn verified skill badges. This provides ground-truth validation of skills that can inform model improvement.
The multitask learning approach for identifying skill relationships (required, core, mentioned/valid) produced measurable improvements across LinkedIn products.
While these percentage improvements may appear modest, at LinkedIn’s scale they represent significant business impact.
LinkedIn outlines several forward-looking investments in their skill understanding capabilities.
This case study demonstrates a mature, production-grade approach to deploying NLP/LLM-based systems at scale. The hybrid architecture combining efficient token-based matching with semantic understanding shows practical engineering trade-offs. The 80% model compression via Knowledge Distillation addresses a common challenge in deploying large language models in latency-sensitive applications.
The feedback loop integration is particularly noteworthy as it demonstrates how product features can be designed to simultaneously deliver member value and generate training signal for model improvement. The multitask learning architecture also shows thoughtful consideration of how to share representations across domains while preserving domain-specific flexibility.
One potential limitation is that the case study comes from LinkedIn’s engineering blog, so results are presented favorably. The reported A/B test improvements, while statistically meaningful at scale, are relatively small percentages, suggesting the system’s value lies in aggregate improvement across many interactions rather than dramatic transformation of individual experiences.