OLX faced a challenge with unstructured job roles in their job listings platform, making it difficult for users to find relevant positions. They implemented a production solution using Prosus AI Assistant, a GenAI/LLM model, to automatically extract and standardize job roles from job listings. The system processes around 2,000 daily job updates, making approximately 4,000 API calls per day. Initial A/B testing showed positive uplift in most metrics, particularly in scenarios with fewer than 50 search results, though the high operational cost of ~15K per month has led them to consider transitioning to self-hosted models.
OLX, a global online marketplace, undertook a project to improve their job listings search experience by extracting structured job roles from unstructured job advertisement data. The core problem was that job roles were not clearly defined within their jobs taxonomies—instead, they were buried within ad titles and descriptions, making it difficult for job seekers to find relevant positions. This case study documents their journey from proof of concept through to production deployment, highlighting both the successes and the pragmatic cost considerations that come with using external LLM APIs at scale.
The solution leverages Prosus AI Assistant, an LLM service developed by Prosus (OLX’s parent company, a global consumer internet group), which operates on top of OpenAI’s infrastructure through a special agreement that includes enhanced privacy measures and a zero-day data retention policy. This case study is particularly instructive for teams considering the build-versus-buy decision for LLM capabilities in production systems.
The job-role extraction system operates through a multi-stage pipeline that processes job advertisements to create structured, searchable job role data. The architecture integrates with OLX’s existing infrastructure, particularly their search indexing system.
Before sending data to the LLM, the team implemented several preprocessing steps. They sampled 2,000 job ads for their proof of concept, accounting for uneven distribution across sub-categories to ensure representative coverage. The preprocessing pipeline includes text cleaning, trimming content to the first 200 words/tokens (to manage API costs and stay within token limits), and translation where necessary since the initial focus was on the Polish market.
A parallel analysis examined the most-searched keywords in the Jobs categories. Using the LLM, they categorized keywords into professions, job types, locations, and broader descriptors. This analysis revealed that approximately 60% of searched keywords relate to specific professions, validating the focus on job role extraction as a high-impact improvement area.
The team used a structured approach to generate normalized job-role taxonomies. This involved providing the LLM with up to 100 profession-related searched keywords and up to 50 job roles extracted from randomly selected job ads within each category. A carefully crafted prompt guided the model to produce hierarchical taxonomies considering both responsibilities and department structures. The prompt structure explicitly requested categorization with detailed instructions and specified output format requirements.
The production implementation consists of two main operational modes:
A dedicated service subscribes to ad events and uses Prosus AI Assistant to extract job taxonomy information. The extracted job roles are then sent to AWS Kinesis, which feeds into the search team’s indexing pipeline. The enriched data connects extracted job roles with other ad information like titles and parameters for search lookup.
The team developed specific prompt engineering guidelines through their experimentation:
The team also utilized the LangChain framework to streamline interactions with the LLM API, simplify outcome specifications, and chain tasks for enhanced efficiency.
In production, the system handles approximately 2,000 newly created or updated ads daily. The team made an architectural decision to break down the processing into two sub-tasks—job-role extraction and matching within the standardized tree—resulting in approximately 4,000 daily API requests to Prosus AI Assistant.
For taxonomy generation, the API request volume depends on the number of sub-categories and is only triggered when there are changes or updates to the category tree, which occurs at most a few times per month. This distinction between continuous extraction operations and periodic taxonomy regeneration is an important architectural consideration for managing costs and system complexity.
The team conducted A/B testing to evaluate the impact of the job-role extraction system, focusing on the retrieval stage of search (not yet integrated into search ranking). They acknowledged that significant results require time and designed their experiment with strategic segmentation, dividing results into low, medium, and high segments.
Key observations from the experiments include:
The team was transparent about limitations—the impact currently resides only in the retrieval stage and is not yet integrated into search ranking, so improvements may not appear prominently in top results.
The decision to use Prosus AI Assistant over self-hosted LLMs was driven by several factors:
The team acknowledged potential risks including slightly longer response times, dependency on external API availability, and questions about long-term viability. They positioned this as a strategic choice for rapid deployment while remaining open to exploring custom LLMs for future optimization.
The case study provides valuable transparency about operational costs: approximately $15,000 per month for the Prosus AI Assistant service. This cost revelation prompted serious reflection on sustainability and efficiency for ongoing operations.
The team is now evaluating a pivot toward self-hosted models, which could offer:
This honest assessment of the economics of LLM operations is particularly valuable for teams planning production deployments. While external services can expedite exploration and proof-of-concept phases, long-term cost considerations often guide strategic decisions toward self-hosted alternatives.
A notable operational challenge is managing category evolution. As OLX’s teams continuously improve job categories, changes can necessitate recreation of job-role taxonomies and potentially introduce inconsistencies between taxonomies created before and after sub-category changes.
The planned strategy involves implementing an automated process that detects changes in sub-categories and automatically regenerates necessary job-role taxonomies. This proactive approach ensures the extraction model remains aligned with the evolving job landscape without requiring manual intervention.
This case study illustrates several important LLMOps principles:
The OLX team’s transparency about both successes and challenges—including the significant monthly costs that are prompting reconsideration of their approach—provides realistic guidance for teams implementing similar LLM-powered extraction systems in production environments.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Fastweb / Vodafone, a major European telecommunications provider serving 9.5 million customers in Italy, transformed their customer service operations by building two AI agent systems to address the limitations of traditional customer support. They developed Super TOBi, a customer-facing agentic chatbot system, and Super Agent, an internal tool that empowers call center consultants with real-time diagnostics and guidance. Built on LangGraph and LangChain with Neo4j knowledge graphs and monitored through LangSmith, the solution achieved a 90% correctness rate, 82% resolution rate, 5.2/7 Customer Effort Score for Super TOBi, and over 86% One-Call Resolution rate for Super Agent, delivering faster response times and higher customer satisfaction while reducing agent workload.
Swisscom, Switzerland's leading telecommunications provider, developed a Network Assistant using Amazon Bedrock to address the challenge of network engineers spending over 10% of their time manually gathering and analyzing data from multiple sources. The solution implements a multi-agent RAG architecture with specialized agents for documentation management and calculations, combined with an ETL pipeline using AWS services. The system is projected to reduce routine data retrieval and analysis time by 10%, saving approximately 200 hours per engineer annually while maintaining strict data security and sovereignty requirements for the telecommunications sector.