Company
Grab
Title
Building a Multi-Provider GenAI Gateway for Enterprise-Scale LLM Access
Industry
Tech
Year
2025
Summary (short)
Grab developed an AI Gateway to provide centralized, secure access to multiple GenAI providers (including OpenAI, Azure, AWS Bedrock, and Google Vertex AI) for their internal developers. The gateway handles authentication, cost management, auditing, and rate limiting while providing a unified API interface. Since its launch in 2023, it has enabled over 300 unique use cases across the organization, from real-time audio analysis to content moderation, while maintaining security and cost efficiency through centralized management.
This case study details Grab's implementation of an enterprise-scale AI Gateway that serves as a centralized access point for GenAI services across the organization. The system represents a comprehensive approach to managing LLMs in production, addressing the key challenges of enterprise LLM deployment: security, cost management, and operational efficiency.

First, the gateway tackles the complexity of managing multiple AI providers by implementing a unified authentication system. Rather than having developers deal with each provider's authentication method (API keys, instance roles, cloud credentials), the gateway provides a centralized platform requiring only one-time provider setup. This significantly reduces the operational overhead for teams wanting to use LLMs in their applications.

The architecture is built around a reverse-proxy design pattern, which proves both flexible and maintainable. The gateway acts as an intermediary between users and the various AI providers, handling authentication, authorization, and rate limiting. This design choice keeps the platform lightweight while supporting rapid integration of new providers and features. The gateway supports not just chat completion APIs but also embeddings, image generation, audio processing, and specialized features such as fine-tuning and context caching.

A particularly noteworthy aspect of the implementation is its focus on security and compliance. The platform enforces a thorough review process for production use cases, requiring mini-RFCs and security checklists. This helps prevent common issues such as prompt injection or accidental exposure of sensitive information through LLM applications. The gateway maintains detailed audit logs of all requests and responses, storing them in Grab's data lake for security analysis and compliance monitoring.
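To make the reverse-proxy pattern concrete, here is a minimal sketch in Go. The case study does not publish implementation details, so the route prefixes, the `X-Gateway-Key` header, the middleware hooks, and the credential handling below are all assumptions, not Grab's actual code.

```go
// gateway.go: a minimal sketch of the reverse-proxy pattern described
// above. Route prefixes, header names, and middleware behavior are
// assumptions; the case study does not publish Grab's implementation.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

// provider holds what the gateway needs to forward a request: the
// upstream base URL and the credential it injects on the caller's
// behalf, so application teams never handle provider keys directly.
type provider struct {
	upstream *url.URL
	apiKey   string
}

func main() {
	providers := map[string]provider{
		// Hypothetical route prefixes, one per configured provider.
		"/openai/": {mustParse("https://api.openai.com"), os.Getenv("OPENAI_KEY")},
		"/vertex/": {mustParse("https://us-central1-aiplatform.googleapis.com"), os.Getenv("VERTEX_KEY")},
	}

	for prefix, p := range providers {
		p := p // capture for the closure below
		proxy := httputil.NewSingleHostReverseProxy(p.upstream)
		direct := proxy.Director
		proxy.Director = func(r *http.Request) {
			direct(r)
			// Swap the caller's internal gateway key for the real
			// provider credential ("one-time provider setup"). Bearer
			// injection is illustrative; real providers differ (e.g.
			// SigV4 for Bedrock, OAuth for Vertex AI).
			r.Header.Set("Authorization", "Bearer "+p.apiKey)
			r.Host = p.upstream.Host
		}
		chain := audit(rateLimit(authenticate(proxy)))
		http.Handle(prefix, http.StripPrefix(prefix[:len(prefix)-1], chain))
	}
	log.Fatal(http.ListenAndServe(":8080", nil))
}

// authenticate rejects requests that lack an internal gateway key.
// "X-Gateway-Key" is an assumed header name.
func authenticate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Gateway-Key") == "" {
			http.Error(w, "missing gateway key", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// rateLimit is a placeholder here; a token-bucket sketch appears
// later in this write-up.
func rateLimit(next http.Handler) http.Handler { return next }

// audit logs request metadata; the real system records full requests
// and responses to Grab's data lake for security and compliance review.
func audit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("audit: key=%s method=%s path=%s",
			r.Header.Get("X-Gateway-Key"), r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}
```

The middleware chain mirrors the division of responsibilities described above: authentication and quota checks happen before the request ever reaches a provider, and auditing wraps the whole exchange.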
The system implements sophisticated cost management features:

* Shared capacity pooling to maximize utilization of reserved instances
* Dynamic routing to optimize usage across regions and providers
* Detailed cost attribution and monitoring at the service level (a short attribution sketch follows this section)
* Automated alerts for cost threshold violations

For developer experience, the gateway provides:

* Exploration keys for prototyping and experimentation
* A unified API interface following OpenAI's API schema (a client-side sketch follows this section)
* Automatic integration with Grab's ML Platform (Chimera notebooks and the Catwalk deployment system)
* Support for both synchronous and asynchronous operations

The platform has faced several technical challenges that offer valuable lessons for similar implementations:

* Maintaining compatibility with provider-specific SDKs while operating as a reverse proxy
* Balancing fair quota distribution between batch and real-time applications
* Keeping pace with rapid innovation in the LLM space without overwhelming the platform
* Managing rate limits effectively across use cases with varying SLOs

The gateway has demonstrated significant success, supporting over 300 unique use cases, including:

* Real-time audio signal analysis for ride safety
* Content moderation systems
* Menu item description generation
* Internal productivity tools such as text-to-SQL conversion
* Incident management automation
* Support chatbots using RAG

Looking forward, the team is planning several important enhancements:

* A comprehensive model catalogue to help users choose appropriate models
* Built-in governance features, including prompt injection protection
* More sophisticated rate limiting based on token usage and costs (illustrated by the token-bucket sketch after this section)
* Enhanced monitoring and observability features

The case study offers valuable insights into building and scaling an enterprise LLM platform. Grab's approach to balancing security, cost, and developer experience while maintaining operational efficiency provides a useful reference for organizations implementing similar systems. The challenges they faced and their solutions, particularly around rate limiting and cost management, highlight important considerations for enterprise-scale LLM deployments.

A particular strength of the implementation is its focus on developer experience without compromising security and governance. The exploration-keys concept, combined with the unified API interface, allows rapid prototyping while maintaining control over production deployments. The integration with existing ML infrastructure shows thoughtful consideration of the broader technical ecosystem.

However, the case study also reveals ongoing challenges, particularly around fair resource distribution and keeping pace with rapid innovation in the field. The planned improvements suggest these are active areas of development that require continuous attention in enterprise LLM platforms.
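The service-level cost attribution listed above can be illustrated with a short sketch: aggregate token counts from the audit log per service and multiply by per-model rates. The record shape, model names, and prices below are illustrative assumptions, not Grab's schema or real provider pricing.

```go
// costs.go: service-level cost attribution from audit-log records.
// The record shape, model names, and per-token prices are illustrative
// assumptions, not Grab's schema or real provider rates.
package main

import "fmt"

// usage is one audit-log record: which service called which model and
// how many tokens the call consumed.
type usage struct {
	service             string
	model               string
	inTokens, outTokens int
}

// pricePer1K holds illustrative prices per 1K tokens, by direction.
var pricePer1K = map[string]struct{ in, out float64 }{
	"gpt-4o":   {0.005, 0.015},
	"embed-v3": {0.0001, 0},
}

// attribute sums each service's spend across its audit records; the
// result can feed dashboards and automated threshold alerts.
func attribute(records []usage) map[string]float64 {
	costs := map[string]float64{}
	for _, r := range records {
		p := pricePer1K[r.model]
		costs[r.service] += float64(r.inTokens)/1000*p.in + float64(r.outTokens)/1000*p.out
	}
	return costs
}

func main() {
	records := []usage{
		{"menu-descriptions", "gpt-4o", 1200, 300},
		{"support-chatbot", "gpt-4o", 5000, 2500},
		{"support-chatbot", "embed-v3", 8000, 0},
	}
	for svc, cost := range attribute(records) {
		fmt.Printf("%s: $%.4f\n", svc, cost)
	}
}
```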
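From the caller's side, the unified OpenAI-schema interface means a single request shape works against any provider behind the gateway. Below is a sketch of such a call; the internal gateway URL, the `X-Gateway-Key` header, and the exploration-key format are hypothetical stand-ins.

```go
// client.go: what the unified, OpenAI-schema interface looks like to a
// caller. The gateway URL and header name are assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// A standard OpenAI chat-completion payload. Because the gateway
	// follows the OpenAI API schema, the same request shape works no
	// matter which provider ultimately serves it.
	body, err := json.Marshal(map[string]any{
		"model": "gpt-4o", // the gateway maps models to providers
		"messages": []map[string]string{
			{"role": "user", "content": "Draft a menu description for nasi lemak."},
		},
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest(http.MethodPost,
		"https://ai-gateway.internal/openai/v1/chat/completions", // hypothetical URL
		bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// An exploration key grants prototyping access without the full
	// production review; the header name is an assumption.
	req.Header.Set("X-Gateway-Key", "exploration-key-123")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```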
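Finally, the planned token-usage-based rate limiting maps naturally onto a token bucket denominated in LLM tokens rather than requests. The sketch below shows the generic technique, not Grab's design; the capacity and refill numbers would come from each use case's quota.

```go
// ratelimit.go: a token bucket denominated in LLM tokens rather than
// requests — the generic technique behind "rate limiting based on
// token usage", not a description of Grab's actual design.
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket refills at refillPerSec LLM tokens per second up to capacity.
type bucket struct {
	mu           sync.Mutex
	capacity     float64
	tokens       float64
	refillPerSec float64
	last         time.Time
}

func newBucket(capacity, refillPerSec float64) *bucket {
	return &bucket{capacity: capacity, tokens: capacity, refillPerSec: refillPerSec, last: time.Now()}
}

// Allow reports whether a request estimated to consume n LLM tokens
// may proceed, draining the bucket if so. Giving batch jobs smaller
// buckets than latency-sensitive services is one way to balance fair
// quota distribution across differing SLOs.
func (b *bucket) Allow(n float64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.refillPerSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < n {
		return false
	}
	b.tokens -= n
	return true
}

func main() {
	b := newBucket(60000, 1000) // 60K-token burst, 1K tokens/s sustained (illustrative)
	fmt.Println(b.Allow(1500))  // true: within budget
}
```

A gateway would hold one bucket per use case (or per provider deployment) and call Allow with the estimated prompt tokens before forwarding; actual completion tokens can be reconciled afterwards from the audit log.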
