This case study details the journey and lessons learned at Rosco, a company that completely rebuilt its product around AI agents for enterprise data analysis. The speaker, Patrick (Rosco's former CTO), offers practical insight into the challenges and solutions of deploying AI agents in production environments.
At its core, the case study focuses on building AI agents that can effectively query enterprise data warehouses. The team developed a specific working definition of an AI agent, requiring three key elements (a minimal loop satisfying them is sketched after the list):
* The ability to take directions (from humans or other AIs)
* Access to call at least one tool and receive responses
* Autonomous reasoning capability for tool usage
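To make the definition concrete, here is a minimal sketch of such a loop, assuming the OpenAI Python SDK's chat-completions interface; the `run_agent` function, the `run_tool` dispatcher, and the model name are illustrative rather than taken from Rosco's implementation.

```python
import json

def run_agent(client, direction: str, tools: list, run_tool, max_steps: int = 10):
    """Minimal agent loop matching the three-part definition:
    it takes a direction, can call at least one tool, and decides
    autonomously when (and whether) to call it."""
    messages = [{"role": "user", "content": direction}]
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        msg = response.choices[0].message
        if not msg.tool_calls:          # the model chose to answer directly
            return msg.content
        messages.append(msg)            # keep the assistant turn in context
        for call in msg.tool_calls:     # the model chose one or more tools
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    return None  # gave up after max_steps
```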
One of the most significant technical insights was their approach to agent design. Rather than following the common pattern of using RAG (Retrieval Augmented Generation) with content inserted into system prompts, they focused on enabling agents to think and reason through problems using discrete tool calls. This approach proved particularly valuable when dealing with SQL query generation.
The team discovered that overwhelming the agent with too much schema information in the prompt led to poor performance. Instead, they broke the functionality down into smaller, more focused tool calls (declared in the sketch after this list), such as:
* Search tables
* Get table detail
* Profile columns
This modular approach allowed the agent to iteratively build understanding and generate more accurate queries.
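As an illustration of how such narrow tools might be declared (not Rosco's actual definitions), here is a sketch in OpenAI function-calling format, with hypothetical parameter names:

```python
# Hypothetical declarations of the three focused tools described above.
# Each one returns a small slice of schema information instead of dumping
# the entire warehouse schema into the system prompt.
SCHEMA_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_tables",
            "description": "Search warehouse tables by keyword and return matching table names.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_table_detail",
            "description": "Return columns, types, and a short description for one table.",
            "parameters": {
                "type": "object",
                "properties": {"table_name": {"type": "string"}},
                "required": ["table_name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "profile_columns",
            "description": "Return sample values and basic statistics for selected columns.",
            "parameters": {
                "type": "object",
                "properties": {
                    "table_name": {"type": "string"},
                    "columns": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["table_name", "columns"],
            },
        },
    },
]
```

With tools scoped this narrowly, the agent can chain calls naturally: find candidate tables, inspect one in detail, profile a few columns, and only then write SQL.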
The team also ran a particularly interesting comparison between GPT-4 and Claude, finding that response formatting had a crucial impact on agent performance (see the formatting helper sketched after the list):
* GPT-4 performed better with JSON-formatted responses
* Claude showed better results with XML-formatted responses
* Initial markdown formatting proved problematic, especially with large result sets (30,000+ tokens)
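A hypothetical helper along these lines shows the idea: the same result rows are rendered as JSON for GPT-4-class models and as XML for Claude-class models. The function and field names are assumptions, not the team's code.

```python
import json
from xml.sax.saxutils import escape, quoteattr

def format_tool_result(rows: list[dict], model_family: str) -> str:
    """Render the same tool result as JSON (GPT-4-style) or XML (Claude-style)."""
    if model_family == "gpt":
        return json.dumps({"rows": rows})
    # Claude-class models tended to handle XML-structured results better.
    parts = ["<result>"]
    for row in rows:
        cells = "".join(
            f"<cell name={quoteattr(str(k))}>{escape(str(v))}</cell>"
            for k, v in row.items()
        )
        parts.append(f"<row>{cells}</row>")
    parts.append("</result>")
    return "".join(parts)
```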
The case study provides valuable insights into production deployment considerations. They intentionally avoided using third-party frameworks like LangGraph or Crew AI, despite their popularity. This decision was driven by specific production requirements, particularly around security and authentication. They needed to cascade end-user security credentials down to the agent level, allowing it to query Snowflake with appropriate user-specific permissions through OAuth integration.
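A minimal sketch of that credential cascade, assuming the snowflake-connector-python package; the account and warehouse names are placeholders, and the surrounding OAuth flow is omitted:

```python
import snowflake.connector

def run_query_as_user(sql: str, user_oauth_token: str):
    """Execute a query with the end user's own OAuth token so Snowflake
    enforces that user's permissions, rather than a shared service account."""
    conn = snowflake.connector.connect(
        account="my_account",          # placeholder
        authenticator="oauth",         # authenticate as the end user
        token=user_oauth_token,
        warehouse="ANALYTICS_WH",      # placeholder
    )
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```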
The team's experience with model selection and usage was particularly instructive. They found that:
* Fine-tuning models actually decreased reasoning capabilities
* Claude 3.5 provided an optimal balance of speed, cost, and decision-making quality
* The main reasoning model needed to be highly capable, while subsidiary tasks could use cheaper models (see the routing sketch below)
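A simple routing sketch along these lines, with illustrative model identifiers rather than the team's exact choices, might look like this:

```python
# Reserve the strongest model for the agent's reasoning loop and hand
# subsidiary tasks (summarizing results, formatting output) to a cheaper model.
# Model identifiers are examples only.
REASONING_MODEL = "claude-3-5-sonnet-20240620"   # drives planning and tool selection
UTILITY_MODEL = "claude-3-haiku-20240307"        # summarization, formatting, etc.

def pick_model(task_type: str) -> str:
    return REASONING_MODEL if task_type == "agent_reasoning" else UTILITY_MODEL
```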
A significant portion of their learning came from implementing multi-agent systems. Their key architectural decisions included:
* Implementing a manager agent within a hierarchy (sketched after this list)
* Limiting multi-agent teams to 5-8 agents (similar to Amazon's "two-pizza rule")
* Focusing on incentivization rather than strict process control
* Carefully delegating subtasks to specialized worker agents
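A schematic of that hierarchy, assuming a hypothetical `Agent` wrapper rather than any specific framework, might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Hypothetical wrapper around a single agent loop."""
    name: str
    system_prompt: str

    def run(self, task: str) -> str:
        # Placeholder: in practice this would drive an LLM tool-calling loop.
        return f"[{self.name}] completed: {task}"

@dataclass
class ManagerAgent:
    """Manager agent that delegates subtasks to a small team of workers."""
    workers: dict[str, Agent] = field(default_factory=dict)

    def add_worker(self, agent: Agent) -> None:
        if len(self.workers) >= 8:   # keep the team in the 5-8 range
            raise ValueError("team too large; split into another hierarchy")
        self.workers[agent.name] = agent

    def delegate(self, subtasks: dict[str, str]) -> list[str]:
        # subtasks maps worker name -> task description
        return [self.workers[name].run(task) for name, task in subtasks.items()]
```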
The team's approach to production deployment emphasized pragmatic solutions over theoretical elegance. They found that the real value wasn't in the system prompts (which many teams treated as proprietary IP) but in:
* The ecosystem around the agent
* User experience design
* Security and authentication implementation
* Integration with enterprise systems
Security implementation was a crucial aspect of their production deployment. They developed systems to:
* Handle OAuth integration for enterprise data access
* Manage user-specific permissions at the data warehouse level
* Ensure secure credential management and proper access controls
The case study also reveals interesting insights about model behavior in production. For instance, the team observed that model hallucinations often indicated preferred input formats: when an agent consistently ignored the specified JSON schema for tool calls, it was often signalling a more natural format aligned with its training data.
A crucial learning concerned what they termed the "Agent Computer Interface" (ACI). Small changes in tool call syntax and response formatting had outsized impacts on agent performance, which led to continuous iteration and refinement of the following (a response-envelope sketch follows the list):
* Tool call formats
* Response structures
* Error handling patterns
* Context management
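One way to picture such an interface is a structured response envelope that reports success, errors, and truncation explicitly; the field names and limits below are assumptions for illustration, not Rosco's format.

```python
import json

MAX_ROWS_IN_CONTEXT = 50  # illustrative cap to avoid flooding the context window

def tool_response(rows: list | None = None, error: Exception | str | None = None) -> str:
    """Return the same envelope for success and failure, with errors described
    in plain language the model can act on and oversized results truncated."""
    if error is not None:
        return json.dumps({
            "status": "error",
            "error": str(error),
            "hint": "Check the table and column names, then retry the call.",
        })
    rows = rows or []
    return json.dumps({
        "status": "ok",
        "row_count": len(rows),
        "rows": rows[:MAX_ROWS_IN_CONTEXT],
        "truncated": len(rows) > MAX_ROWS_IN_CONTEXT,
    })
```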
The team's experience highlighted the importance of focusing on reasoning capabilities over knowledge embedding. This approach proved more robust and maintainable in production, allowing agents to handle novel situations and edge cases more effectively.
This case study represents a valuable contribution to the field of practical LLMOps, especially in enterprise settings. It demonstrates how theoretical concepts around AI agents need to be adapted and refined for production use, with particular attention to security, scalability, and real-world performance considerations.