Company
Aomni
Title
Evolving Agent Architecture Through Model Capability Improvements
Industry
Tech
Year
2023
Summary (short)
David from Aomni discusses how their company evolved from building complex agent architectures with multiple guardrails to simpler, more model-centric approaches as LLM capabilities improved. The company provides AI agents for revenue teams, helping automate research and sales workflows while keeping humans in the loop for customer relationships. Their journey demonstrates how LLMOps practices need to continuously adapt as model capabilities expand, leading to removal of scaffolding and simplified architectures.
This case study provides a fascinating look into how Aomni has evolved their LLMOps practices alongside improving language model capabilities, particularly in the context of building AI agents for enterprise sales teams.

The company's journey begins with their initial viral success in mid-2023, when they created a research agent that stood out for its reliability and production-ready implementation. Their key technical insight was treating agents as workflow orchestration systems and applying established microservice patterns to handle reliability issues: connecting agents to workflow orchestration infrastructure and implementing proper error handling, retries, and guardrails.

A central philosophy emerged in their LLMOps practice: "don't bet against the model." This manifests in their approach to continuous improvement, where they completely rewrite their product with each significant improvement in model capabilities (roughly every time model capability doubles). Each rewrite lets them progressively remove scaffolding and guardrails as models become more capable.

The evolution of their research agent architecture provides a concrete example of this philosophy in action.

Initial Version (2023):
* 20-30 different prompts and LLM calls
* Complex "agent swarm" architecture with multiple personas
* Heavy use of reflection, where one model would critique another's output
* Extensive guardrails and validation checks
* Worked well but was heavily constrained

Current Version:
* Only two LLM calls running in a loop (see the sketch after these lists)
* Approximately 200 lines of core logic
* Allows both recursive deep dives and parallel exploration
* Controls limited to the depth and breadth of research
* More flexible and capable despite the simpler architecture
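To make that loop concrete, here is a minimal sketch of what a two-call research agent with depth and breadth controls might look like. This is an illustration under assumptions rather than Aomni's actual code: the `llm`, `plan_questions`, and `research` names are hypothetical, and the real system is described only as roughly 200 lines of core logic.

```python
import concurrent.futures

# Hypothetical LLM client; swap in any provider SDK. The loop needs only two
# kinds of calls: one to plan follow-up questions, one to synthesize findings.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model provider here")

def plan_questions(topic: str, breadth: int) -> list[str]:
    # LLM call #1: propose follow-up research questions for a topic.
    response = llm(
        f"List {breadth} follow-up research questions about: {topic}. "
        "Return one question per line."
    )
    return [line.strip() for line in response.splitlines() if line.strip()][:breadth]

def research(topic: str, depth: int, breadth: int) -> str:
    """Recursively research a topic: depth bounds the recursive deep dives,
    breadth bounds the parallel exploration at each level."""
    if depth == 0:
        # LLM call #2 at a leaf: summarize what is known about the topic.
        return llm(f"Summarize the key facts about: {topic}")

    questions = plan_questions(topic, breadth)

    # Explore sub-questions in parallel, each one level shallower.
    with concurrent.futures.ThreadPoolExecutor(max_workers=breadth) as pool:
        findings = list(pool.map(lambda q: research(q, depth - 1, breadth), questions))

    # LLM call #2: fold the sub-findings into a synthesis for this level.
    return llm(f"Write a research brief on '{topic}' from these findings:\n\n"
               + "\n\n".join(findings))

# Example usage: report = research("ACME Corp's expansion plans", depth=2, breadth=3)
```

The notable design choice is that depth and breadth are the only knobs exposed; everything else is delegated to the model, which is exactly the "don't bet against the model" posture described above.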
Their LLMOps practices include sophisticated evaluation approaches, though they acknowledge the challenges in this area. While they maintain evaluation datasets and automated testing scripts, they emphasize that pure metrics-based evaluation isn't sufficient: they've found that newer, more capable models can fail existing evaluation criteria while actually producing better results through novel approaches. This has led them to adopt a hybrid approach combining automated evaluation with human review and regular updates to the evaluation criteria.

For production deployment, Aomni focuses heavily on context management and tool integration. They've experimented with different approaches to service discovery and tool management for their agents, including:
* Direct tool integration, with all tools in context
* Dynamic loading of tools through service discovery
* Different tool-calling implementations across models

The company has also made interesting architectural choices around human-AI interaction, particularly in the enterprise sales context. Rather than trying to replace salespeople with AI (the "AI SDR" approach), they focus on augmenting human sales representatives by automating the "back office" tasks that typically consume 70% of a sales representative's time. This requires careful consideration of:
* When to involve human input
* How to maintain context across interactions
* Proper handling of enterprise data and security requirements
* Integration with existing sales tools and workflows

One particularly interesting aspect of their LLMOps practice is how they handle context and memory. They've implemented multiple approaches:
* Explicit context gathering through user onboarding questions
* Ambient context collection from email and conversation imports
* Custom knowledge graph construction
* Integration with various memory protocols and systems

Their experience with testing and monitoring production agents has led to some key insights (the first of these is sketched below):
* The importance of having both automated evaluation pipelines and human review
* The need to regularly update evaluation criteria as model capabilities improve
* The challenge of handling cases where models find better solutions that don't match predetermined patterns
* The importance of robust error handling and recovery mechanisms
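As a rough illustration of that hybrid evaluation pattern, and not Aomni's actual harness, the sketch below scores outputs automatically but routes ambiguous cases to a human reviewer instead of failing them outright, since a more capable model may miss stale criteria while producing a genuinely better answer. `EvalCase`, `automated_score`, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    required_facts: list[str]  # facts the output is expected to mention

def automated_score(output: str, case: EvalCase) -> float:
    # Crude containment check; a real harness might use an LLM judge instead.
    hits = sum(1 for fact in case.required_facts if fact.lower() in output.lower())
    return hits / max(len(case.required_facts), 1)

def evaluate(run_agent, cases: list[EvalCase],
             pass_threshold: float = 0.8, review_threshold: float = 0.5):
    """Bucket results into clear passes, clear failures, and ambiguous
    outputs that are escalated to a human reviewer rather than auto-failed."""
    passed, needs_human_review, failed = [], [], []
    for case in cases:
        output = run_agent(case.prompt)
        score = automated_score(output, case)
        if score >= pass_threshold:
            passed.append(case)
        elif score >= review_threshold:
            # Ambiguous: the model may have taken a valid novel approach that
            # the fixed criteria don't capture. Escalate instead of failing.
            needs_human_review.append((case, output, score))
        else:
            failed.append((case, output, score))
    return passed, needs_human_review, failed
```

The `needs_human_review` bucket can also serve as the feedback channel for keeping criteria current: when a reviewer confirms a flagged output is actually better, the case's expected facts get revised rather than the agent's behavior.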

Throughout their journey, Aomni has maintained a focus on reliability and production-readiness even as they've simplified their architecture. They've found that as model capabilities improve, many of the complex patterns initially needed for reliability become unnecessary, allowing for simpler but more powerful implementations. This has led to a philosophy of building scaffolding that can be progressively removed as models improve, rather than building permanent complexity into their systems.

Their experience provides valuable insights for others implementing LLMs in production, particularly around the importance of staying flexible and being willing to fundamentally rethink implementations as model capabilities evolve. The case study also highlights the ongoing tension between maintaining reliability and leveraging new model capabilities, and how that balance shifts over time.
