Company: Hex
Title: Production AI Agents with Dynamic Planning and Reactive Evaluation
Industry: Tech
Year: 2023

Summary (short): Hex successfully implemented AI agents in production for data science notebooks by developing a unique approach to agent orchestration. They solved key challenges around planning, tool usage, and latency by constraining agent capabilities, building a reactive DAG structure, and optimizing context windows. Their success came from iteratively developing individual capabilities before combining them into agents, keeping humans in the loop, and maintaining tight feedback cycles with users.
This case study examines how Hex, a data science notebook platform company, successfully implemented AI agents in production when many other companies struggled to do so. The insights come from an interview with Bryan Bischof, who leads AI at Hex, discussing the team's journey to build and deploy reliable AI agents for data analysis workflows.

## Core Problem and Approach

Hex tackled the challenge of implementing AI agents to help data scientists with their analysis workflows. Rather than trying to build a fully autonomous system, they focused on creating agents that work collaboratively with users while handling multiple steps of analysis simultaneously. The key innovation was in how they structured and constrained the agent system to make it reliable and practical in production.

Their approach evolved through several key realizations:

* Start with individual capabilities before combining them into agents
* Constrain the types of plans agents can generate
* Keep the human user closely involved in the workflow
* Build a reactive system that can adapt to changes

## Technical Implementation Details

### Agent Architecture and Planning

The system uses a planning paradigm in which one agent creates a plan consisting of individual steps that other agents can execute in parallel or in sequence. Initially, Hex allowed high diversity in plans but found this led to reliability issues. They solved this by being more prescriptive about the types of steps that could be executed, while still allowing flexibility in how plans are generated.

A key architectural component is their DAG (Directed Acyclic Graph) structure for managing agent workflows.
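The article does not show Hex's actual implementation, but the combination of a constrained plan vocabulary and a reactive DAG can be sketched roughly as follows. All names, step types, and function signatures here are hypothetical illustrations, not Hex's API:

```python
# Hypothetical sketch: a plan is a DAG of typed steps, with the allowed
# step types deliberately constrained (per the case study's lesson that
# unconstrained plan diversity hurt reliability).
from dataclasses import dataclass, field

ALLOWED_STEP_TYPES = {"sql", "python", "chart"}  # illustrative constraint

@dataclass
class Step:
    name: str
    step_type: str
    depends_on: list = field(default_factory=list)

    def __post_init__(self):
        # Reject plans that use step types outside the allowed vocabulary.
        if self.step_type not in ALLOWED_STEP_TYPES:
            raise ValueError(f"unsupported step type: {self.step_type}")

def topological_order(steps):
    """Order steps so every step runs after all of its dependencies."""
    by_name = {s.name: s for s in steps}
    ordered, seen = [], set()

    def visit(step, stack=()):
        if step.name in stack:
            raise ValueError("cycle detected; plans must form a DAG")
        if step.name in seen:
            return
        for dep in step.depends_on:
            visit(by_name[dep], stack + (step.name,))
        seen.add(step.name)
        ordered.append(step)

    for s in steps:
        visit(s)
    return ordered

def dirty_set(steps, changed):
    """Reactive update: find every step downstream of a changed step,
    so only affected steps need to be re-executed."""
    dirty = {changed}
    grew = True
    while grew:
        grew = False
        for s in steps:
            if s.name not in dirty and any(d in dirty for d in s.depends_on):
                dirty.add(s.name)
                grew = True
    return dirty
```

For example, in a three-step plan `query -> clean -> plot`, editing the `query` step would mark `clean` and `plot` dirty as well, which is the essence of the reactive behavior described above.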
This allows them to:

* Track dependencies between agent steps
* Infer what state information needs to be passed between agents
* Support reactive updates when changes are made
* Maintain consistency across the entire workflow

### Tool Usage and API Design

Hex carefully designed their tool interfaces to balance power and reliability. Rather than exposing very general capabilities, they mapped tools closely to existing user workflows in the platform. This provided natural constraints that kept agents on track while still enabling powerful functionality.

### Performance Optimization

Despite using GPT-4 Turbo as their base model, Hex achieved impressive performance through several optimization strategies:

* Aggressive context window optimization
* Removing unnecessary information from prompts
* Starting with maximum context and iteratively removing elements until finding a minimal working configuration
* Careful prompt engineering to maintain reliability with minimal context

### Evaluation and Quality Control

Hex developed a sophisticated approach to evaluation that includes:

* Multiple specialized evaluators, each testing a specific aspect
* Execution evaluation for code and SQL generation
* Environment simulation to verify results
* Integration of evaluation into the feature development process

## Key Learnings and Best Practices

### Human-in-the-Loop Design

Rather than pursuing fully autonomous agents, Hex found success by keeping humans closely involved. Their interactive paradigm allows users to observe and correct agent behavior, preventing error accumulation while maintaining high utility.

### Iterative Development

They built capabilities incrementally, shipping each as a standalone feature before incorporating it into the agent system. This allowed them to learn from user behavior and understand the nuances of each capability before combining them.

### Data-Driven Improvement

A critical insight was the importance of constantly analyzing system data.
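One lightweight way to make that analysis possible is to log every agent interaction in a form that can be queried later. The sketch below is a hypothetical illustration of that idea; the field names, file format, and metric are assumptions, not Hex's actual schema:

```python
# Hypothetical sketch: append each agent interaction to a JSONL file so
# the team can later review performance and user corrections in bulk.
import json
import time
from pathlib import Path

LOG_PATH = Path("agent_interactions.jsonl")  # illustrative location

def log_interaction(prompt, plan_steps, user_edits, accepted, path=LOG_PATH):
    """Record one interaction: what was asked, what the agent planned,
    how the human corrected it, and whether the result was kept."""
    record = {
        "ts": time.time(),
        "prompt": prompt,
        "plan_steps": plan_steps,
        "user_edits": user_edits,   # human-in-the-loop corrections
        "accepted": accepted,       # did the user keep the result?
    }
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")

def acceptance_rate(path=LOG_PATH):
    """One simple metric worth tracking across review sessions."""
    records = [json.loads(line) for line in path.open()]
    if not records:
        return 0.0
    return sum(r["accepted"] for r in records) / len(records)
```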
The team holds regular "evals parties" to review system performance and user interactions, using these insights to drive improvements.

### Practical Constraints

Rather than trying to solve general AI problems, Hex found success by embracing practical constraints:

* Limiting the length of agent chains
* Focusing on specific, well-defined capabilities
* Maintaining clear boundaries for agent actions
* Using existing workflow patterns as guides for tool design

## Results and Impact

The system has gone into production successfully, providing data scientists with AI assistance that maintains high reliability while offering meaningful automation. The approach has proven particularly effective for data analysis workflows where multiple steps must be coordinated under user oversight.

## Future Directions

Hex continues to evolve its agent system, with ongoing work on:

* Expanding the range of supported analysis patterns
* Improving evaluation techniques
* Optimizing performance further
* Enhancing the system's reactive capabilities

This case study demonstrates that successfully deploying AI agents in production requires careful attention to system design, strong evaluation practices, and a pragmatic approach that prioritizes reliability and user interaction over complete autonomy.
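To close with a concrete illustration of the evaluation ideas above: "execution evaluation" means actually running generated code rather than only judging it as text, and multiple specialized evaluators each check one aspect. The sketch below is hypothetical; the function names and result shapes are assumptions, not Hex's evaluation framework:

```python
# Hypothetical sketch of execution evaluation: run generated Python in a
# throwaway namespace and verify it produced the expected variables.
# (A production system would sandbox this execution.)

def execution_eval(code: str, expected_vars: list) -> dict:
    """Execute generated code and check that it runs and defines
    the variables the downstream steps expect."""
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as e:
        return {"passed": False, "error": repr(e)}
    missing = [v for v in expected_vars if v not in namespace]
    return {"passed": not missing, "missing": missing}

def run_evaluators(sample, evaluators):
    """Combine specialized evaluators; each checks one aspect,
    and the sample passes overall only if all of them pass."""
    results = {name: fn(sample) for name, fn in evaluators.items()}
    results["overall"] = all(r.get("passed") for r in results.values())
    return results
```

A usage pattern might pair this execution check with other narrow evaluators (e.g. one for SQL validity, one for chart configuration), each contributing a pass/fail verdict to the overall result.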
