Mendable.ai enhanced their enterprise AI assistant platform with Tools & Actions capabilities, enabling automated tasks and API interactions. They faced challenges with debugging and observability of agent behaviors in production. By implementing LangSmith, they successfully debugged agent decision processes, optimized prompts, improved tool schema generation, and built evaluation datasets, resulting in a more reliable and efficient system that has already achieved $1.3 million in savings for a major tech company client.
# Mendable's Journey with LangSmith for Production LLM Debugging
## Company and Use Case Overview
Mendable.ai is a platform that specializes in helping enterprise teams answer technical questions using AI. They implemented an AI assistant system for a $20+ billion tech company, supporting around 1,000 customer success and sales personnel. The implementation has been highly successful, achieving $1.3 million in savings within five months and projecting $3 million in savings for the coming year.
## Tools & Actions Implementation
- Expanded their AI assistant capabilities beyond simple Q&A to include action-based interactions
- Integrated multiple API calls and data sources
- Implemented dynamic API request handling with AI-generated values
- Designed user-friendly tool creation capabilities via API calls (a sketch of the pattern follows this list)
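Mendable's exact tool definitions aren't public, but the capability they describe maps onto LangChain's structured tools, where a schema tells the model what arguments to generate and a function makes the actual API call. A minimal sketch, assuming a hypothetical ticket-creation tool (the name, schema, and endpoint are all illustrative):

```typescript
import { DynamicStructuredTool } from "@langchain/core/tools";
import { z } from "zod";

// Hypothetical tool: name, schema, and endpoint are illustrative,
// not Mendable's actual implementation.
const createSupportTicket = new DynamicStructuredTool({
  name: "create_support_ticket",
  description:
    "Create a support ticket. Provide a short title, a detailed " +
    "description of the customer's issue, and a priority.",
  schema: z.object({
    title: z.string().describe("One-line summary of the issue"),
    description: z.string().describe("Full description of the issue"),
    priority: z.enum(["low", "medium", "high"]).describe("Ticket priority"),
  }),
  func: async ({ title, description, priority }) => {
    // The model generates the arguments; the tool performs the API request.
    const res = await fetch("https://api.example.com/tickets", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ title, description, priority }),
    });
    return JSON.stringify(await res.json());
  },
});
```

The `describe()` annotations on each field feed directly into the schema the model sees, which is why (as noted under Lessons Learned) detailed tool descriptions matter so much for accurate argument generation.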
## Technical Challenges
- Faced significant observability issues with agent behavior in production
- Struggled with reliability in the agent decision-making process
- Initial debugging attempts relied heavily on console.log statements
- Experienced high latency in production runs
- Difficulty in tracking tool execution and API request failures
- Complex prompting system leading to performance issues
## LangSmith Integration and Solutions
### Debugging Capabilities
- Implemented detailed visualization of run call hierarchies
- Enabled comprehensive tracing of agent interactions
- Gained visibility into tool inputs, API request failures, and intermediate agent decisions (a tracing sketch follows this list)
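For LangChain applications, this kind of tracing is switched on with environment variables, and non-LangChain code can be pulled into the same run hierarchy with the `traceable` wrapper. A minimal sketch (project name and endpoint are placeholders):

```typescript
// Enable LangSmith tracing for all LangChain calls via environment
// variables (set before the process starts; values are placeholders):
//   LANGCHAIN_TRACING_V2=true
//   LANGCHAIN_API_KEY=<your-langsmith-api-key>
//   LANGCHAIN_PROJECT=tools-and-actions

import { traceable } from "langsmith/traceable";

// Plain functions (e.g. a raw API call) can be traced too, so they show
// up as child runs in the call hierarchy instead of an opaque gap.
const fetchCustomerRecord = traceable(
  async (customerId: string) => {
    const res = await fetch(`https://api.example.com/customers/${customerId}`);
    return res.json();
  },
  { name: "fetchCustomerRecord" }
);
```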
### Performance Optimization
- Identified prompt concatenation issues causing latency
- Located specific timing problems in ChatOpenAI calls (see the latency sketch after this list)
- Optimized prompt chunks for Tools & Actions module
- Reduced overall system latency through targeted improvements
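The LangSmith UI surfaces per-run timings directly, but the same data can also be scanned programmatically. A hedged sketch of flagging slow LLM calls in a project, assuming a placeholder project name and threshold:

```typescript
import { Client } from "langsmith";

const client = new Client();

// Iterate over recent LLM runs in a project and surface the slowest ones.
// Project name and the 5s threshold are placeholders.
for await (const run of client.listRuns({
  projectName: "tools-and-actions",
  runType: "llm",
})) {
  if (run.start_time && run.end_time) {
    // Date() accepts either epoch millis or ISO strings, so this works
    // regardless of how the SDK serializes timestamps.
    const latencyMs =
      new Date(run.end_time).getTime() - new Date(run.start_time).getTime();
    if (latencyMs > 5000) {
      console.log(`${run.name}: ${latencyMs} ms (run ${run.id})`);
    }
  }
}
```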
### Tool Inspection and Validation
- Implemented systematic tool input inspection
- Validated schema accuracy for custom tools (a validation sketch follows this list)
- Conducted approximately 20 test runs per tool
- Enhanced tool description requirements based on testing results
- Improved AI-generated content accuracy in the product
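One way to run this kind of input inspection is to capture model-generated arguments from traced runs and check them against the declared schema. A minimal sketch using zod's `safeParse` (the schema and captured values are illustrative):

```typescript
import { z } from "zod";

// Same illustrative schema as the tool sketch above.
const ticketSchema = z.object({
  title: z.string(),
  description: z.string(),
  priority: z.enum(["low", "medium", "high"]),
});

// Arguments captured from a traced production run (placeholder values).
const generatedInput = {
  title: "Login page 500 error",
  description: "User reports a server error when submitting the login form.",
  priority: "high",
};

const result = ticketSchema.safeParse(generatedInput);
if (!result.success) {
  // A failed parse means the model produced arguments that don't match
  // the declared schema — often a sign the tool description is too vague.
  console.error(result.error.issues);
}
```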
### Data Management and Evaluation
- Created comprehensive datasets from production runs
- Established evaluation frameworks using LangSmith
- Implemented quick saving of inputs/outputs for analysis (see the dataset sketch after this list)
- Built a foundation for continuous improvement
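Saving production inputs and outputs into a LangSmith dataset takes little code with the `langsmith` client. A sketch, with placeholder dataset name and example values:

```typescript
import { Client } from "langsmith";

const client = new Client();

// Create a dataset (name and description are placeholders)...
const dataset = await client.createDataset("tools-and-actions-evals", {
  description: "Inputs and outputs captured from production runs",
});

// ...and save a production input/output pair to it for later evaluation.
await client.createExample(
  { question: "Create a high-priority ticket for the login outage" }, // inputs
  { toolCalled: "create_support_ticket", priority: "high" },          // outputs
  { datasetId: dataset.id }
);
```

Examples accumulated this way become the regression set: future prompt or schema changes can be evaluated against real production traffic rather than hand-written test cases.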
## Key Technical Improvements
- Better visibility into agent decision processes
- Optimized prompt management
- Enhanced schema validation
- Improved tool description generation
- Streamlined debugging workflows
- Reduced system latency
- Better data collection for evaluation
## Production Impact
- Successful deployment of Tools & Actions feature
- Improved reliability of AI-generated tool inputs
- Enhanced system performance
- Better user experience in tool creation
- More accurate AI responses
- Faster debugging cycles
- More efficient development process
## Lessons Learned
- Importance of detailed tool descriptions for accurate schema generation
- Critical role of observability in production LLM systems
- Value of systematic testing and evaluation
- Need for structured data collection in LLM applications
- Benefits of integrated debugging tools in the development process
## Architecture and Integration Details
- Seamless integration with existing LangChain components
- Effective use of OpenAI tool agents (a wiring sketch follows this list)
- Structured approach to API integrations
- Robust tracing system implementation
- Comprehensive monitoring setup
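The OpenAI tools agent wiring mentioned above follows a standard LangChain pattern. A minimal sketch, not Mendable's actual configuration (the model choice and tool are placeholders):

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";
import { pull } from "langchain/hub";
import { DynamicStructuredTool } from "@langchain/core/tools";
import type { ChatPromptTemplate } from "@langchain/core/prompts";
import { z } from "zod";

// Placeholder tool; a real deployment would register the customer-defined
// Tools & Actions here.
const tools = [
  new DynamicStructuredTool({
    name: "create_support_ticket",
    description: "Create a support ticket from a title and a description.",
    schema: z.object({ title: z.string(), description: z.string() }),
    func: async ({ title }) => `Created ticket: ${title}`,
  }),
];

// Pull the standard OpenAI tools agent prompt from the LangChain Hub.
const prompt = await pull<ChatPromptTemplate>("hwchase17/openai-tools-agent");
const llm = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });

const agent = await createOpenAIToolsAgent({ llm, tools, prompt });
const executor = new AgentExecutor({ agent, tools });

// With tracing enabled, this invocation appears in LangSmith as a full
// run tree: agent -> ChatOpenAI call -> tool execution -> final output.
const result = await executor.invoke({
  input: "Open a ticket for the login outage",
});
console.log(result.output);
```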
## Future Considerations
- Continuing enterprise customization opportunities
- Ongoing optimization of Tools & Actions
- Further refinement of debugging processes
- Expansion of evaluation frameworks
- Enhanced data collection for system improvement
This case study demonstrates the critical importance of proper debugging and observability tools in production LLM systems. The integration of LangSmith provided Mendable with the necessary visibility and control to successfully deploy and maintain their advanced AI capabilities, while ensuring reliable performance and continuous improvement opportunities.