Trellix implemented an AI-powered security threat investigation system using multiple foundation models on Amazon Bedrock to automate and enhance their security analysis workflow. By strategically combining Amazon Nova Micro with Anthropic's Claude Sonnet, they achieved roughly 3-5x faster inference and approximately 100x lower per-inference costs while maintaining investigation quality through a multi-pass approach with the smaller model. The system uses a RAG architecture with Amazon OpenSearch Service to process billions of security events and provide automated risk scoring.
Trellix, formed from the merger of McAfee Enterprise and FireEye in 2022, provides cybersecurity solutions to over 53,000 customers worldwide. This case study explores their implementation of Trellix Wise, an AI-powered security threat investigation system, with particular focus on their sophisticated approach to optimizing LLM operations in production.
# System Architecture and Initial Implementation
The core of Trellix Wise is built on Amazon Bedrock and initially relied primarily on Anthropic's Claude Sonnet model. The system architecture implements a RAG (Retrieval Augmented Generation) pattern using Amazon OpenSearch Service, which serves a dual purpose (sketched after the list below):
* Storage for billions of security events from monitored environments
* Vector database capabilities integrated with Amazon Bedrock Knowledge Bases
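A minimal sketch of this dual use, assuming the opensearch-py client, an event index, and a k-NN-enabled embeddings index (all names here are illustrative, not Trellix's actual schema):

```python
from opensearchpy import OpenSearch

# Auth details omitted for brevity; a production client would sign requests.
client = OpenSearch(
    hosts=[{"host": "search-domain.example.com", "port": 443}],
    use_ssl=True,
)

# 1) Conventional index storing raw security events at scale.
client.index(
    index="security-events",
    body={
        "timestamp": "2024-06-01T12:00:00Z",
        "source_ip": "10.0.0.17",
        "event_type": "failed_login",
    },
)

# 2) The same cluster doubles as a vector store for RAG retrieval
#    (assumes an index with a knn_vector field named "embedding").
query_embedding = [0.0] * 1024  # in practice, produced by an embedding model
hits = client.search(
    index="event-embeddings",
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": query_embedding, "k": 5}}},
    },
)
```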
The initial workflow consists of multiple stages (a skeleton follows the list):
* Data collection from relevant security events
* Analysis of the collected data using custom ML models
* Risk scoring and assessment
* Generation of detailed investigation results for analyst review
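A skeleton of how these stages might compose, with hypothetical function names and placeholder logic standing in for Trellix's actual implementation:

```python
# Illustrative four-stage workflow; all names and logic are stand-ins.
def collect_events(alert: dict) -> list[dict]:
    # In production: query OpenSearch for events related to the alert.
    return [{"event_type": "failed_login", "count": 42}]

def score_risk(events: list[dict]) -> float:
    # In production: custom ML models; here, a trivial placeholder heuristic.
    return min(1.0, sum(e.get("count", 0) for e in events) / 100)

def investigate(alert: dict) -> dict:
    events = collect_events(alert)
    risk = score_risk(events)
    # An LLM call (see later sections) would turn this into a narrative report.
    return {"alert": alert["id"], "risk_score": risk, "events": events}

print(investigate({"id": "alert-001"}))
```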
# Cost and Performance Optimization Strategy
A key insight in Trellix's LLMOps journey was recognizing that not all stages of their investigation pipeline required the same level of model sophistication. This realization led to a multi-model approach that optimizes both cost and performance while maintaining quality.
## Model Selection and Evaluation Process
The team developed a comprehensive testing harness to evaluate different models, recognizing that standard benchmarks weren't sufficient for their specific use case. Their evaluation focused on the following criteria (a simplified harness is sketched after the list):
* Response completeness
* Cost per inference
* Processing speed
* Accuracy and hallucination rates
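A simplified harness in this spirit might time each model and check response completeness against a set of expected findings. The Bedrock Converse call below uses the real boto3 API, but the completeness check and metric choices are illustrative assumptions, not Trellix's harness:

```python
import time
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def evaluate(model_id: str, prompt: str, expected_findings: list[str]) -> dict:
    start = time.time()
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0, "maxTokens": 1024},
    )
    latency = time.time() - start
    answer = resp["output"]["message"]["content"][0]["text"]
    usage = resp["usage"]  # input/output token counts, usable for cost estimates
    # Completeness: fraction of expected findings mentioned in the response.
    covered = sum(f.lower() in answer.lower() for f in expected_findings)
    return {
        "model": model_id,
        "latency_s": latency,
        "tokens": usage,
        "completeness": covered / max(len(expected_findings), 1),
    }
```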
## Implementation of Multi-Model Strategy
After extensive testing, Trellix implemented a hybrid approach using both Amazon Nova Micro and Claude Sonnet. The key findings that drove this decision (made explicit in the arithmetic below) included:
* Amazon Nova Micro could process 3-5 inferences in the time of a single Claude Sonnet inference
* Cost per inference was approximately 100 times lower with Nova Micro
* While individual responses showed higher variability, multiple passes with Nova Micro could achieve comprehensive coverage
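The arithmetic behind these findings can be made explicit. With per-inference cost normalized to Claude Sonnet, a handful of Nova Micro passes still lands far below the single-pass baseline (the prices below are normalized placeholders, not actual Bedrock rates):

```python
# Back-of-the-envelope comparison implied by the findings above.
sonnet_cost_per_inference = 1.0         # normalized baseline
nova_micro_cost_per_inference = 0.01    # ~100x cheaper per the evaluation
passes = 3                              # multiple Nova Micro passes per task

multi_pass_cost = passes * nova_micro_cost_per_inference   # 0.03
savings_factor = sonnet_cost_per_inference / multi_pass_cost
print(f"~{savings_factor:.0f}x cheaper")  # ~33x, matching the ~30x figure below
```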
# Technical Implementation Details
## RAG Architecture Implementation
The system leverages Amazon OpenSearch Service's vector database capabilities for efficient context retrieval. This integration with Amazon Bedrock Knowledge Bases provides a robust foundation for the RAG architecture, allowing the system to incorporate relevant historical data and context into its analysis.
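A hedged sketch of what this retrieval step can look like through the bedrock-agent-runtime retrieve API; the knowledge base ID and result handling are placeholders:

```python
import boto3

agent_rt = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def retrieve_context(query: str, kb_id: str = "KB_ID_PLACEHOLDER") -> str:
    resp = agent_rt.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": 5}
        },
    )
    # Concatenate retrieved chunks into a context block for the prompt.
    return "\n\n".join(r["content"]["text"] for r in resp["retrievalResults"])
```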
## Hallucination Control
Trellix developed proprietary prompt engineering techniques and reference data constraints to limit the response space of the models. This was particularly important when working with smaller models like Nova Micro, where response variability needed to be carefully managed.
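One common way to implement this kind of constraint, in the spirit of what Trellix describes (the template itself is illustrative, not their proprietary prompt): restrict the model to the supplied reference data and force a fixed output schema.

```python
# Illustrative constrained-prompt template; not Trellix's actual prompt.
PROMPT_TEMPLATE = """You are a security analyst assistant.
Use ONLY the reference data below. If the data does not support a finding,
respond with "insufficient evidence" rather than guessing.

<reference_data>
{context}
</reference_data>

Question: {question}

Answer as JSON with keys "finding", "evidence", "confidence"."""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)
```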
## Multi-Pass Strategy
To compensate for the limitations of smaller models, Trellix implemented a multi-pass approach in which multiple inferences run in parallel (see the sketch after this list). This strategy:
* Maximizes data coverage
* Reduces overall costs by a factor of roughly 30 (three passes at about 1/100th the per-inference cost come to roughly 3/100 of the single-pass Claude Sonnet cost)
* Maintains high accuracy through response aggregation
* Achieves faster processing times despite the multiple passes
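A minimal sketch of the multi-pass pattern: fan several Nova Micro inferences out in parallel, then aggregate the union of findings. The model ID, output format, and aggregation rule are assumptions for illustration:

```python
import json
from concurrent.futures import ThreadPoolExecutor
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
NOVA_MICRO = "amazon.nova-micro-v1:0"  # verify against your region's model IDs

def one_pass(prompt: str) -> set[str]:
    resp = bedrock.converse(
        modelId=NOVA_MICRO,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.7, "maxTokens": 512},
    )
    text = resp["output"]["message"]["content"][0]["text"]
    # Assumes the prompt asks for a JSON list of findings.
    try:
        return set(json.loads(text))
    except json.JSONDecodeError:
        return set()

def multi_pass(prompt: str, passes: int = 3) -> set[str]:
    with ThreadPoolExecutor(max_workers=passes) as pool:
        results = pool.map(one_pass, [prompt] * passes)
    # Union of findings across passes maximizes coverage; intersection or
    # majority voting would instead favor precision.
    return set().union(*results)
```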
# Production Deployment and Monitoring
The team adopted a careful, phased approach to production deployment (a monitoring sketch follows the list):
* Initial deployment in a limited pilot environment
* Continuous monitoring and evaluation of model performance
* Gradual rollout to production workloads
* Regular assessment of cost and performance metrics
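One plausible way to track the cost and performance metrics mentioned above is with CloudWatch custom metrics; the namespace and metric names below are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def record_inference(model_id: str, latency_s: float, output_tokens: int) -> None:
    # Emit per-inference latency and token usage, dimensioned by model,
    # so cost/performance can be compared across models over time.
    cloudwatch.put_metric_data(
        Namespace="SecurityInvestigations/LLM",
        MetricData=[
            {"MetricName": "Latency", "Value": latency_s, "Unit": "Seconds",
             "Dimensions": [{"Name": "Model", "Value": model_id}]},
            {"MetricName": "OutputTokens", "Value": float(output_tokens),
             "Unit": "Count",
             "Dimensions": [{"Name": "Model", "Value": model_id}]},
        ],
    )
```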
# Lessons Learned and Best Practices
Several key insights emerged from this implementation:
1. Model Flexibility
* Access to a range of models with different capabilities is crucial
* Smaller models can be effectively used for specific tasks within a larger workflow
* The ability to easily experiment with different models accelerates optimization
2. Response Control
* Pre-built use-case specific scaffolding incorporating proprietary data and processes is essential
* Careful constraint design helps maintain accuracy even with smaller models
* Multiple passes with smaller models can provide robust results
3. Integration and Infrastructure
* Effective data services integration is crucial for production success
* Vector database capabilities in existing infrastructure can be leveraged for RAG
* Flexible infrastructure allows for easy model switching and evaluation
The implementation demonstrates the importance of thoughtful model selection and architecture design in production LLM systems. Rather than defaulting to the most powerful (and expensive) models for all tasks, Trellix showed that a carefully designed multi-model approach can achieve better cost-performance characteristics while maintaining high quality results.
This case study highlights how production LLM systems can be optimized through careful evaluation of requirements at each stage of the pipeline, and how smaller, more efficient models can be effectively incorporated into production workflows when properly constrained and monitored.