Hapag-Lloyd faced challenges with time-consuming manual corporate audit processes. They implemented a GenAI solution using Databricks Mosaic AI to automate audit finding generation and executive summary creation. By fine-tuning the DBRX model and implementing a RAG-based chatbot, they achieved a 66% decrease in time spent creating new findings and a 77% reduction in executive summary review time, significantly improving their audit efficiency.
Hapag-Lloyd, a global maritime shipping company, successfully implemented a generative AI solution to modernize and streamline their corporate audit processes. This case study demonstrates a practical application of LLMs in a production environment, highlighting both the technical implementation details and the real-world business impact achieved through careful deployment of AI technologies.
The company faced significant challenges in their audit processes, particularly around the time-consuming nature of documentation and report writing. The manual nature of these processes not only created inefficiencies but also introduced potential inconsistencies in documentation. The primary goals were to reduce time spent on documentation while maintaining high quality standards and to enable auditors to focus more on critical analysis rather than routine documentation tasks.
The technical implementation involved several key components and decisions:
**Model Selection and Fine-tuning:**
* Initially evaluated multiple models including Llama 2 70B and Mixtral
* Ultimately selected Databricks' DBRX model based on superior performance
* Fine-tuned DBRX, which was itself pretrained on 12T tokens of carefully curated data, for their specific audit use case
**Architecture and Implementation:**
* Developed two main components:
  * Finding Generation Interface: Automated the creation of audit findings from bullet points
  * Chatbot Interface: Built using Gradio and integrated with Mosaic AI Model Serving
* Implemented Retrieval Augmented Generation (RAG) to provide accurate and contextually relevant responses
* Used MLflow for automating prompt and model evaluation
* Integrated with Delta tables for efficient storage and retrieval of findings
* Deployed through Databricks Model Serving for production use
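The retrieval step of a RAG chatbot like the one described above can be sketched in a few lines. This is a minimal illustration, not Hapag-Lloyd's implementation: the sample findings, the bag-of-words similarity, and the names `FINDINGS`, `retrieve`, and `build_prompt` are all assumptions made for the example. A production system would more likely use embeddings and a vector index, and would send the assembled prompt to a served model.

```python
import math
import re
from collections import Counter

# Hypothetical past findings; in the setup described above these
# would be read from Delta tables rather than hard-coded.
FINDINGS = [
    "Access rights for the invoicing system were not reviewed quarterly.",
    "Container damage claims lacked supporting inspection reports.",
    "Vendor master data changes were approved without a second reviewer.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Assemble a prompt that grounds the model in retrieved findings."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt("Who approved vendor master data changes?", FINDINGS, k=1)
print(prompt)
```

The point of the design is that the model only sees findings selected for the question, which is what keeps answers tied to existing audit documentation.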
The implementation process revealed several important LLMOps considerations:
**Infrastructure and Deployment:**
* Previous attempts using AWS SysOps struggled to achieve rapid setup and deployment
* Databricks platform provided a more streamlined approach to instance setup and management
* The solution architecture needed to support seamless integration with existing data pipelines
* Required careful attention to model evaluation and quality assurance processes
**Quality Assurance and Evaluation:**
* Implemented automated evaluation of prompts and models through MLflow
* Established metrics for measuring performance improvements
* Created a framework for continuous improvement and iteration
* Plans to implement Mosaic AI Agent Evaluation framework for more systematic assessment
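To make the idea of automated prompt and model evaluation concrete, here is a small sketch of an evaluation loop. It does not use MLflow, and it is not the team's actual setup: the token-overlap F1 metric, the test cases, and the two stand-in "variants" are hypothetical. In practice each variant would call a served model with a different prompt, and the scores would be logged to a tracking tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(generated, reference):
    """Token-overlap F1 between a generated text and a reference text."""
    gen, ref = tokenize(generated), tokenize(reference)
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(generate, cases):
    """Mean F1 of a generation function over all evaluation cases."""
    scores = [token_f1(generate(c["bullets"]), c["reference"]) for c in cases]
    return sum(scores) / len(scores)


# Hypothetical evaluation set: auditor bullet points plus an approved finding.
CASES = [
    {
        "bullets": ["invoices approved without second reviewer"],
        "reference": "Invoices were approved without a second reviewer.",
    },
    {
        "bullets": ["access rights not reviewed quarterly"],
        "reference": "Access rights were not reviewed quarterly.",
    },
]


# Stand-ins for two prompt/model variants (assumptions, not real models).
def echo_variant(bullets):
    return " ".join(bullets)


def boilerplate_variant(bullets):
    return "Finding: see attached notes."


results = {
    fn.__name__: evaluate(fn, CASES)
    for fn in (echo_variant, boilerplate_variant)
}
best = max(results, key=results.get)
print(results, best)
```

A fixed evaluation set with a numeric score is what makes it possible to compare prompt or model changes consistently from one iteration to the next.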
**Production Considerations:**
* Needed to ensure consistent quality across all generated reports
* Required robust error handling and monitoring
* Implemented systems for managing model versions and deployments
* Established processes for maintaining and updating the fine-tuned models
The results of the implementation were significant and measurable:
* Reduced time for creating new findings from 15 minutes to 5 minutes (66% decrease)
* Decreased executive summary review time from 30 minutes to 7 minutes (77% reduction)
* Improved consistency in documentation
* Enhanced ability for auditors to focus on strategic analysis
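The reported percentages follow directly from the before and after timings. The short check below recomputes them; the function name `percent_reduction` is just a label chosen for this example.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Timings in minutes, as reported in the results above.
findings = percent_reduction(15, 5)
summary = percent_reduction(30, 7)

print(f"New findings: {findings:.1f}% less time")
print(f"Executive summary review: {summary:.1f}% less time")
```

The exact values are about 66.7% and 76.7%, so the quoted 66% is the truncated figure and 77% the rounded one.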
Lessons learned and best practices emerged from this implementation:
**Model Selection:**
* The importance of thorough model evaluation before selection
* Need to balance performance with cost and resource requirements
* Value of using pre-trained models as a starting point
**Data and Fine-tuning:**
* Critical importance of high-quality training data
* Need for careful curation of fine-tuning datasets
* Importance of domain-specific training data
**Implementation Strategy:**
* Value of starting with specific, well-defined use cases
* Importance of measuring and documenting improvements
* Need for robust evaluation frameworks
* Benefits of iterative development and deployment
Future plans include:
* Extending the solution to cover more aspects of the audit process
* Improving the automated evaluation process
* Further fine-tuning models for better structure and organization of audit reports
* Expanding the use of generative AI to other administrative tasks
This case study demonstrates the practical application of LLMs in a business context, showing how careful attention to implementation details and proper LLMOps practices can lead to significant improvements in efficiency while maintaining quality standards. The success of this implementation has opened the door for further AI adoption within the organization, with plans to expand the use of these technologies to other areas of operations.