Company
Microsoft
Title
Evaluating Product Image Integrity in AI-Generated Advertising Content
Industry
Media & Entertainment
Year
2024
Summary (short)
Microsoft worked with an advertising customer to enable 1:1 ad personalization while ensuring product image integrity in AI-generated content. They developed a comprehensive evaluation system combining template matching, Mean Squared Error (MSE), Peak Signal to Noise Ratio (PSNR), and Cosine Similarity to verify that AI-generated backgrounds didn't alter the original product images. The solution successfully enabled automatic verification of product image fidelity in AI-generated advertising materials.
This case study from Microsoft explores a critical challenge in deploying generative AI for advertising: ensuring the integrity of product images when generating personalized advertisements at scale. The project focused on developing robust evaluation methods to verify that AI-generated content maintains the accuracy of product representations, a crucial requirement for commercial applications.

The core business problem was enabling 1:1 ad personalization while maintaining strict control over how products are represented. This is particularly important in advertising, where any unauthorized modification of product images could lead to legal issues or loss of brand trust. The technical challenge was to verify that AI inpainting techniques, which generate custom backgrounds around products, don't inadvertently modify the original product images.

The solution architecture combines multiple complementary evaluation techniques (a minimal sketch of how the first three fit together follows this list):

* Template Matching: Uses OpenCV to locate the original product within the generated image, providing a foundation for pixel-by-pixel comparison. This addresses the challenge of product translation within the image.
* Mean Squared Error (MSE): Implements pixel-level comparison to detect color changes and disproportionate scaling. MSE is particularly effective when the product location is known through template matching.
* Peak Signal-to-Noise Ratio (PSNR): Provides a logarithmic measure of image differences, making it easier to understand orders of magnitude in differences between images. This complements MSE for detecting color and scaling changes.
* Cosine Similarity: Uses VGG16 neural network features to compare edge and curve characteristics, allowing detection of structural changes even when the product is translated or proportionately scaled within the image.
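As a rough illustration of how the classical checks can be chained, the sketch below locates the product via OpenCV template matching and then scores the matched crop against the original with MSE and PSNR. The file names and helper functions are illustrative assumptions, not details from Microsoft's implementation.

```python
# Hypothetical sketch: locate the product via template matching, then score
# the matched region with pixel-level MSE and PSNR.
import cv2
import numpy as np


def locate_product(generated_bgr: np.ndarray, product_bgr: np.ndarray) -> tuple[int, int]:
    """Find the top-left corner of the product template in the generated image."""
    result = cv2.matchTemplate(generated_bgr, product_bgr, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)  # best-match location for TM_CCOEFF_NORMED
    return max_loc  # (x, y)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Pixel-level mean squared error; 0.0 means the crops are identical."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))


def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * np.log10((max_val ** 2) / err)


generated = cv2.imread("generated_ad.png")      # AI-generated advertisement (assumed path)
product = cv2.imread("original_product.png")    # original product image used as the template

x, y = locate_product(generated, product)
h, w = product.shape[:2]
crop = generated[y:y + h, x:x + w]              # region that should contain the untouched product

print(f"MSE:  {mse(crop, product):.2f}")
print(f"PSNR: {psnr(crop, product):.2f} dB")
```

In practice, a pipeline like this would flag a generation whenever MSE exceeds a small tolerance or PSNR drops below an agreed threshold, since inpainting that leaves the product untouched should yield a crop that is pixel-identical to the template.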
The implementation details reveal careful consideration of production requirements.

For template matching, the team identified important constraints around image resolution:

* The template must be the same size or smaller than the target image
* Matching works best when resolutions are identical between the template and the generated image
* The system needs to account for the specific output resolutions of different GenAI models

The feature extraction pipeline for cosine similarity demonstrates production-ready considerations (a hedged sketch of such a pipeline appears at the end of this section):

* Uses a pre-trained VGG16 model with ImageNet weights
* Implements proper image preprocessing and dimension handling
* Extracts features from a specific layer ('fc2') chosen for optimal comparison

The system includes benchmark testing with controlled image modifications:

* Various levels of image fill
* Transparency handling
* Content rotation
* Translation effects
* Color modifications
* Image scaling variations

Key lessons and production considerations emerged:

* No single evaluation metric was sufficient; the combination of different techniques provides more robust verification
* Template matching proves essential for establishing a baseline for comparison
* Resolution management between original and generated images requires careful handling
* The system needs to distinguish "acceptable" variations from problematic modifications
* The performance implications of running multiple evaluation techniques need consideration
* Edge cases like rotated images or content additions outside the template boundary require special handling

The case study also highlights important limitations and areas for future development:

* The current implementation doesn't handle rotated images well without additional processing
* Detecting additions outside the template boundary remains challenging
* Resolution differences between the template and generated images require additional handling
* Performance optimization may be needed for high-volume processing

From an LLMOps perspective, this case study demonstrates several important principles:

* The importance of robust evaluation systems for AI-generated content
* The need for multiple, complementary evaluation techniques
* The value of establishing clear baseline measurements
* The importance of understanding and documenting system limitations
* The need for careful consideration of edge cases and failure modes

The solution shows how traditional computer vision techniques can be combined with deep learning approaches to create robust evaluation systems for AI-generated content. This hybrid approach, combining classical image processing with neural network-based features, provides a more comprehensive evaluation than either approach alone would achieve.

This work has broader implications for LLMOps, demonstrating how to build robust evaluation systems for AI-generated content where maintaining specific aspects of the original input is crucial. The approach could be adapted to other domains where AI generates content that must preserve certain characteristics of the input while modifying others.
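To make the feature-extraction pipeline described above concrete, the sketch below loads a pre-trained VGG16 with ImageNet weights, truncates it at the 'fc2' layer, and compares images by cosine similarity of their feature vectors. The Keras API calls are standard, but the file names and exact preprocessing choices are assumptions rather than details confirmed by the case study.

```python
# Hypothetical sketch of the cosine-similarity check using VGG16 'fc2' features.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image as keras_image

# Truncate VGG16 at the 'fc2' fully connected layer (a 4096-dimensional feature vector).
base = VGG16(weights="imagenet", include_top=True)
feature_model = Model(inputs=base.input, outputs=base.get_layer("fc2").output)


def extract_features(path: str) -> np.ndarray:
    """Load an image, resize it to VGG16's expected 224x224 input, and embed it."""
    img = keras_image.load_img(path, target_size=(224, 224))
    x = keras_image.img_to_array(img)
    x = np.expand_dims(x, axis=0)   # add batch dimension
    x = preprocess_input(x)         # VGG16-specific channel preprocessing
    return feature_model.predict(x, verbose=0).flatten()


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 indicates the two feature vectors are effectively identical."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


original = extract_features("original_product.png")            # assumed path
generated = extract_features("generated_ad_product_crop.png")  # assumed path

print(f"Cosine similarity: {cosine_similarity(original, generated):.4f}")
```

Because the 'fc2' features summarize edge and curve structure rather than raw pixels, a check like this stays close to 1.0 when the product is merely translated or proportionately scaled, complementing the pixel-level MSE and PSNR comparisons.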
