- Executive Summary
- Introduction
- Problem Context
- Solution Details
- 1. Data Generation with Enhanced Realism
- 2. Advanced Feature Engineering and Demand Forecasting
- 3. Optimization Model
- 4. Iterative Optimization Process
- 5. Scalability and Performance Enhancements
- 6. Validation and Evaluation
- 7. Ethical Considerations
- 8. Integration into Retail Operations
- 9. Business Benefits and ROI
- Conclusion
- Source Code
Executive Summary
In the rapidly evolving retail landscape, effective assortment planning and optimization are crucial for retailers to meet customer demands, maximize profitability, and maintain a competitive edge. This article presents a comprehensive exploration of an advanced assortment allocation system designed to address the complexities of modern retail. By integrating sophisticated machine learning techniques, modeling cannibalization effects, handling new products without historical sales data, and implementing an iterative optimization process, the proposed solution offers a robust framework for optimizing product assortments.
Key Highlights:
- Advanced Demand Forecasting: Utilizing machine learning models with advanced feature engineering to capture seasonality, trends, cannibalization effects, and estimate demand for new products.
- Modeling Cannibalization Effects: Incorporating cross-elasticity coefficients and constraints into both demand forecasting and optimization models.
- Handling New Products: Leveraging similarity models, such as CLIP, to estimate demand for new products without historical sales data.
- Iterative Optimization Process: Implementing a feedback loop where demand forecasts and assortment decisions inform each other until convergence.
- Enhanced Optimization Model: Building a robust model using Pyomo, considering inventory, capacity, cannibalization constraints, and the inclusion of new products.
- Scalability and Performance: Employing high-performance libraries and parallel computing to handle large datasets efficiently.
- Validation and Evaluation: Implementing methods for model validation, performance measurement, and simulation to assess effectiveness.
- Ethical Considerations: Addressing data privacy, algorithmic bias, and responsible AI practices.
- Integration into Retail Operations: Providing best practices for integrating the system into existing workflows.
- Business Benefits and ROI: Highlighting the expected return on investment and impact on gross margin.
The complete source code is provided at the end of the article.
Introduction
In today’s fast-paced retail environment, effective assortment planning and optimization are more critical than ever. Retailers face the challenge of offering the right mix of products to meet customer demands while maximizing profitability. This task becomes even more complex with the introduction of new products lacking historical sales data and the need to account for cannibalization effects among similar products.
This article presents a comprehensive exploration of building an advanced and sophisticated assortment allocation system. We dive into the challenges of demand forecasting at a granular level, modeling cannibalization effects, handling new products without prior sales data, optimizing product allocation, and ensuring scalability and performance. The solution integrates cutting-edge machine learning techniques, similarity models, iterative optimization processes, and robust data processing methodologies.
Problem Context
Challenges in Assortment Planning
- Demand Forecasting: Predicting product demand at the store level is essential for effective assortment planning. Forecasts must account for seasonality, trends, promotions, cannibalization effects, and the uncertainty associated with new products lacking historical data.
- Cannibalization Effects: Introducing similar products can lead to cannibalization, where one product’s sales reduce the sales of another. Properly modeling these effects is crucial to avoid overstocking and lost sales opportunities.
- New Product Introductions: Forecasting demand for new products without historical sales data poses a significant challenge. Traditional forecasting methods may not suffice, requiring innovative approaches like similarity modeling and leveraging product attributes.
- Inventory and Capacity Constraints: Stores have limited capacity, and products have finite inventory levels. Allocations must respect these constraints to prevent stockouts and overstocking.
- Complex Interdependencies: The assortment offered influences demand, and demand influences assortment decisions. Capturing this cyclical relationship adds complexity to the modeling process.
- Scalability and Performance: Handling large datasets and complex optimization models requires efficient algorithms and scalable data processing techniques.
Approach Overview
To address these challenges, we adopt a comprehensive approach that includes:
- Advanced Demand Forecasting: Using machine learning models (e.g., XGBoost) with advanced feature engineering to capture seasonality, trends, cannibalization effects, and estimate demand for new products.
- Modeling Cannibalization Effects: Incorporating cannibalization effects into both demand forecasting and optimization models, using cross-elasticity coefficients and constraints.
- Handling New Products: Leveraging similarity models, such as CLIP (Contrastive Language-Image Pre-training), to estimate demand for new products without historical sales data.
- Iterative Optimization Process: Implementing an iterative process where demand forecasts are adjusted based on the proposed assortment, and the assortment is re-optimized accordingly.
- Enhanced Optimization Model: Building a robust optimization model using Pyomo, considering inventory, capacity, cannibalization constraints, and the inclusion of new products.
- Scalability and Performance Enhancements: Leveraging high-performance libraries like Polars for data processing and utilizing parallel computing for efficiency.
- Validation and Evaluation: Implementing methods for model validation, performance measurement, and simulation to assess effectiveness.
- Ethical Considerations: Addressing data privacy, algorithmic bias, and responsible AI practices.
- Integration into Retail Operations: Providing best practices for integrating the system into existing workflows and ensuring user adoption.
Solution Details
1. Data Generation with Enhanced Realism
Synthetic Data Creation
To simulate a realistic retail environment, we generate synthetic data that includes:
- Stores: A set of retail stores with varying capacities.
- Products: A collection of products, including both existing and new products, each assigned to a product category.
- Product Attributes: Features such as images, descriptions, promotions, and trends.
- Sales Data: Historical weekly sales data over two years, incorporating seasonality and trend components for existing products.
- Inventory Levels: Current inventory levels for each product.
- Size Distribution: Sales data at the size level to capture granular demand patterns.
Incorporating Seasonality and Trends
We introduce seasonality and trends using sine functions and linear trends to mimic real-world sales patterns. This enhances the model’s ability to capture temporal variations in demand.
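As a minimal sketch of this idea, mirroring the data generator in the source code at the end of the article (the Poisson baseline of 20 units per week and the 10% linear trend are illustrative choices), weekly sales can be drawn from a Poisson baseline and then scaled by a trend and a sine-based seasonal factor:

```python
import numpy as np
import pandas as pd

# Two years of weekly dates
dates = pd.date_range(start="2021-01-01", periods=104, freq="W")

# Sine-based annual seasonality and a mild upward linear trend
seasonal_pattern = np.sin(2 * np.pi * dates.dayofyear / 365.25)
trend = np.linspace(1.0, 1.1, len(dates))

# Baseline Poisson demand scaled by trend and seasonality
baseline = np.random.poisson(lam=20, size=len(dates))
sales = baseline * trend * (1 + seasonal_pattern)

weekly_sales = pd.DataFrame({"Date": dates, "Sales": sales})
print(weekly_sales.head())
```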
2. Advanced Feature Engineering and Demand Forecasting
Feature Engineering
Advanced feature engineering is crucial for improving forecasting accuracy. Key features include:
- Categorical Encoding: Transforming categorical variables (e.g., store, product, category) into numerical codes.
- Date Features: Extracting features like day of the week, day of the year, week of the year, month, and year.
- Seasonality: Creating sine and cosine transformations to model seasonal patterns.
- Lag Features: Including lagged sales values (e.g., sales from previous weeks) to capture autocorrelation.
- Rolling Statistics: Computing rolling means to capture trends over time.
- Assortment Features: Adding features that represent the assortment composition, such as the number of similar products available.
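To make a few of these features concrete, here is a brief pandas sketch (the column names follow the synthetic data used in this article; the full pipeline in the source code uses Polars, so this pandas version is only for readability):

```python
import numpy as np
import pandas as pd

# Toy weekly sales frame; in the full pipeline this comes from the synthetic data generator
df = pd.DataFrame({
    "Store": ["Store_1"] * 6,
    "Product": ["Product_1"] * 6,
    "Date": pd.date_range("2021-01-03", periods=6, freq="W"),
    "Sales": [20, 22, 19, 25, 27, 24],
})

# Date-derived features
df["WeekOfYear"] = df["Date"].dt.isocalendar().week.astype(int)
df["Seasonality"] = np.sin(2 * np.pi * df["Date"].dt.dayofyear / 365.25)

# Lag and rolling statistics computed within each store/product group
grp = df.groupby(["Store", "Product"])["Sales"]
df["Lag_1"] = grp.shift(1)
df["RollingMean_4"] = grp.transform(lambda s: s.shift(1).rolling(4).mean())
print(df)
```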
Machine Learning Model: XGBoost
We use XGBoost, a gradient boosting algorithm, for demand forecasting due to its ability to handle nonlinear relationships and interactions. The model is trained on the engineered features to predict future sales.
Time Series Cross-Validation
To validate the model’s performance, we employ time series cross-validation, which respects the temporal order of data and provides a more realistic assessment of forecasting accuracy.
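A minimal sketch of this validation scheme, using scikit-learn's TimeSeriesSplit with an XGBoost regressor (it assumes an already-engineered feature matrix `X` and target `y`, sorted chronologically):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def evaluate_forecaster(X, y, n_splits=5):
    """Average MAE across time-ordered folds: each fold trains on the past, tests on the future."""
    tscv = TimeSeriesSplit(n_splits=n_splits)
    model = xgb.XGBRegressor(objective="reg:squarederror")
    scores = cross_val_score(model, X, y, cv=tscv, scoring="neg_mean_absolute_error")
    return -np.mean(scores)

# Usage (assuming X and y exist): print(f"Cross-validated MAE: {evaluate_forecaster(X, y):.2f}")
```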
Incorporating Cannibalization Effects
Modeling Cannibalization in Demand Forecasting
To capture cannibalization effects in demand forecasting:
- Cross-Elasticity Coefficients: Estimating how the demand for one product is affected by the availability of similar products.
- Assortment Variables: Including features that represent the presence of similar products in the assortment.
- Adjustment of Demand Forecasts: Modifying demand forecasts based on cross-elasticity coefficients to reflect cannibalization.
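In the simplified form used later in this article's source code, the adjusted forecast for a product adds the (negative) cross-elasticity-weighted forecasts of the similar products offered alongside it in the same store. A minimal pandas sketch:

```python
import pandas as pd

def adjust_for_cannibalization(forecasts: pd.DataFrame, cross_elasticities: pd.DataFrame) -> pd.DataFrame:
    """forecasts: columns [Store, Product, Forecast_Sales].
    cross_elasticities: columns [Product1, Product2, Coefficient], with Coefficient <= 0."""
    pairs = forecasts.merge(cross_elasticities, left_on="Product", right_on="Product1")
    # Attach the competitor's forecast within the same store
    other = forecasts.rename(columns={"Product": "Product2", "Forecast_Sales": "Other_Sales"})
    pairs = pairs.merge(other, on=["Store", "Product2"])
    pairs["Impact"] = pairs["Coefficient"] * pairs["Other_Sales"]
    impact = pairs.groupby(["Store", "Product"], as_index=False)["Impact"].sum()
    out = forecasts.merge(impact, on=["Store", "Product"], how="left")
    out["Adjusted_Forecast_Sales"] = (out["Forecast_Sales"] + out["Impact"].fillna(0)).clip(lower=0)
    return out[["Store", "Product", "Adjusted_Forecast_Sales"]]
```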
Incorporating Cannibalization in Optimization Model
In the optimization model, we:
- Define Cannibalization Constraints: Limit the number of similar products allocated to each store to prevent excessive cannibalization.
- Penalize Cannibalization in Objective Function: Optionally, include penalties for assortments that may lead to high cannibalization.
Handling New Products Without Historical Sales Data
Challenges with New Products
New products lack historical sales data, making it difficult to forecast demand using traditional time series methods. Accurately estimating demand is crucial to avoid overstocking or understocking and to make informed assortment decisions.
Leveraging Similarity Models
We use similarity models to estimate demand for new products:
- CLIP Model: Utilize the CLIP model, which creates embeddings for images and text, mapping them into a shared vector space.
- Product Embeddings: Extract embeddings for new and existing products using their images and descriptions.
- Similarity Calculation: Compute cosine similarity between the embeddings of new products and existing products.
- Estimating Demand: Use the historical sales data of similar existing products, weighted by similarity scores, to estimate demand for new products.
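A condensed sketch of the similarity-weighted estimate (it assumes the CLIP embeddings have already been computed, as in the source code at the end of the article; the argument names are illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def estimate_new_product_demand(new_embedding, existing_embeddings, existing_avg_sales):
    """existing_embeddings: embedding vectors of existing products.
    existing_avg_sales: average historical weekly sales aligned with those embeddings."""
    sims = np.array([cosine_similarity(new_embedding, e) for e in existing_embeddings])
    weights = np.clip(sims, 0, None)  # ignore products with negative similarity
    return float(np.average(existing_avg_sales, weights=weights))
```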
Integration into the System
- Feature Engineering Pipeline: Include embeddings and similarity scores in the feature set used for demand forecasting.
- Demand Forecasting Adjustments: Modify the forecasting model to handle new products without historical sales data.
- Optimization Model Updates: Ensure that the optimization model includes new products and adjusts constraints accordingly.
3. Optimization Model
Optimization Framework
We build the optimization model using Pyomo, an open-source optimization modeling language in Python. The model aims to maximize total profit while considering various constraints, including the inclusion of new products with estimated demand.
Decision Variables
- Allocation Variables: The number of units of each product allocated to each store, including new products.
Objective Function
- Maximize Total Profit: Calculated as the sum of the gross margin per unit multiplied by the allocated units, adjusted for cannibalization effects.
Constraints
- Demand Constraints: Allocations cannot exceed the adjusted demand forecasts, including estimated demands for new products.
- Inventory Constraints: Total allocations of a product cannot exceed its inventory level.
- Capacity Constraints: Total allocations to a store cannot exceed its capacity.
- Cannibalization Constraints: Limits are set on the number of similar products (from the same category) allocated to a store.
- Uncertainty Consideration for New Products: Optionally, include conservative allocation limits for new products due to higher demand uncertainty.
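A stripped-down Pyomo sketch of this formulation (the full model in the source code additionally handles cannibalization via binary indicator variables; the parameter dictionaries here are assumed inputs):

```python
from pyomo.environ import (ConcreteModel, Set, Var, Objective, Constraint,
                           NonNegativeIntegers, maximize, SolverFactory)

def build_allocation_model(products, stores, demand, inventory, capacity, margin=50):
    """demand: {(product, store): units}, inventory: {product: units}, capacity: {store: units}."""
    m = ConcreteModel()
    m.P, m.S = Set(initialize=products), Set(initialize=stores)
    m.x = Var(m.P, m.S, domain=NonNegativeIntegers)  # units of product p allocated to store s
    m.profit = Objective(expr=sum(margin * m.x[p, s] for p in m.P for s in m.S), sense=maximize)
    m.demand_c = Constraint(m.P, m.S, rule=lambda m, p, s: m.x[p, s] <= demand.get((p, s), 0))
    m.inventory_c = Constraint(m.P, rule=lambda m, p: sum(m.x[p, s] for s in m.S) <= inventory[p])
    m.capacity_c = Constraint(m.S, rule=lambda m, s: sum(m.x[p, s] for p in m.P) <= capacity[s])
    return m

# Usage: model = build_allocation_model(...); SolverFactory('glpk').solve(model)
```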
4. Iterative Optimization Process
Recognizing the interdependence between demand and assortment, we implement an iterative process:
- Initial Demand Forecast: Generate initial forecasts, including estimated demands for new products.
- Optimize Assortment: Use the initial forecasts to optimize the assortment.
- Adjust Demand Forecasts: Update forecasts based on the proposed assortment, incorporating cannibalization effects.
- Re-optimize Assortment: Use the adjusted forecasts to re-optimize the assortment.
- Convergence Check: Repeat steps 3 and 4 until the allocation plan stabilizes.
This iterative approach ensures that both demand forecasts and assortment decisions are aligned and reflect the impact of cannibalization and the uncertainty of new products.
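Schematically, the loop looks like this (the function and variable names match the source code at the end of the article; the convergence test simply compares consecutive allocation plans):

```python
def iterative_optimization(max_iterations=5):
    prev_allocation = None
    for iteration in range(max_iterations):
        forecast = forecast_demand(historical_sales, new_products_df)                    # steps 1 and 3
        forecast = adjust_demand_with_cannibalization(forecast, cross_elasticities_df)
        allocation = optimize_allocation(forecast, store_capacities, inventory_levels)   # steps 2 and 4
        if prev_allocation is not None and allocation.equals(prev_allocation):
            break                                                                        # step 5: converged
        prev_allocation = allocation.copy()
    return allocation
```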
5. Scalability and Performance Enhancements
Data Processing with Polars
We use Polars, a high-performance DataFrame library, for efficient data processing. Polars leverages Apache Arrow memory formats and is optimized for speed, making it suitable for large datasets.
Parallel Computing
Parallel processing is utilized to speed up computations, especially when adjusting demand forecasts, processing embeddings, and during the iterative optimization process. Libraries like joblib are used for easy parallelization.
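For example, independent per-store computations can be fanned out with joblib (a minimal, self-contained sketch; `simulate_store_demand` is a hypothetical stand-in for a per-store forecast-adjustment task):

```python
import numpy as np
from joblib import Parallel, delayed

def simulate_store_demand(seed: int) -> float:
    # Hypothetical per-store workload (stand-in for a forecast adjustment)
    rng = np.random.default_rng(seed)
    return float(rng.poisson(lam=20, size=1000).mean())

# One task per store, spread across all available CPU cores
results = Parallel(n_jobs=-1)(delayed(simulate_store_demand)(seed) for seed in range(10))
print(results)
```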
Efficient Solvers
For solving the optimization model, we use efficient solvers like GLPK or Gurobi (if available). These solvers can handle large-scale optimization problems effectively.
Performance Optimization for Embeddings
Processing embeddings for a large number of products can be computationally intensive:
- Batch Processing: Process embeddings in batches to utilize hardware acceleration efficiently.
- Caching Embeddings: Cache embeddings of existing products to prevent redundant computations.
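A sketch of both ideas, assuming the CLIP objects loaded in the source code (`model_clip`, `preprocess`, `device`); the cache here is a simple in-memory dictionary keyed by product ID:

```python
import torch

_embedding_cache = {}  # product_id -> cached embedding, reused across runs

def image_embeddings_batched(product_ids, images, batch_size=32):
    """Encode PIL images in batches and cache the result per product."""
    to_encode = [(pid, img) for pid, img in zip(product_ids, images) if pid not in _embedding_cache]
    for i in range(0, len(to_encode), batch_size):
        batch = to_encode[i:i + batch_size]
        tensors = torch.stack([preprocess(img) for _, img in batch]).to(device)
        with torch.no_grad():
            embeddings = model_clip.encode_image(tensors)
        for (pid, _), emb in zip(batch, embeddings):
            _embedding_cache[pid] = emb.cpu().numpy()
    return [_embedding_cache[pid] for pid in product_ids]
```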
6. Validation and Evaluation
Model Validation
- Cross-Validation: Employed during demand forecasting to assess model performance.
- Backtesting for New Products: Simulate the introduction of past new products to validate the demand estimation method.
- Holdout Testing: Using a separate dataset to test the model’s predictive accuracy.
Performance Metrics
- Mean Absolute Error (MAE): Used to measure the accuracy of demand forecasts.
- Total Allocated Units and Expected Profit: Calculated after optimization to evaluate the effectiveness of the allocation plan.
Simulation and What-If Analysis
Simulation is conducted to assess the impact of demand uncertainties on profit:
- Demand Simulation: Adjusting demand forecasts by introducing random variations, especially for new products with higher uncertainty.
- Re-optimization: Running the optimization model with simulated demands.
- Analysis: Calculating the average profit and standard deviation to understand the potential range of outcomes.
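A compact sketch of this Monte Carlo analysis (it reuses `optimize_allocation` and the global data frames from the source code; the 10% noise level and fixed margin are assumptions):

```python
import numpy as np

def simulate_profit(demand_forecast, n_runs=10, noise=0.10, margin=50):
    profits = []
    for _ in range(n_runs):
        simulated = demand_forecast.copy()
        # Perturb demand multiplicatively; new products could be given a larger noise level
        simulated["Adjusted_Forecast_Sales"] *= np.random.normal(1.0, noise, len(simulated))
        allocation = optimize_allocation(simulated, store_capacities, inventory_levels)
        profits.append(allocation["Allocated_Units"].sum() * margin)
    return np.mean(profits), np.std(profits)
```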
Monitoring and Feedback Loop
- Post-Launch Monitoring: After launching new products, compare actual sales with forecasts.
- Model Adjustments: Update models based on actual performance to improve future forecasts.
7. Ethical Considerations
Maintaining ethical standards is crucial for building trust and ensuring the responsible use of AI in assortment planning.
- Data Privacy and Security:
  - Objective: Protect sensitive customer and business data.
  - Approach: Implement data anonymization, encryption, and compliance with regulations such as GDPR and CCPA.
  - Benefits: Safeguarded data integrity and compliance with legal standards.
- Algorithmic Fairness:
  - Objective: Prevent biases in assortment decisions.
  - Approach: Employ fairness metrics, diverse training data, and transparency in model operations.
  - Benefits: Equitable assortment decisions and enhanced stakeholder trust.
- Transparency and Explainability:
  - Objective: Make AI-driven decisions understandable to stakeholders.
  - Approach: Utilize interpretable models and provide clear explanations for assortment recommendations.
  - Benefits: Increased confidence in AI systems and better stakeholder alignment.
8. Integration into Retail Operations
Best Practices for Integration
- Stakeholder Engagement: Involve key users and stakeholders early in the development and implementation process.
- API Development: Create APIs for seamless integration with existing systems (e.g., ERP, POS).
- Modular Deployment: Implement the system in phases to minimize disruption.
- Data Integration: Ensure compatibility with existing data formats and databases.
- Training Programs: Provide comprehensive training and support for users.
- Change Management: Use strategies to facilitate adoption and address resistance to change.
User Experience Considerations
- Intuitive Interfaces: Design user-friendly dashboards and tools for interacting with the system.
- Visualization Tools: Provide visual aids to help users understand data and insights.
- Feedback Mechanisms: Implement features that allow users to provide feedback and report issues.
9. Business Benefits and ROI
Quantifying the system’s impact with concrete metrics demonstrates its value and supports strategic investment decisions.
- Comprehensive ROI Metrics:
  - Objective: Quantify the financial benefits of the assortment planning system.
  - Approach: Track metrics such as sales growth, inventory turnover, gross margin return on investment (GMROI), and stockout rates.
  - Benefits: Clear evidence of the system’s effectiveness and justification for continued investment.
- Case Studies and Real-World Applications:
  - Objective: Showcase practical implementations and success stories.
  - Approach: Develop case studies demonstrating the system’s application in various retail scenarios.
  - Benefits: Illustrative examples that highlight the system’s capabilities and benefits.
Expected Benefits for Fashion Retailers
- Increased Sales: Better assortment planning leads to higher customer satisfaction and increased sales.
- Improved Gross Margins: Optimized inventory reduces markdowns and stockouts, improving profit margins.
- Inventory Efficiency: More accurate demand forecasting reduces excess inventory and associated holding costs.
- Competitive Advantage: Advanced analytics provide insights that can differentiate retailers in the market.
- Customer Loyalty: Enhanced product availability and selection improve customer experience and loyalty.
Return on Investment (ROI) and Gross Margin Impact
- ROI Estimates: Industry studies suggest that advanced assortment planning can improve sales by 2-7% and reduce inventory costs by 5-10%.
- Gross Margin Improvement: Margins can improve by up to 5% due to reduced markdowns and optimized pricing strategies.
- Payback Period: With significant cost savings and revenue increases, the investment in such a system can often be recouped within one to two years.
Performance Measurement
- Key Performance Indicators (KPIs): sales growth, inventory turnover rates, gross margin return on investment (GMROI), stockout rates, and customer satisfaction scores.
- Regular Reporting: Generate periodic reports to track KPIs over time and make data-driven decisions.
Conclusion
The advanced assortment planning and optimization system presented in this article addresses the complex challenges faced by retailers in today’s dynamic environment. By integrating advanced machine learning techniques, modeling cannibalization effects, handling new products without historical sales data, and implementing an iterative optimization process, the solution provides a robust framework for maximizing profitability while respecting operational constraints.
Key Takeaways:
- Holistic Approach: The system offers a comprehensive solution that integrates multiple facets of assortment planning.
- Scalability and Performance: Designed to handle large datasets and complex models efficiently.
- Adaptability: Capable of adjusting to changes in market trends, consumer behavior, and product offerings.
- Ethical and Responsible: Incorporates ethical considerations to ensure responsible AI practices.
By adopting this comprehensive approach, retailers can make informed assortment decisions that align with customer preferences, minimize cannibalization, effectively introduce new products, and drive business growth.
Source Code
# Import necessary libraries
import numpy as np
import pandas as pd
import polars as pl
import time
import psutil
import torch
import clip # OpenAI CLIP model
from PIL import Image
import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_absolute_error
from joblib import Parallel, delayed
from pyomo.environ import *
from pyomo.opt import SolverFactory
import os
# Set random seed for reproducibility
np.random.seed(42)
# Check if CUDA is available for CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load CLIP model and preprocessing
model_clip, preprocess = clip.load("ViT-B/32", device=device)
# Global Variables
gross_margin_per_unit = 50
# 1. Data Generation with Enhanced Realism
def generate_synthetic_data():
# Stores and Products
stores = ['Store_' + str(i) for i in range(1, 6)]
products = ['Product_' + str(i) for i in range(1, 11)]
product_categories = ['Category_' + str(i) for i in range(1, 4)]
sizes = ['Size_' + str(i) for i in range(6, 13)]
dates = pd.date_range(start='2021-01-01', periods=104, freq='W') # 2 years weekly data
# Create product DataFrame with categories
product_df = pd.DataFrame({
'Product': products,
'Product_Category': np.random.choice(product_categories, size=len(products)),
'Image_Path': ['images/' + p + '.jpg' for p in products], # Assuming image paths
'Description': ['Description of ' + p for p in products]
})
# Generate placeholder images (if not available)
os.makedirs('images', exist_ok=True)
for path in product_df['Image_Path']:
if not os.path.exists(path):
Image.new('RGB', (100, 100)).save(path)
# Seasonality and Trend
seasonal_pattern = np.sin(2 * np.pi * dates.dayofyear / 365.25)
trend = np.linspace(1, 1.1, len(dates))
# Historical Sales Data
data = []
for store in stores:
for product in products:
sales = np.random.poisson(lam=20, size=len(dates))
sales = sales * trend * (1 + seasonal_pattern)
df = pd.DataFrame({
'Store': store,
'Product': product,
'Date': dates,
'Sales': sales
})
data.append(df)
historical_sales = pd.concat(data, ignore_index=True)
historical_sales = historical_sales.merge(product_df[['Product', 'Product_Category']], on='Product', how='left')
# Product Trends and Promotions
product_trends = pd.DataFrame({
'Product': products,
'Trend': np.random.uniform(0.9, 1.1, size=len(products))
})
promotions = pd.DataFrame({
'Store': np.random.choice(stores, 200),
'Product': np.random.choice(products, 200),
'Date': np.random.choice(dates, 200),
'Promotion': 1
})
# Merge Data
historical_sales = historical_sales.merge(product_trends, on='Product', how='left')
historical_sales = historical_sales.merge(promotions, on=['Store', 'Product', 'Date'], how='left')
historical_sales['Promotion'] = historical_sales['Promotion'].fillna(0)
# Store Capacities and Inventory Levels
store_capacities = pd.DataFrame({
'Store': stores,
'Capacity': np.random.randint(5000, 10000, size=len(stores))
})
inventory_levels = pd.DataFrame({
'Product': products,
'Inventory': np.random.randint(2000, 5000, size=len(products))
})
# Historical Size Sales
size_sales_data = []
for index, row in historical_sales.iterrows():
total_sales = row['Sales']
size_percentages = np.random.dirichlet(np.ones(len(sizes)))
for size, percentage in zip(sizes, size_percentages):
size_sales_data.append({
'Store': row['Store'],
'Product': row['Product'],
'Date': row['Date'],
'Size': size,
'Sales': total_sales * percentage
})
historical_size_sales = pd.DataFrame(size_sales_data)
return historical_sales, historical_size_sales, store_capacities, inventory_levels, product_df
# Generate Data
historical_sales_df, historical_size_sales_df, store_capacities_df, inventory_levels_df, product_df = generate_synthetic_data()
# Convert DataFrames to Polars DataFrames for performance
historical_sales = pl.from_pandas(historical_sales_df)
historical_size_sales = pl.from_pandas(historical_size_sales_df)
store_capacities = pl.from_pandas(store_capacities_df)
inventory_levels = pl.from_pandas(inventory_levels_df)
# 2. Advanced Feature Engineering and Demand Forecasting
def feature_engineering(df):
# Encode categorical variables
le_store = LabelEncoder()
le_product = LabelEncoder()
le_category = LabelEncoder()
df = df.with_columns([
pl.Series('Store_Code', le_store.fit_transform(df['Store'].to_pandas())),
pl.Series('Product_Code', le_product.fit_transform(df['Product'].to_pandas())),
pl.Series('Category_Code', le_category.fit_transform(df['Product_Category'].to_pandas()))
])
    # Date features
    df = df.with_columns([
        pl.col('Date').dt.weekday().alias('DayOfWeek'),
        pl.col('Date').dt.ordinal_day().alias('DayOfYear'),
        pl.col('Date').dt.week().alias('WeekOfYear'),
        pl.col('Date').dt.month().alias('Month'),
        pl.col('Date').dt.year().alias('Year'),
    ])
    # Seasonality is derived in a second step because expressions within one
    # with_columns call cannot reference columns created in the same call
    df = df.with_columns([
        (2 * np.pi * pl.col('DayOfYear') / 365.25).sin().alias('Seasonality'),
    ])
# Lag features
df = df.sort(['Store', 'Product', 'Date'])
df = df.with_columns([
pl.col('Sales').shift(1).over('Store', 'Product').alias('Lag_1'),
pl.col('Sales').shift(7).over('Store', 'Product').alias('Lag_7'),
pl.col('Sales').rolling_mean(window_size=4).over('Store', 'Product').alias('RollingMean_4'),
])
# Handle missing values
df = df.fill_null(0)
# Assortment features
df = df.with_columns([
pl.col('Product_Category').alias('Category')
])
df = df.join(df.groupby(['Store', 'Date']).agg(pl.count('Product').alias('Total_Products')), on=['Store', 'Date'])
df = df.join(df.groupby(['Store', 'Date', 'Category']).agg(pl.count('Product').alias('Category_Product_Count')), on=['Store', 'Date', 'Category'])
df = df.with_columns([
(pl.col('Category_Product_Count') - 1).alias('Similar_Product_Count')
])
return df
# Function to estimate cross-elasticity coefficients (for simplicity, we use random values here)
def get_cross_elasticity_coefficients(products):
cross_elasticities = []
for p1 in products:
for p2 in products:
if p1 != p2:
coefficient = np.random.uniform(-0.5, 0)
cross_elasticities.append({'Product1': p1, 'Product2': p2, 'Coefficient': coefficient})
cross_elasticities_df = pd.DataFrame(cross_elasticities)
return cross_elasticities_df
# Adjust demand forecasts based on cannibalization effects
def adjust_demand_with_cannibalization(demand_forecast, cross_elasticities):
    # Pair each store/product forecast with the cross-elasticity coefficients of its competitors
    pairs = demand_forecast.merge(cross_elasticities, left_on='Product', right_on='Product1', how='left')
    # Attach the competitor's forecast within the same store
    other = demand_forecast.rename(columns={'Product': 'Product2', 'Forecast_Sales': 'Forecast_Sales_other'})
    pairs = pairs.merge(other, on=['Store', 'Product2'], how='left')
    # Total (negative) cannibalization impact per store/product
    pairs['Impact'] = pairs['Coefficient'] * pairs['Forecast_Sales_other']
    impact = pairs.groupby(['Store', 'Product'], as_index=False)['Impact'].sum()
    adjusted = demand_forecast.merge(impact, on=['Store', 'Product'], how='left')
    adjusted['Adjusted_Forecast_Sales'] = (adjusted['Forecast_Sales'] + adjusted['Impact'].fillna(0)).clip(lower=0)
    return adjusted[['Store', 'Product', 'Adjusted_Forecast_Sales']]
# Extract embeddings using CLIP
def extract_embeddings(product_df):
embeddings = []
for index, row in product_df.iterrows():
# Load image
image = preprocess(Image.open(row['Image_Path'])).unsqueeze(0).to(device)
# Get text description
text = clip.tokenize([row['Description']]).to(device)
# Compute embeddings
with torch.no_grad():
image_embedding = model_clip.encode_image(image)
text_embedding = model_clip.encode_text(text)
# Average embeddings
embedding = (image_embedding + text_embedding) / 2
embeddings.append(embedding.cpu().numpy()[0])
product_df['Embedding'] = embeddings
return product_df
# Compute similarity between embeddings
def compute_similarity(new_embedding, existing_embeddings):
similarities = []
for embedding in existing_embeddings:
similarity = np.dot(new_embedding, embedding) / (np.linalg.norm(new_embedding) * np.linalg.norm(embedding))
similarities.append(similarity)
return np.array(similarities)
# Estimate demand for new products
def estimate_new_product_demand(new_product, existing_products_df):
# Extract embedding for new product
new_embedding = new_product['Embedding']
# Get embeddings for existing products
existing_embeddings = np.vstack(existing_products_df['Embedding'].values)
# Compute similarity scores
similarities = compute_similarity(new_embedding, existing_embeddings)
existing_products_df['Similarity'] = similarities
# Get historical sales
sales = existing_products_df['Historical_Sales']
    # Calculate similarity-weighted average of sales (negative similarities are clipped to zero)
    weights = np.clip(similarities, 0, None)
    estimated_demand = np.average(sales, weights=weights)
return estimated_demand
def forecast_demand(historical_sales, new_products_df=None):
historical_sales = feature_engineering(historical_sales)
data = historical_sales.to_pandas()
# Prepare data for XGBoost
X = data.drop(['Sales', 'Date', 'Store', 'Product', 'Product_Category', 'Category'], axis=1)
y = data['Sales']
# Time Series Cross-Validation
tscv = TimeSeriesSplit(n_splits=5)
xgb_model = xgb.XGBRegressor(objective='reg:squarederror')
mae_scores = cross_val_score(xgb_model, X, y, cv=tscv, scoring='neg_mean_absolute_error')
print(f'Cross-Validated MAE: {-np.mean(mae_scores)}')
# Fit model on entire data
xgb_model.fit(X, y)
    # Forecasting for the next 26 weeks
    future_dates = pd.date_range(start=historical_sales['Date'].to_pandas().max() + pd.Timedelta(weeks=1), periods=26, freq='W')
    future_data = []
    for store in historical_sales['Store'].unique():
        for product in historical_sales['Product'].unique():
            for date in future_dates:
                # Placeholder Sales/Promotion/Trend values so feature engineering can run on future rows
                future_data.append({'Store': store, 'Product': product, 'Date': date,
                                    'Sales': 0.0, 'Promotion': 0.0, 'Trend': 1.0})
    future_df = pd.DataFrame(future_data)
    future_df = pl.from_pandas(future_df)
    future_df = future_df.join(pl.from_pandas(product_df[['Product', 'Product_Category']]), on='Product', how='left')
    future_df = feature_engineering(future_df)
    future_df = future_df.to_pandas()
    # Align the future feature matrix with the training columns; any missing columns default to 0
    X_future = future_df.reindex(columns=X.columns, fill_value=0)
    future_df['Forecast_Sales'] = xgb_model.predict(X_future)
    # Aggregate the weekly forecasts over the planning horizon into one figure per store/product
    existing_demand_forecast = future_df.groupby(['Store', 'Product'], as_index=False)['Forecast_Sales'].sum()
# Handle new products
if new_products_df is not None:
# Prepare embeddings
existing_products_df = product_df.copy()
        # Map average historical sales by product name to avoid misaligned row order
        existing_products_df['Historical_Sales'] = existing_products_df['Product'].map(
            historical_sales_df.groupby('Product')['Sales'].mean())
existing_products_df = extract_embeddings(existing_products_df)
new_products_df = extract_embeddings(new_products_df)
        new_demand_estimates = []
        store_list = store_capacities_df['Store'].tolist()
        for index, new_product in new_products_df.iterrows():
            estimated_demand = estimate_new_product_demand(new_product, existing_products_df)
            for store in store_list:
                new_demand_estimates.append({
                    'Store': store,
                    'Product': new_product['Product'],
                    # Scale the weekly estimate to the 26-week planning horizon
                    'Forecast_Sales': estimated_demand * len(future_dates)
                })
new_demand_forecast = pd.DataFrame(new_demand_estimates)
# Combine with existing demand forecasts
demand_forecast = pd.concat([existing_demand_forecast, new_demand_forecast], ignore_index=True)
else:
demand_forecast = existing_demand_forecast
return demand_forecast
# Generate cross-elasticity coefficients
cross_elasticities_df = get_cross_elasticity_coefficients(historical_sales['Product'].unique())
# Assume we have new products to introduce
new_products_df = pd.DataFrame({
'Product': ['New_Product_1', 'New_Product_2'],
'Product_Category': ['Category_1', 'Category_2'],
'Image_Path': ['images/New_Product_1.jpg', 'images/New_Product_2.jpg'],
'Description': ['Description of New_Product_1', 'Description of New_Product_2']
})
# Generate placeholder images for new products
for path in new_products_df['Image_Path']:
if not os.path.exists(path):
Image.new('RGB', (100, 100)).save(path)
# Forecast Demand
demand_forecast = forecast_demand(historical_sales, new_products_df)
# Adjust Demand with Cannibalization Effects
demand_forecast = adjust_demand_with_cannibalization(demand_forecast, cross_elasticities_df)
# 3. Enhanced Optimization Model Incorporating Cannibalization
def optimize_allocation(demand_forecast, store_capacities, inventory_levels):
# Initialize Pyomo Model
model = ConcreteModel()
# Sets
products = demand_forecast['Product'].unique()
stores = demand_forecast['Store'].unique()
model.Products = Set(initialize=products)
model.Stores = Set(initialize=stores)
# Parameters
demand = demand_forecast.set_index(['Product', 'Store'])['Adjusted_Forecast_Sales'].to_dict()
inventory_levels_dict = inventory_levels.to_pandas().set_index('Product')['Inventory'].to_dict()
# Add new products to inventory with assumed inventory levels
for product in new_products_df['Product']:
inventory_levels_dict[product] = 1000 # Assumed inventory for new products
capacity = store_capacities.to_pandas().set_index('Store')['Capacity'].to_dict()
max_similar_products_per_store = 2 # Constraint for cannibalization
similar_products = product_df.groupby('Product_Category')['Product'].apply(list).to_dict()
# Include new products in similar_products
for index, row in new_products_df.iterrows():
category = row['Product_Category']
product = row['Product']
if category in similar_products:
similar_products[category].append(product)
else:
similar_products[category] = [product]
# Variables
model.Allocation = Var(model.Products, model.Stores, domain=NonNegativeIntegers)
# Objective Function
def objective_rule(m):
total_profit = sum(
gross_margin_per_unit * m.Allocation[p, s]
for p in m.Products for s in m.Stores
)
return total_profit
model.Objective = Objective(rule=objective_rule, sense=maximize)
# Demand Constraints
def demand_constraint_rule(m, p, s):
return m.Allocation[p, s] <= demand.get((p, s), 0)
model.DemandConstraint = Constraint(model.Products, model.Stores, rule=demand_constraint_rule)
# Inventory Constraints
def inventory_constraint_rule(m, p):
return sum(m.Allocation[p, s] for s in m.Stores) <= inventory_levels_dict.get(p, 1000) # Default inventory for new products
model.InventoryConstraint = Constraint(model.Products, rule=inventory_constraint_rule)
# Capacity Constraints
def capacity_constraint_rule(m, s):
return sum(m.Allocation[p, s] for p in m.Products) <= capacity[s]
model.CapacityConstraint = Constraint(model.Stores, rule=capacity_constraint_rule)
    # Cannibalization Constraints: limit how many distinct products of a category a store carries.
    # Counting distinct products requires binary indicator variables linked to the allocations.
    model.Assigned = Var(model.Products, model.Stores, domain=Binary)
    big_m = max(capacity.values())

    def link_constraint_rule(m, p, s):
        # Allocation can only be positive when the product is marked as assigned to the store
        return m.Allocation[p, s] <= big_m * m.Assigned[p, s]
    model.LinkConstraint = Constraint(model.Products, model.Stores, rule=link_constraint_rule)

    def cannibalization_constraint_rule(m, s, category):
        products_in_category = [p for p in similar_products[category] if p in m.Products]
        if not products_in_category:
            return Constraint.Skip
        return sum(m.Assigned[p, s] for p in products_in_category) <= max_similar_products_per_store
    categories = list(similar_products.keys())
    model.CannibalizationConstraint = Constraint(model.Stores, categories, rule=cannibalization_constraint_rule)
# Solve
opt = SolverFactory('glpk') # Use GLPK or 'gurobi' if available
opt.solve(model)
# Extract Results
allocation_results = []
for p in model.Products:
for s in model.Stores:
            allocated_units = model.Allocation[p, s].value
            if allocated_units is not None and allocated_units > 0:
allocation_results.append({'Product': p, 'Store': s, 'Allocated_Units': allocated_units})
allocation_results = pd.DataFrame(allocation_results)
return allocation_results
# Optimize Allocation
allocation_results = optimize_allocation(demand_forecast, store_capacities, inventory_levels)
# 4. Iterative Process
def iterative_optimization(max_iterations=5):
convergence = False
iteration = 0
prev_allocation = None
while not convergence and iteration < max_iterations:
print(f"Iteration {iteration + 1}")
        # Forecast demand (in a full implementation, the previous iteration's allocation
        # would feed back into the assortment features before re-forecasting)
demand_forecast = forecast_demand(historical_sales, new_products_df)
# Adjust demand with cannibalization effects
demand_forecast = adjust_demand_with_cannibalization(demand_forecast, cross_elasticities_df)
# Optimize allocation
allocation_results = optimize_allocation(demand_forecast, store_capacities, inventory_levels)
# Check for convergence
if prev_allocation is not None and allocation_results.equals(prev_allocation):
convergence = True
prev_allocation = allocation_results.copy()
iteration += 1
return allocation_results
# Run Iterative Optimization
allocation_results = iterative_optimization()
# 5. Performance Measurement
def measure_performance(func, *args, **kwargs):
    # Time a pipeline step and report resource usage while it runs
    start_time = time.time()
    result = func(*args, **kwargs)
    end_time = time.time()
    print(f"Execution Time: {end_time - start_time:.2f} seconds")
    print(f"CPU Usage: {psutil.cpu_percent()}%")
    print(f"Memory Usage: {psutil.virtual_memory().percent}%")
    return result

# Measure Performance of a single optimization pass
measure_performance(optimize_allocation, demand_forecast, store_capacities, inventory_levels)
# 6. Visualization and User Interface (Optional)
def visualize_allocation(allocation_results):
# This function can be implemented using Streamlit or other visualization libraries
pass
# 7. Validation and Evaluation
def evaluate_model(allocation_results):
total_sales = allocation_results['Allocated_Units'].sum()
total_profit = allocation_results['Allocated_Units'].sum() * gross_margin_per_unit
print(f"Total Allocated Units: {total_sales}")
print(f"Total Expected Profit: ${total_profit:.2f}")
# Evaluate Model
evaluate_model(allocation_results)
# 8. Simulation and What-If Analysis
def simulate_optimization(demand_forecast, num_simulations=10):
simulation_results = []
for i in range(num_simulations):
print(f"Simulation {i + 1}")
# Simulate demand uncertainties
simulated_demand = demand_forecast.copy()
simulated_demand['Adjusted_Forecast_Sales'] = simulated_demand['Adjusted_Forecast_Sales'] * np.random.normal(1, 0.1, len(simulated_demand))
# Re-optimize
allocation = optimize_allocation(simulated_demand, store_capacities, inventory_levels)
total_profit = allocation['Allocated_Units'].sum() * gross_margin_per_unit
simulation_results.append(total_profit)
# Analyze results
print(f'Average Profit from Simulation: {np.mean(simulation_results)}')
print(f'Profit Standard Deviation from Simulation: {np.std(simulation_results)}')
# Run Simulation
simulate_optimization(demand_forecast)