Pernod Ricard
BotiqueAI
🥃 Pernod Ricard · ☁️ Azure · MLOps · Data Science
Case Study

DStar

Sales Visit Optimization Engine: 30 AI-ranked recommendations, every Monday morning.

A fully automated MLOps pipeline that analyses sales transactions, detects under-indexed SKUs across all points of sale, and delivers a prioritised visit list to every sales representative before the week begins.

30 visits recommended per rep, per week
1× automated trigger, every Monday 6:00 AM
<2 min pipeline runtime, trigger to dashboard
100% dockerised on Azure Container Apps
Client: Pernod Ricard (global spirits leader)
Project: DStar (sales intelligence tool)
Cloud: Azure (full managed stack)
Language: Python 3.11 (Pandas · Scikit-learn)
End users: Sales force (weekly active)
Cadence: Weekly (automated pipeline)
The Challenge

Sales reps had data.
Not the right signal.

Pernod Ricard's sales force manages hundreds of points of sale. Each representative had access to raw sales figures, but no structured way to prioritise accounts or identify which SKUs to push.

High-value opportunities were missed, visits were driven by habit, and under-performing SKUs went undetected until quarterly reviews, too late to act.

No visit prioritisation
Reps relied on intuition. No data-driven ranking of which outlets needed attention most.
Hidden SKU distribution gaps
Under-indexed products were invisible in the data. No automated detection across the portfolio.
Quarterly blind spots
By the time reports surfaced insights, the selling window had already passed.
Siloed data sources
CRM, ERP and transaction data lived in separate systems, with no unified field view.
The Solution

A fully automated weekly intelligence pipeline.

DStar runs autonomously every Monday morning. It ingests CRM, ERP and POS transaction data, computes indexation scores for every SKU at every outlet, and delivers a ranked list of 30 priority visits per rep before the working week begins.

01 Automatic weekly trigger
Azure Timer fires every Monday 6:00 AM, with zero human intervention required.
02 Parallel data ingestion
Durable Orchestrator pulls CRM, transactions and SKU inventory in a parallel fan-out.
03 Indexation score computation
Python engine benchmarks each SKU at each POS against regional and national averages.
04 Composite ranking model
Combines indexation delta, visit recency, revenue potential and geographic clustering.
05 Dashboard delivery
Results are pushed to Streamlit, where reps log in and see their personalised visit list.
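The Monday 6:00 AM trigger in step 01 maps naturally onto the six-field NCRONTAB format used by Azure Functions timer triggers. The sketch below is illustrative only: the schedule string and the helper `next_run` are assumptions, not taken from the DStar codebase, and time zone handling is left out.

```python
from datetime import datetime, timedelta

# Assumed NCRONTAB schedule: {second} {minute} {hour} {day} {month} {day-of-week}.
# "0 0 6 * * 1" reads as 06:00:00 every Monday. Not confirmed by the source.
WEEKLY_SCHEDULE = "0 0 6 * * 1"


def next_run(now: datetime) -> datetime:
    """Return the next Monday 06:00 strictly after `now` (hypothetical helper)."""
    candidate = now.replace(hour=6, minute=0, second=0, microsecond=0)
    # datetime.weekday() returns 0 for Monday.
    days_ahead = (0 - now.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # Already at or past this week's slot: move to the following Monday.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(WEEKLY_SCHEDULE, "->", next_run(datetime.now()))
```

A helper like this can be handy in tests or dashboards to display when the next list will be refreshed.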
Weekly cadence, vs quarterly reviews
30 prioritised visits per rep, per week
100% data-driven, no intuition bias
<2 min runtime, trigger to delivery
Architecture

Pipeline diagram

All services run in Docker containers on Azure. The Durable Orchestrator manages parallelism, retries and state, giving reliable execution every week without supervision.
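The fan-out / fan-in pattern with retries can be illustrated without any Azure dependency. The sketch below uses Python's standard `concurrent.futures` as a stand-in for the Durable Orchestrator; it is not the Durable Functions API. The loader functions, their return shapes, and the retry count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def with_retry(task: Callable[[], List[dict]], attempts: int = 3) -> List[dict]:
    """Call `task`, retrying on any exception up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error


def fan_out_fan_in(sources: Dict[str, Callable[[], List[dict]]]) -> Dict[str, List[dict]]:
    """Run every loader in parallel (fan-out), then gather the results (fan-in)."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(with_retry, fn) for name, fn in sources.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical loaders standing in for the CRM, transaction and inventory pulls.
def load_crm() -> List[dict]:
    return [{"pos_id": "P1", "rep_id": "R1"}]


def load_transactions() -> List[dict]:
    return [{"pos_id": "P1", "sku": "S1", "volume": 12}]


def load_inventory() -> List[dict]:
    return [{"sku": "S1", "category": "whisky"}]


data = fan_out_fan_in(
    {"crm": load_crm, "transactions": load_transactions, "inventory": load_inventory}
)
```

In a managed orchestrator, the state of each activity is also persisted between steps, which this in-memory sketch does not attempt to reproduce.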

[Interactive pipeline diagram. Layers: Trigger & Orchestration · Data Sources & Output · ML Processing Layer · Ranking Engine]

The Algorithm

Under-indexed vs over-indexed SKUs

A product's sales at an outlet only make sense relative to its potential. DStar computes an indexation score, then layers in clustering and temporal signals to build a composite visit priority.

index_score = SKU_sales(POS) ÷ benchmark(region)

· Observed: SKU_sales(POS), actual volume at outlet
· Expected: benchmark(region), regional / national average
· Score: index_score, priority signal
Score < 1.0

Under-indexed

The product sells below its regional benchmark at this outlet. There is untapped potential: a sales rep visit is likely to unlock incremental volume.

· Distribution gap detected
· Competitor shelf share likely higher
· High visit ROI expected

Score > 1.0

Over-indexed

The product already outperforms its benchmark. This outlet is saturated, so the rep's time is more valuable at an under-indexed account.

· SKU already at full penetration
· Visit deprioritised this cycle
· Filtered out of Top 30 list
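The indexation ratio and the under-indexed / over-indexed split can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the benchmark is taken to be the mean volume of the same SKU across outlets in the same region, and the field names (`pos`, `region`, `sku`, `volume`) are hypothetical. A score of exactly 1.0 is treated as not under-indexed.

```python
from collections import defaultdict
from statistics import mean


def index_scores(sales):
    """Attach index_score = observed volume / regional benchmark to each row.

    `sales` is a list of dicts with keys pos, region, sku, volume.
    Assumption: benchmark = mean volume of the SKU across the region's outlets.
    """
    volumes = defaultdict(list)
    for row in sales:
        volumes[(row["region"], row["sku"])].append(row["volume"])
    benchmark = {key: mean(values) for key, values in volumes.items()}

    scored = []
    for row in sales:
        expected = benchmark[(row["region"], row["sku"])]
        # Guard against a zero benchmark (SKU not sold anywhere in the region).
        score = row["volume"] / expected if expected > 0 else None
        scored.append(
            {
                **row,
                "index_score": score,
                "under_indexed": score is not None and score < 1.0,
            }
        )
    return scored


sales = [
    {"pos": "P1", "region": "North", "sku": "S1", "volume": 50},
    {"pos": "P2", "region": "North", "sku": "S1", "volume": 100},
    {"pos": "P3", "region": "North", "sku": "S1", "volume": 150},
]
scored = index_scores(sales)
under_indexed = [row["pos"] for row in scored if row["under_indexed"]]
```

Here the regional benchmark is 100, so P1 scores 0.5 and is the only outlet flagged as under-indexed.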
Clustering techniques used
📍 Geographic clustering (K-Means)

Points of sale are clustered by GPS coordinates using K-Means. This ensures the Top 30 recommendations form geographically efficient routes, minimising travel time between visits.

→ Scikit-learn KMeans
→ Haversine distance metric
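The haversine metric named above gives the great-circle distance between two GPS points. The source does not describe how it is combined with scikit-learn's KMeans (which minimises Euclidean distance on its input features), so the sketch below shows only the distance itself and a nearest-centre assignment, the step a K-Means style procedure repeats. The helper names and the sample coordinates are illustrative assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_centre(point, centres):
    """Index of the cluster centre closest to `point` (hypothetical helper)."""
    distances = [haversine_km(point[0], point[1], c[0], c[1]) for c in centres]
    return distances.index(min(distances))


# Illustrative centres (roughly Paris and Lyon) and one outlet near Versailles.
centres = [(48.8566, 2.3522), (45.7640, 4.8357)]
outlet = (48.8049, 2.1204)
cluster_id = nearest_centre(outlet, centres)
```

Grouping the recommended outlets by `cluster_id` is one simple way to keep a rep's visits geographically close together.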
🧩 SKU similarity clustering

SKUs are grouped by category, price tier and sales seasonality profile. Indexation benchmarks are computed within clusters, avoiding unfair comparisons between products with fundamentally different demand patterns.

→ Hierarchical clustering
→ Cosine similarity on SKU vectors
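Cosine similarity compares the shape of two SKU vectors regardless of their scale, which is why a large and a small SKU with the same seasonality can land in the same cluster. The sketch below covers only the similarity measure, not the hierarchical clustering step. The vector layout (share of annual sales per quarter) and the sample values are assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical seasonality vectors: sales per quarter (Q1..Q4).
whisky_small = [0.2, 0.2, 0.2, 0.4]   # winter-heavy profile
whisky_large = [0.4, 0.4, 0.4, 0.8]   # same profile, twice the volume
rose_wine = [0.1, 0.4, 0.4, 0.1]      # summer-heavy profile

same_profile = cosine_similarity(whisky_small, whisky_large)
different_profile = cosine_similarity(whisky_small, rose_wine)
```

The two whisky vectors point in the same direction, so their similarity is at the maximum, while the summer-heavy profile scores lower against both.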
๐Ÿช
POS segmentation

Outlets are segmented by channel (on-trade vs off-trade), volume tier and historical purchase behaviour. Each segment uses its own benchmark distribution, improving the accuracy of the indexation score.

โ†’DBSCAN for outlier detection
โ†’RFM-based segmentation
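RFM summarises each outlet's purchase history as three numbers: days since the last order (recency), number of orders (frequency) and total spend (monetary). A minimal sketch follows, assuming orders arrive as `(pos_id, order_date, amount)` tuples; that layout and the function name are hypothetical, and how DStar bins these values into segments is not described in the source.

```python
from datetime import date


def rfm(orders, today):
    """Return {pos_id: (recency_days, frequency, monetary)} from order tuples."""
    summary = {}
    for pos_id, order_date, amount in orders:
        last, frequency, total = summary.get(pos_id, (None, 0, 0.0))
        if last is None or order_date > last:
            last = order_date  # keep the most recent order date
        summary[pos_id] = (last, frequency + 1, total + amount)
    return {
        pos_id: ((today - last).days, frequency, total)
        for pos_id, (last, frequency, total) in summary.items()
    }


orders = [
    ("P1", date(2024, 1, 1), 100.0),
    ("P1", date(2024, 2, 1), 50.0),
    ("P2", date(2023, 12, 1), 500.0),
]
profile = rfm(orders, today=date(2024, 3, 1))
```

The resulting tuples can then be ranked or bucketed per channel so that each segment gets its own benchmark distribution.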
Temporal decay weighting

Raw indexation scores are weighted by time elapsed since last visit. An account that was visited 2 weeks ago contributes less urgency than one not visited in 3 months, even if the indexation gap is identical.

This prevents over-clustering of visits at already-serviced accounts and ensures the full territory gets systematic coverage over time.

0.3×: < 2 weeks since last visit (recently visited, low urgency)
0.7×: 2–6 weeks since last visit (normal cadence, moderate weight)
1.5×: > 6 weeks since last visit (overdue, boosted priority score)
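The three bands translate directly into a lookup function. In the sketch below, the treatment of the exact boundaries (2 and 6 weeks fall into the middle band) is an assumption, as is the way the multiplier is applied to the indexation gap in `weighted_gap`.

```python
def recency_multiplier(weeks_since_visit: float) -> float:
    """Map time since the last visit to the decay weight from the bands above."""
    if weeks_since_visit < 2:
        return 0.3   # recently visited, low urgency
    if weeks_since_visit <= 6:
        return 0.7   # normal cadence (boundaries assumed inclusive)
    return 1.5       # overdue, boosted priority


def weighted_gap(index_score: float, weeks_since_visit: float) -> float:
    """Assumed combination: under-index gap (1 - score) times the decay weight."""
    gap = max(0.0, 1.0 - index_score)
    return gap * recency_multiplier(weeks_since_visit)


# Same indexation gap, different visit history.
recent = weighted_gap(index_score=0.6, weeks_since_visit=1)
overdue = weighted_gap(index_score=0.6, weeks_since_visit=13)
```

With an identical gap, the account last visited 13 weeks ago ends up with five times the urgency of the one visited last week.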
Final visit score: composite of 4 signals
40% Indexation Delta: magnitude of the under-index gap
25% Visit Recency: temporal decay since last contact
25% Revenue Potential: account size and category weight
10% Geographic Cluster: route efficiency optimisation
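The composite can be read as a weighted sum followed by a top-N cut. The sketch below assumes each signal has already been normalised to the range 0 to 1 (the source does not say how), and the signal keys and candidate layout are hypothetical.

```python
# Weights from the breakdown above; they sum to 1.0.
WEIGHTS = {
    "index_delta": 0.40,
    "recency": 0.25,
    "revenue": 0.25,
    "geo": 0.10,
}


def visit_score(signals):
    """Weighted sum of the four signals, each assumed to lie in [0, 1]."""
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


def top_visits(candidates, n=30):
    """Return the `n` highest-scoring candidates, best first."""
    return sorted(candidates, key=lambda c: visit_score(c["signals"]), reverse=True)[:n]


candidates = [
    {"pos": "P1", "signals": {"index_delta": 0.9, "recency": 0.2, "revenue": 0.5, "geo": 0.5}},
    {"pos": "P2", "signals": {"index_delta": 0.4, "recency": 1.0, "revenue": 0.8, "geo": 0.5}},
    {"pos": "P3", "signals": {"index_delta": 0.1, "recency": 0.1, "revenue": 0.2, "geo": 1.0}},
]
ranking = [c["pos"] for c in top_visits(candidates, n=2)]
```

In this toy example P2 edges out P1 because its recency and revenue signals outweigh P1's larger indexation gap, and P3 is cut by the `n=2` limit.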
Tech Stack

Full Azure cloud stack

Every layer runs on Microsoft Azure. All components are Dockerised for portability and consistency across dev, staging and production environments.

Orchestration
⚡ Azure Durable Functions: stateful fan-out / fan-in with automatic retry
⏰ Azure Timer Trigger: CRON-based weekly schedule, zero infra to manage
Data & Storage
🗄️ Azure SQL Database: scoring outputs and rep-facing recommendation tables
☁️ Azure Blob Storage: intermediate data and pipeline artefacts
ML & Processing
🐍 Python 3.11: core pipeline language for data wrangling and scoring
🔢 Pandas + Scikit-learn: indexation scoring, K-Means clustering, ranking model
Delivery
📊 Streamlit: interactive dashboard with per-rep lists and filters
🐳 Docker: all services containerised for consistent deploys
BotiqueAI
Custom AI and MLOps for enterprise clients
โ† Back to portfolio