Pernod Ricard
BotiqueAI
🥃 Pernod Ricard · ☁️ Azure · MLOps · Data Science
Case Study

DStar

Sales Visit Optimization Engine — 30 AI-ranked recommendations, every Monday morning.

A fully automated MLOps pipeline that analyses sales transactions, detects under-indexed SKUs across all points of sale, and delivers a prioritised visit list to every sales representative before the week begins.

30: visits recommended per rep, per week
Automated trigger: every Monday, 6:00 AM
<2 min: pipeline runtime, from trigger to dashboard
100%: dockerised on Azure Container Apps
Client: Pernod Ricard, global spirits leader
Project: DStar, sales intelligence tool
Cloud: Azure, fully managed stack
Language: Python 3.11 (Pandas · Scikit-learn)
End users: sales force, weekly active
Cadence: weekly automated pipeline
The Challenge

Sales reps had data.
Not the right signal.

Pernod Ricard's sales force manages hundreds of points of sale. Each representative had access to raw sales figures — but no structured way to prioritise accounts or identify which SKUs to push.

High-value opportunities were missed, visits were driven by habit, and under-performing SKUs went undetected until quarterly reviews — too late to act.

No visit prioritisation
Reps relied on intuition. No data-driven ranking of which outlets needed attention most.
Hidden SKU distribution gaps
Under-indexed products were invisible in the data. No automated detection across the portfolio.
Quarterly blind spots
By the time reports surfaced insights, the selling window had already passed.
Siloed data sources
CRM, ERP and transaction data lived in separate systems — no unified field view.
The Solution

A fully automated
weekly intelligence pipeline.

DStar runs autonomously every Monday morning. It ingests CRM, ERP and POS transaction data, computes indexation scores for every SKU at every outlet, and delivers a ranked list of 30 priority visits per rep before the working week begins.

01 Automatic weekly trigger: the Azure Timer Trigger fires every Monday at 6:00 AM, with zero human intervention required.
02 Parallel data ingestion: the Durable Orchestrator pulls CRM, transaction and SKU inventory data in a parallel fan-out.
03 Indexation score computation: a Python engine benchmarks each SKU at each POS against regional and national averages.
04 Composite ranking model: combines indexation delta, visit recency, revenue potential and geographic clustering.
05 Dashboard delivery: results are pushed to Streamlit, where reps log in and see their personalised visit list.
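Step 01 runs on an Azure Functions timer trigger, whose schedule is a six-field NCRONTAB expression ({second} {minute} {hour} {day} {month} {day-of-week}) evaluated in UTC unless a time zone is configured. The source does not show its schedule string, so "0 0 6 * * 1" (06:00:00 every Monday) is an assumption. The helper below is hypothetical and not part of any Azure SDK: it computes the next Monday 06:00 run so the weekly cadence can be unit-tested without deploying anything.

```python
from datetime import datetime, timedelta

# Assumed NCRONTAB for the Azure Functions timer trigger:
# {second} {minute} {hour} {day} {month} {day-of-week} -> 06:00:00 every Monday
WEEKLY_SCHEDULE = "0 0 6 * * 1"


def next_monday_6am(now: datetime) -> datetime:
    """Return the first Monday 06:00 strictly after `now` (same clock/time zone as `now`)."""
    days_ahead = (7 - now.weekday()) % 7  # datetime.weekday(): Monday == 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=6, minute=0, second=0, microsecond=0
    )
    if candidate <= now:  # already at or past this Monday's slot
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    # 2024-05-15 is a Wednesday, so the next run is Monday 2024-05-20 06:00
    print(next_monday_6am(datetime(2024, 5, 15, 9, 30)))
```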
Weekly: cadence, vs quarterly reviews
30: prioritised visits per rep, per week
100%: data-driven, no intuition bias
<2 min: runtime, from trigger to delivery
Architecture

Pipeline
diagram

All services run in Docker containers on Azure. The Durable Orchestrator manages parallelism, retries and state, so the pipeline executes reliably every week without supervision.
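In Durable Functions for Python, a fan-out / fan-in is written inside an orchestrator by yielding `context.task_all(...)` over a list of `context.call_activity(...)` tasks, and retries can be attached with `context.call_activity_with_retry(...)` plus `RetryOptions`. The sketch below is not Durable Functions code: it reproduces the same control flow locally with `concurrent.futures`, using hypothetical source names and a caller-supplied `fetch_source` activity, so the pattern can be run and tested without Azure.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical source names, mirroring step 02 (CRM, transactions, SKU inventory)
SOURCES = ["crm", "transactions", "sku_inventory"]

Activity = Callable[[str], List[dict]]


def with_retry(activity: Activity, attempts: int = 3, delay_s: float = 0.05) -> Activity:
    """Wrap an activity so it is retried up to `attempts` times before failing."""
    def wrapped(source: str) -> List[dict]:
        for attempt in range(1, attempts + 1):
            try:
                return activity(source)
            except Exception:
                if attempt == attempts:
                    raise  # out of retries: surface the error to the orchestrator
                time.sleep(delay_s)
        raise ValueError("attempts must be >= 1")
    return wrapped


def orchestrate(fetch_source: Activity) -> Dict[str, List[dict]]:
    """Fan out one fetch per source in parallel, then fan in once all complete."""
    activity = with_retry(fetch_source)
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {src: pool.submit(activity, src) for src in SOURCES}  # fan-out
        return {src: fut.result() for src, fut in futures.items()}     # fan-in
```

A transient failure in one source is retried without blocking the others; a source that keeps failing re-raises its exception from `orchestrate`, much as a failed activity surfaces in a real orchestrator.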

Diagram layers: Trigger & Orchestration · Data Sources & Output · ML Processing Layer · Ranking Engine

The Algorithm

Under-indexed vs over-indexed SKUs

A product's sales at an outlet only make sense relative to its potential. DStar computes an indexation score, then layers in clustering and temporal signals to build a composite visit priority.

index_score = SKU_sales(POS) ÷ benchmark(region)
Observed: SKU_sales(POS), the actual volume at the outlet
Expected: benchmark(region), the regional / national average
Score: index_score, the priority signal
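The indexation formula can be sketched in plain Python. Assumptions not stated in the source: each input row is (pos_id, region, sku, volume) for one period, benchmark(region) is the mean volume of that SKU across outlets in the same region, and the national mean is used as a fallback when fewer than `min_peers` outlets in the region report the SKU. The production engine uses Pandas; this stdlib version only illustrates the arithmetic and the score < 1.0 / > 1.0 split.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[str, str, str, float]  # (pos_id, region, sku, volume): assumed layout


def index_scores(rows: Iterable[Row], min_peers: int = 3) -> Dict[Tuple[str, str], Optional[float]]:
    """index_score = SKU_sales(POS) / benchmark(region), keyed by (pos_id, sku)."""
    rows = list(rows)
    regional = defaultdict(list)  # (region, sku) -> volumes
    national = defaultdict(list)  # sku -> volumes
    for _pos, region, sku, volume in rows:
        regional[(region, sku)].append(volume)
        national[sku].append(volume)

    scores = {}
    for pos, region, sku, volume in rows:
        peers = regional[(region, sku)]
        # Assumption: thin regions fall back to the national average
        expected = mean(peers if len(peers) >= min_peers else national[sku])
        scores[(pos, sku)] = volume / expected if expected > 0 else None
    return scores


def classify(score: Optional[float]) -> str:
    if score is None:
        return "no benchmark"
    if score < 1.0:
        return "under-indexed"  # visit candidate
    if score > 1.0:
        return "over-indexed"   # deprioritised this cycle
    return "on benchmark"
```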
Score < 1.0

Under-indexed

The product sells below its regional benchmark at this outlet. There is untapped potential — a sales rep visit is likely to unlock incremental volume.

· Distribution gap detected
· Competitor shelf share likely higher
· High visit ROI expected
Score > 1.0

Over-indexed

The product already outperforms its benchmark at this outlet. With little headroom left for this SKU, the rep's time is more valuable at an under-indexed account.

· SKU already at full penetration
· Visit deprioritised this cycle
· Filtered out of Top 30 list
Clustering techniques used
📍
Geographic clustering (K-Means)

Points of sale are clustered by GPS coordinates using K-Means. Grouping nearby outlets helps the Top 30 recommendations form geographically efficient routes, minimising travel time between visits.

Scikit-learn KMeans
Haversine distance metric
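scikit-learn's `KMeans` only minimises Euclidean distance and takes no metric parameter, so it cannot use a haversine metric directly. One common workaround, assumed here rather than confirmed by the source, is to project latitude/longitude onto 3D unit vectors: the straight-line (chord) distance between two such points grows monotonically with their great-circle distance, so Euclidean K-Means still groups outlets by geographic proximity. `haversine_km` is a separate helper for travel distances; coordinates and `n_clusters` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

EARTH_RADIUS_KM = 6371.0


def to_unit_vectors(lat_lon_deg: np.ndarray) -> np.ndarray:
    """Map (lat, lon) in degrees to points (x, y, z) on the unit sphere."""
    lat = np.radians(lat_lon_deg[:, 0])
    lon = np.radians(lat_lon_deg[:, 1])
    return np.column_stack([np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])


def haversine_km(a, b) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([a[0], a[1], b[0], b[1]])
    h = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return float(2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h)))


def cluster_outlets(lat_lon_deg: np.ndarray, n_clusters: int, seed: int = 0) -> np.ndarray:
    """Assign each outlet a cluster id.

    Assumption: unit-vector projection stands in for a haversine metric,
    which KMeans does not accept.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(to_unit_vectors(lat_lon_deg))
```

Within each cluster, `haversine_km` can then estimate the travel distance between consecutive visits.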
🧩
SKU similarity clustering

SKUs are grouped by category, price tier and sales seasonality profile. Indexation benchmarks are computed within clusters, avoiding unfair comparisons between products with fundamentally different demand patterns.

Hierarchical clustering
Cosine similarity on SKU vectors
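A minimal sketch of cosine-based hierarchical clustering, using SciPy's `linkage` with average linkage (Ward linkage is only well defined for Euclidean distances, so it is not used here). The feature layout (category one-hot, price tier scaled to [0, 1], quarterly share of annual sales) and the category list are hypothetical; the source only says SKUs are grouped by category, price tier and seasonality profile.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

CATEGORIES = ["whisky", "gin", "vodka"]  # hypothetical category list


def sku_vector(category: str, price_tier: int, seasonality) -> np.ndarray:
    """Assumed layout: category one-hot + price tier / 3 (tiers 1-3) + quarterly sales shares."""
    one_hot = [1.0 if c == category else 0.0 for c in CATEGORIES]
    return np.array(one_hot + [price_tier / 3.0] + list(seasonality))


def cluster_skus(vectors: np.ndarray, n_groups: int) -> np.ndarray:
    """Average-linkage hierarchical clustering on cosine distance (1 - cosine similarity)."""
    tree = linkage(vectors, method="average", metric="cosine")
    return fcluster(tree, t=n_groups, criterion="maxclust")  # labels start at 1
```

Benchmarks would then be computed within each returned group, so a summer-peaking gin is only compared with SKUs that share its demand pattern.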
🏪
POS segmentation

Outlets are segmented by channel (on-trade vs off-trade), volume tier and historical purchase behaviour. Each segment uses its own benchmark distribution, improving the accuracy of the indexation score.

DBSCAN for outlier detection
RFM-based segmentation
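DBSCAN labels points that fall in no dense region as noise (label -1), which is how it can flag atypical outlets before a segment's benchmark distribution is computed. The sketch assumes two illustrative per-outlet features (annual volume and orders per month) standardised with `StandardScaler`; the `eps` and `min_samples` values are placeholders that would need tuning per segment.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def flag_outlier_outlets(features: np.ndarray, eps: float = 0.8, min_samples: int = 3) -> np.ndarray:
    """Return True for outlets DBSCAN labels as noise (-1), i.e. outside any dense group.

    `features` is an (n_outlets, n_features) array; the feature choice and the
    eps / min_samples defaults are illustrative assumptions.
    """
    scaled = StandardScaler().fit_transform(features)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
    return labels == -1
```

The segment benchmark can then be computed on `features[~mask]`, so a single flagship account does not inflate the expected volume for every other outlet.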
Temporal decay weighting

Indexation-based urgency is weighted by the time elapsed since the last visit. An account visited 2 weeks ago contributes less urgency than one not visited in 3 months, even if the indexation gap is identical.

This prevents over-clustering of visits at already-serviced accounts and ensures the full territory gets systematic coverage over time.

0.3× weight: < 2 weeks since last visit (recently visited, low urgency)
0.7× weight: 2–6 weeks since last visit (normal cadence, moderate weight)
1.5× weight: > 6 weeks since last visit (overdue, boosted priority score)
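The recency weights map directly to a small lookup. The sketch treats exactly 2 and exactly 6 weeks as normal cadence (the source leaves the boundaries open) and, as a hypothetical choice, defines urgency as the under-index gap max(0, 1 − index_score) scaled by the recency weight.

```python
def recency_multiplier(weeks_since_visit: float) -> float:
    """Recency weight; 2 and 6 weeks are assumed to fall in the 'normal cadence' band."""
    if weeks_since_visit < 2:
        return 0.3  # recently visited: low urgency
    if weeks_since_visit <= 6:
        return 0.7  # normal cadence: moderate weight
    return 1.5      # overdue: boosted priority


def weighted_urgency(index_score: float, weeks_since_visit: float) -> float:
    """Hypothetical urgency: under-index gap (0 at or above benchmark) times recency weight."""
    gap = max(0.0, 1.0 - index_score)
    return gap * recency_multiplier(weeks_since_visit)
```

With the same gap of 0.4, an outlet last seen 13 weeks ago scores 0.6 while one seen 2 weeks ago scores about 0.28, matching the 2-weeks-versus-3-months example.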
Final visit score: a composite of 4 signals
40% Indexation Delta: magnitude of the under-index gap
25% Visit Recency: temporal decay since last contact
25% Revenue Potential: account size and category weight
10% Geographic Cluster: route efficiency optimisation
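The weighted sum and the Top 30 cut can be sketched as follows. The source gives only the weights; the assumptions here are that each signal has already been normalised to [0, 1] and that a rep's candidate outlets arrive as a list of hypothetical `Candidate` records.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List

# Composite weights; each signal is assumed pre-normalised to [0, 1]
WEIGHTS = {"index_delta": 0.40, "recency": 0.25, "revenue": 0.25, "geo": 0.10}


@dataclass
class Candidate:  # hypothetical record for one outlet in a rep's territory
    pos_id: str
    index_delta: float  # magnitude of the under-index gap
    recency: float      # temporal decay signal since last contact
    revenue: float      # account size and category weight
    geo: float          # route-efficiency / cluster signal


def visit_score(c: Candidate) -> float:
    """Weighted sum of the four signals."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def top_visits(candidates: Iterable[Candidate], n: int = 30) -> List[Candidate]:
    """Return the n highest-scoring outlets, best first."""
    return heapq.nlargest(n, candidates, key=visit_score)
```

`heapq.nlargest` returns the same result as sorting by score in descending order and slicing the first `n`, without sorting the whole territory.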
Tech Stack

Full Azure
cloud stack

Every layer runs on Microsoft Azure. All components are Dockerised for portability and consistency across dev, staging and production environments.

Orchestration
Azure Durable Functions: stateful fan-out / fan-in with automatic retry
Azure Timer Trigger: CRON-based weekly schedule, zero infrastructure to manage
Data & Storage
🗄️ Azure SQL Database: scoring outputs and rep-facing recommendation tables
☁️ Azure Blob Storage: intermediate data and pipeline artefacts
ML & Processing
🐍 Python 3.11: core pipeline language for data wrangling and scoring
🔢 Pandas + Scikit-learn: indexation scoring, K-Means clustering, ranking model
Delivery
📊 Streamlit: interactive dashboard with per-rep lists and filters
🐳 Docker: all services containerised for consistent deploys
References

Further reading

Azure Durable Functions
Stateful serverless orchestration — Microsoft Docs
Streamlit
Turn Python scripts into shareable web apps
Scikit-learn — KMeans
K-Means clustering for geographic and SKU grouping
Scikit-learn — DBSCAN
Density-based clustering for POS outlier detection
Azure Container Apps
Run containerised apps on a serverless platform
Nielsen Distribution Analytics
How distribution and availability drive FMCG sales performance
Circana — SKU Optimisation
Industry methodology for portfolio performance and SKU indexing
RFM Customer Segmentation
Recency, Frequency, Monetary — classic B2B scoring model
BotiqueAI
Custom AI and MLOps for enterprise clients