🥃 Pernod Ricard · ☁️ Azure · MLOps · Data Science

Case Study

DStar

Sales Visit Optimization Engine: 30 AI-ranked recommendations, every Monday morning.

A fully automated MLOps pipeline that analyses sales transactions, detects under-indexed SKUs across all points of sale, and delivers a prioritised visit list to every sales representative before the week begins.

30

visits recommended

per rep, per week

1×

automated trigger

every Monday 6:00 AM

<2 min

pipeline runtime

trigger to dashboard

100%

dockerised

on Azure Container Apps

Client

Pernod Ricard

Global spirits leader

Project

DStar

Sales intelligence tool

Cloud

Azure

Fully managed stack

Language

Python 3.11

Pandas · Scikit-learn

End users

Sales force

Weekly active

Cadence

Weekly

Automated pipeline

The Challenge

Sales reps had data. Not the right signal.

Pernod Ricard's sales force manages hundreds of points of sale. Each representative had access to raw sales figures, but no structured way to prioritise accounts or identify which SKUs to push.

High-value opportunities were missed, visits were driven by habit, and under-performing SKUs went undetected until quarterly reviews, too late to act.

No visit prioritisation

Reps relied on intuition, with no data-driven ranking of which outlets most needed attention.

Hidden SKU distribution gaps

Under-indexed products were invisible in the data, with no automated detection across the portfolio.

Quarterly blind spots

By the time reports surfaced insights, the selling window had already passed.

Siloed data sources

CRM, ERP and transaction data lived in separate systems, with no unified field view.

The Solution

A fully automated weekly intelligence pipeline.

DStar runs autonomously every Monday morning. It ingests CRM, ERP and POS transaction data, computes indexation scores for every SKU at every outlet, and delivers a ranked list of 30 priority visits per rep before the working week begins.

Automatic weekly trigger

An Azure Timer Trigger fires every Monday at 6:00 AM, with zero human intervention required.

Parallel data ingestion

The Durable Orchestrator pulls CRM, transaction and SKU inventory data in a parallel fan-out.

Indexation score computation

Python engine benchmarks each SKU at each POS against regional and national averages.

Composite ranking model

Combines indexation delta, visit recency, revenue potential and geographic clustering.

Dashboard delivery

Results are pushed to a Streamlit dashboard; reps log in and see their personalised visit list.

Weekly

Cadence

vs quarterly reviews

30

Prioritised visits

per rep, per week

100%

Data-driven

no intuition bias

<2 min

Runtime

trigger to delivery

Architecture

Pipeline diagram

All services run in Docker containers on Azure. The Durable Orchestrator manages parallelism, retries and state, ensuring reliable execution every week without supervision.
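The fan-out / fan-in pattern the orchestrator applies can be illustrated without the Azure SDK. The sketch below uses `asyncio.gather` from the standard library as a stand-in for Durable Functions; the loader names (`fetch_crm`, `fetch_transactions`, `fetch_sku_inventory`) and their payloads are hypothetical, and the retries and state persistence that the orchestrator provides are not reproduced here.

```python
import asyncio


# Hypothetical loaders standing in for the real CRM, transaction and SKU sources.
async def fetch_crm() -> dict:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {"source": "crm", "rows": 3}


async def fetch_transactions() -> dict:
    await asyncio.sleep(0.01)
    return {"source": "transactions", "rows": 5}


async def fetch_sku_inventory() -> dict:
    await asyncio.sleep(0.01)
    return {"source": "sku_inventory", "rows": 2}


async def ingest_all() -> dict:
    # Fan-out: start the three loads concurrently.
    # Fan-in: gather waits for all of them and returns results in call order.
    results = await asyncio.gather(
        fetch_crm(), fetch_transactions(), fetch_sku_inventory()
    )
    return {r["source"]: r["rows"] for r in results}


if __name__ == "__main__":
    print(asyncio.run(ingest_all()))
```

Because the three loads overlap, the ingestion step takes roughly as long as the slowest source rather than the sum of all three.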

Diagram layers: Trigger & Orchestration · Data Sources & Output · ML Processing Layer · Ranking Engine


The Algorithm

Under-indexed vs over-indexed SKUs

A product's sales at an outlet only make sense relative to its potential. DStar computes an indexation score, then layers in clustering and temporal signals to build a composite visit priority.

index_score = SKU_sales(POS) / benchmark(region)

Observed: SKU_sales(POS), the actual volume at the outlet

Expected: benchmark(region), the regional / national average

Score: index_score, the priority signal

Score < 1.0

Under-indexed

The product sells below its regional benchmark at this outlet. There is untapped potential, and a sales rep visit is likely to unlock incremental volume.

· Distribution gap detected

· Competitor shelf share likely higher

· High visit ROI expected

Score > 1.0

Over-indexed

The product already outperforms its benchmark. This outlet is saturated, so the rep's time is more valuable at an under-indexed account.

· SKU already at full penetration

· Visit deprioritised this cycle

· Filtered out of Top 30 list
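The under-indexed / over-indexed split can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the production engine: the `SkuObservation` record, its field names and the "on-benchmark" label for a score of exactly 1.0 are hypothetical, and the benchmark is taken as already computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuObservation:
    pos_id: str
    sku: str
    observed_sales: float   # actual volume at the outlet
    benchmark_sales: float  # expected volume from the regional benchmark


def index_score(obs: SkuObservation) -> float:
    """Ratio of observed to expected volume; 1.0 means exactly on benchmark."""
    if obs.benchmark_sales <= 0:
        raise ValueError("benchmark must be positive")
    return obs.observed_sales / obs.benchmark_sales


def classify(score: float) -> str:
    if score < 1.0:
        return "under-indexed"
    if score > 1.0:
        return "over-indexed"
    return "on-benchmark"  # assumed label; the source does not cover this case


def under_indexed(observations):
    """Keep only under-indexed rows, largest gap (lowest score) first."""
    scored = [(index_score(o), o) for o in observations]
    gaps = [(s, o) for s, o in scored if s < 1.0]
    return sorted(gaps, key=lambda pair: pair[0])
```

An outlet selling 40 units against a benchmark of 100 scores 0.4 and stays in the candidate pool; one selling 150 scores 1.5 and is filtered out.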

Clustering techniques used

📍

Geographic clustering (K-Means)

Points of sale are clustered by GPS coordinates using K-Means. This ensures the Top 30 recommendations form geographically efficient routes, minimising travel time between visits.

→ Scikit-learn KMeans

→ Haversine distance metric

🧩

SKU similarity clustering

SKUs are grouped by category, price tier and sales seasonality profile. Indexation benchmarks are computed within clusters, avoiding unfair comparisons between products with fundamentally different demand patterns.

→ Hierarchical clustering

→ Cosine similarity on SKU vectors

🏪

POS segmentation

Outlets are segmented by channel (on-trade vs off-trade), volume tier and historical purchase behaviour. Each segment uses its own benchmark distribution, improving the accuracy of the indexation score.

→ DBSCAN for outlier detection

→ RFM-based segmentation
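The Haversine metric listed under geographic clustering gives the great-circle distance between two GPS points. The sketch below shows only the distance itself and a hypothetical `route_length_km` helper for comparing candidate visit sequences. How the metric is combined with K-Means is not specified here; scikit-learn's `KMeans` minimises Euclidean distance, so that step is left out rather than guessed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def route_length_km(stops) -> float:
    """Total distance of visiting (lat, lon) stops in the given order."""
    return sum(haversine_km(*a, *b) for a, b in zip(stops, stops[1:]))
```

For example, `haversine_km(48.8566, 2.3522, 45.7640, 4.8357)` (Paris to Lyon) returns a little over 390 km.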

Temporal decay weighting

Raw indexation scores are weighted by the time elapsed since the last visit. An account visited 2 weeks ago carries less urgency than one not visited in 3 months, even if the indexation gap is identical.

This prevents over-clustering of visits at already-serviced accounts and ensures the full territory gets systematic coverage over time.

0.3×

< 2 weeks since last visit

Recently visited, low urgency

0.7×

2–6 weeks since last visit

Normal cadence, moderate weight

1.5×

> 6 weeks since last visit

Overdue, boosted priority score
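The three tiers above map directly onto a step function. The sketch below assumes elapsed time is measured in whole days and that the boundaries fall at 14 and 42 days, with both ends of the 2–6 week band inclusive; the function names are hypothetical.

```python
def recency_weight(days_since_last_visit: int) -> float:
    """Multiplier applied to the raw indexation gap, by time since last visit."""
    if days_since_last_visit < 0:
        raise ValueError("days_since_last_visit cannot be negative")
    if days_since_last_visit < 14:    # under 2 weeks: recently visited
        return 0.3
    if days_since_last_visit <= 42:   # 2 to 6 weeks: normal cadence
        return 0.7
    return 1.5                        # over 6 weeks: overdue


def weighted_gap(indexation_gap: float, days_since_last_visit: int) -> float:
    """Indexation gap scaled by visit recency."""
    return indexation_gap * recency_weight(days_since_last_visit)
```

With an identical gap, an account last visited 14 days ago is weighted 0.7× while one last visited 90 days ago is weighted 1.5×, so the overdue account ranks higher.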

Final visit score: composite of 4 signals

40%

Indexation Delta

Magnitude of the under-index gap

25%

Visit Recency

Temporal decay since last contact

25%

Revenue Potential

Account size and category weight

10%

Geographic cluster

Route efficiency optimisation
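The weighted combination can be sketched as follows. It assumes each of the four signals has already been normalised to the range 0 to 1, which the weights alone do not guarantee; the `Candidate` record and function names are hypothetical.

```python
from typing import NamedTuple

# Weights from the composite above; they sum to 1.0.
WEIGHTS = {
    "indexation_delta": 0.40,
    "visit_recency": 0.25,
    "revenue_potential": 0.25,
    "geo_cluster": 0.10,
}


class Candidate(NamedTuple):
    pos_id: str
    indexation_delta: float   # each signal assumed pre-normalised to [0, 1]
    visit_recency: float
    revenue_potential: float
    geo_cluster: float


def visit_score(c: Candidate) -> float:
    """Weighted sum of the four signals."""
    return sum(weight * getattr(c, name) for name, weight in WEIGHTS.items())


def top_visits(candidates, n: int = 30):
    """Highest-scoring candidates first, truncated to the weekly list size."""
    return sorted(candidates, key=visit_score, reverse=True)[:n]
```

Calling `top_visits(candidates)` on one rep's candidate outlets returns at most 30 entries, ordered by descending score.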

Tech Stack

Full Azure cloud stack

Every layer runs on Microsoft Azure. All components are Dockerised for portability and consistency across dev, staging and production environments.

Orchestration

⚡

Azure Durable Functions

Stateful fan-out / fan-in with automatic retry

⏰

Azure Timer Trigger

CRON-based weekly schedule, zero infra to manage
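Azure Functions timer triggers use six-field NCRONTAB expressions (second, minute, hour, day, month, day-of-week), so a Monday 6:00 AM schedule would plausibly be written `0 0 6 * * 1`. The sketch below keeps that string as a constant and reproduces its meaning in plain Python so the cadence can be checked locally; `next_run` is a hypothetical helper, not part of any Azure SDK. Timer schedules are evaluated in UTC by default, and the time zone intended for 6:00 AM is not stated here.

```python
from datetime import datetime, timedelta

# NCRONTAB fields: second minute hour day month day-of-week (1 = Monday).
WEEKLY_SCHEDULE = "0 0 6 * * 1"


def next_run(now: datetime) -> datetime:
    """Next Monday 06:00 strictly after `now` (naive datetimes, one time zone)."""
    candidate = now.replace(hour=6, minute=0, second=0, microsecond=0)
    # datetime.weekday() returns 0 for Monday; move forward to the next Monday.
    candidate += timedelta(days=(0 - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

A run that starts on Wednesday 3 January 2024 would see its next trigger at `datetime(2024, 1, 8, 6, 0)`.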

Data & Storage

🗄️

Azure SQL Database

Scoring outputs and rep-facing recommendation tables

☁️

Azure Blob Storage

Intermediate data and pipeline artefacts

ML & Processing

🐍

Python 3.11

Core pipeline language: data wrangling and scoring

🔢

Pandas + Scikit-learn

Indexation scoring, K-Means clustering, ranking model

Delivery

📊

Streamlit

Interactive dashboard with per-rep lists and filters

🐳

Docker

All services containerised for consistent deploys
