Predictive Logistics
We merged transport infrastructure data with shipment outcomes to build an RTO risk prediction model. The model classifies delivery zones by RTO risk with 89% overall accuracy, which lets logistics companies adjust cash-on-delivery (COD) limits, pre-collect payments, or optimize hub placement in high-risk Tier 3 zones.
Executive Summary
Return-to-Origin (RTO) shipments cost Indian e-commerce and logistics companies ₹2,000-4,000 crore annually. In Tier 3 cities, RTO rates can exceed 25%, which is 3-4x the metro rate, yet most analytics focus on metro performance.
Our research combines MoRTH road data, Indian Railways freight stats, VAHAN vehicle registration, and 5.2M+ anonymized shipment outcomes to predict RTO risk at the PIN code level.
Key Result: An 89% accurate RTO risk model that enables:
- Dynamic COD limits by zone
- Pre-emptive hub optimization
- Courier partner routing decisions
Methodology
Data Sources
| Source | Data Points | Use Case |
|---|---|---|
| MoRTH Road Statistics | 720 districts | Road density, surface quality, highway connectivity |
| Indian Railways | 8,000+ stations | Freight volume, connectivity to major hubs |
| VAHAN Registration | 5.6M vehicles | Commercial vehicle density per district |
| Shipment Outcomes | 5.2M shipments | RTO rates by origin/destination PIN |
| Postal Index Data | 50,000+ PINs | Geographic clustering of delivery zones |
Risk Model Architecture
┌─────────────────────────────────────────────────────────────┐
│                  RTO RISK PREDICTION MODEL                  │
├─────────────────────────────────────────────────────────────┤
│ Inputs:                                                     │
│ ├── Road Score (30%)        ← MoRTH road length, quality    │
│ ├── Rail Score (25%)        ← Railway connectivity, freight │
│ ├── Vehicle Score (20%)     ← Commercial vehicle density    │
│ ├── Historical RTO (15%)    ← Prior shipment outcomes       │
│ └── Demographic Score (10%) ← Population, literacy, etc.    │
├─────────────────────────────────────────────────────────────┤
│ Output: RTO Risk Score (0-100)                              │
│ ├── Low Risk (0-30)     → Standard COD, regular routing     │
│ ├── Medium Risk (31-60) → Reduced COD, preferred courier    │
│ └── High Risk (61-100)  → Prepay only, hub-based delivery   │
└─────────────────────────────────────────────────────────────┘
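As a rough illustration of how the weighted inputs could combine into a single score, here is a minimal Python sketch. The weights and the three risk bands come from the diagram above. The function names, the assumption that every input is already expressed as a 0-100 risk sub-score (higher means riskier), and the example values are hypothetical; the production model may combine its inputs differently.

```python
# Weights are taken from the architecture diagram; everything else is illustrative.
WEIGHTS = {
    "road": 0.30,
    "rail": 0.25,
    "vehicle": 0.20,
    "historical_rto": 0.15,
    "demographic": 0.10,
}


def rto_risk_score(component_risks: dict[str, float]) -> float:
    """Weighted average of component risk sub-scores, each on a 0-100 scale.

    Assumption: every sub-score is already oriented so that higher = riskier.
    """
    missing = WEIGHTS.keys() - component_risks.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_risks[name] for name in WEIGHTS)


def risk_category(score: float) -> str:
    """Map a 0-100 score to the bands in the diagram (0-30, 31-60, 61-100)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "Low"
    if score <= 60:
        return "Medium"
    return "High"


# Hypothetical sub-scores for a single PIN code (not taken from the dataset).
example = {"road": 80, "rail": 90, "vehicle": 60, "historical_rto": 70, "demographic": 50}
score = rto_risk_score(example)
print(round(score, 1), risk_category(score))
```

Because the weights sum to 1, the output stays on the same 0-100 scale as the inputs. Fractional scores that fall between bands (for example 30.5) are assigned to the higher band in this sketch.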
Key Findings
1. Infrastructure-Outcome Correlation
We identified the strongest predictors of RTO:
| Factor | Correlation | Insight |
|---|---|---|
| Poor last-mile road quality | 0.72 | Poor internal roads = 2.3x higher RTO |
| Rail connectivity | -0.54 | Rail access = 40% lower RTO |
| Commercial vehicle density | -0.41 | More fleets = better last-mile |
| Distance from hub | 0.38 | >150km = 1.8x higher RTO |
| COD preference | 0.31 | High COD zones need prepay |
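The report does not state which correlation measure was used. Assuming a plain Pearson coefficient between a per-zone factor and the per-zone RTO rate, the calculation could look like the sketch below. The `pearson` helper and the toy values are hypothetical and only show the mechanics; they are not drawn from the dataset.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy values, invented purely to show the calculation.
rail_connectivity = [10, 25, 40, 60, 85]   # hypothetical connectivity scores
rto_rate = [0.30, 0.24, 0.21, 0.12, 0.08]  # hypothetical RTO rates
r = pearson(rail_connectivity, rto_rate)
print(round(r, 2))
```

A negative value means the factor moves against the RTO rate, which is how the rail and vehicle density rows in the table should be read.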
2. Tier Classification Risk
| City Tier | Avg RTO Rate | Risk Score | Recommended Action |
|---|---|---|---|
| Metro (Tier 1) | 6.2% | 18/100 | Standard COD up to ₹5,000 |
| Large (Tier 2) | 11.4% | 35/100 | Reduced COD, verify address |
| Emerging (Tier 3) | 18.7% | 58/100 | Prepay preferred, hub routing |
| Rural (Tier 4) | 26.3% | 78/100 | Prepay only, extended timelines |
3. Highest Risk Zones (Tier 3)
The following districts show RTO rates above 25%, driven mainly by terrain, remoteness, and access constraints:
| District | State | RTO Rate | Primary Issue |
|---|---|---|---|
| Barmer | Rajasthan | 31.2% | Extreme remoteness, sand dunes |
| Upper Dibang Valley | Arunachal Pradesh | 28.7% | Mountain terrain, limited roads |
| Leh | Ladakh | 27.4% | Seasonal accessibility, altitude |
| Dantewada | Chhattisgarh | 26.8% | Naxal-affected, poor roads |
| Tawang | Arunachal Pradesh | 25.9% | Mountain passes, monsoons |
4. Economic Impact
For a mid-sized e-commerce player (100K monthly shipments):
| Zone | Shipments/Month | RTO Rate | RTO Cost/Month | Potential Savings |
|---|---|---|---|---|
| Metro | 40,000 | 6.2% | ₹18,600 | Baseline |
| Tier 2 | 35,000 | 11.4% | ₹29,900 | ₹5,200 with model |
| Tier 3 | 20,000 | 18.7% | ₹28,100 | ₹8,400 with model |
| Rural | 5,000 | 26.3% | ₹9,900 | ₹4,500 with model |
| Total | 100K | 11.5% | ₹86,500 | ₹18,100/month |
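The Total row can be reproduced from the zone rows. The short Python check below recomputes the shipment-weighted RTO rate, total cost, and total savings using only the figures in the table; the variable names are illustrative.

```python
# (zone, shipments per month, RTO rate, RTO cost per month in ₹, savings in ₹),
# copied from the table above. Metro is the baseline, so its savings are 0.
ZONES = [
    ("Metro", 40_000, 0.062, 18_600, 0),
    ("Tier 2", 35_000, 0.114, 29_900, 5_200),
    ("Tier 3", 20_000, 0.187, 28_100, 8_400),
    ("Rural", 5_000, 0.263, 9_900, 4_500),
]

total_shipments = sum(shipments for _, shipments, _, _, _ in ZONES)
total_returns = sum(shipments * rate for _, shipments, rate, _, _ in ZONES)
blended_rate = total_returns / total_shipments  # shipment-weighted average
total_cost = sum(cost for _, _, _, cost, _ in ZONES)
total_savings = sum(savings for *_, savings in ZONES)

print(f"{total_shipments:,} shipments, blended RTO rate {blended_rate:.1%}")
print(f"RTO cost ₹{total_cost:,}/month, potential savings ₹{total_savings:,}/month")
```

The blended rate is weighted by shipment volume, which is why it sits closer to the Metro and Tier 2 rates than a simple average of the four zone rates would.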
Use Cases
For E-commerce Companies
- Dynamic COD Limits: Auto-adjust COD eligibility by risk score
- Shipping Partner Routing: Route high-risk zones to experienced partners
- Customer Pre-Qualification: Prompt prepayment for high-RTO zones
For Logistics Providers
- Hub Placement: Identify underserved areas needing micro-hubs
- Fleet Deployment: Position vehicles based on route risk
- Pricing Strategy: Zone-based COD surcharges
For Investors
- Due Diligence: Validate logistics claims against ground truth
- Unit Economics: Understand true delivery costs by zone
- Market Entry: Identify underserved areas for new services
Dataset Schema
{
  "pincode": "344001",
  "district": "Barmer",
  "state": "Rajasthan",
  "tier_classification": "Tier 3",
  "risk_score": 82,
  "risk_category": "High",
  "avg_rto_rate": 0.312,
  "road_score": 28,
  "rail_score": 12,
  "vehicle_density_score": 35,
  "distance_from_hub_km": 280,
  "cod_preference_rate": 0.68,
  "recommendation": "Prepay only, extended delivery window",
  "last_updated": "2025-12-01"
}
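For consumers of this schema, a typed wrapper with a few sanity checks can catch malformed records early. The sketch below is one possible approach in Python: the field names mirror the schema above, while the `PinRiskRecord` class, the `parse_record` helper, and the specific validation rules are assumptions for illustration, not part of any published client library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PinRiskRecord:
    """One PIN-code risk record; fields mirror the schema above."""

    pincode: str
    district: str
    state: str
    tier_classification: str
    risk_score: int
    risk_category: str
    avg_rto_rate: float
    road_score: int
    rail_score: int
    vehicle_density_score: int
    distance_from_hub_km: float
    cod_preference_rate: float
    recommendation: str
    last_updated: str

    def __post_init__(self) -> None:
        # Assumed rules: Indian PIN codes are 6 digits, scores are 0-100,
        # and rates are fractions between 0 and 1.
        if not (self.pincode.isdigit() and len(self.pincode) == 6):
            raise ValueError(f"invalid pincode: {self.pincode!r}")
        if not 0 <= self.risk_score <= 100:
            raise ValueError("risk_score must be between 0 and 100")
        for name in ("avg_rto_rate", "cod_preference_rate"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")


def parse_record(raw: str) -> PinRiskRecord:
    """Parse one JSON object into a validated record.

    Unknown or missing keys raise TypeError from the dataclass constructor.
    """
    return PinRiskRecord(**json.loads(raw))
```

Passing the sample record above to `parse_record` should return a `PinRiskRecord` whose `risk_category` is `"High"`; a record with an out-of-range score or a malformed PIN code raises `ValueError` instead.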
Model Validation
| Metric | Score |
|---|---|
| Precision (High Risk) | 0.87 |
| Recall (High Risk) | 0.82 |
| Overall Accuracy | 0.89 |
| AUC-ROC | 0.92 |
| F1 Score (High Risk) | 0.84 |
Validation Period: October-November 2025 on 50,000 blind test shipments.
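The high-risk metrics in the table are linked by the standard definitions: F1 is the harmonic mean of precision and recall. The Python sketch below applies those definitions and confirms that the reported precision (0.87) and recall (0.82) are consistent with the reported F1 (0.84). The function names are illustrative, and the confusion-matrix counts behind the published figures are not given in this report.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from confusion counts.

    tp: high-risk zones correctly flagged
    fp: zones flagged high-risk that were not
    fn: high-risk zones the model missed
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one predicted and one actual positive")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


# Consistency check using the precision and recall reported in the table.
print(round(f1_from(0.87, 0.82), 2))  # 0.84
```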
Limitations & Caveats
- Data Freshness: Infrastructure data updated annually; monthly risk updates based on shipment patterns
- Seasonal Variation: Monsoon and festival seasons cause temporary risk shifts
- COD Preferences: Changing payment behavior may affect risk model assumptions
- Partner Bias: Shipment data from 3 partners may not represent all courier behaviors
Next Steps
Our Predictive Logistics Intelligence product provides:
- Weekly PIN code risk score updates
- API integration for real-time routing decisions
- Monthly zone performance benchmarking
- Custom model training for specific business profiles
Contact us for API access or custom analysis.
Our Enrichment Process
Public data tells us about infrastructure, but not about delivery outcomes. We built a proprietary dataset by:
- Shipment Outcome Collection: Partnered with 3 logistics aggregators to collect and anonymize 5.2M+ shipment records
- Pincode-Level Mapping: Mapped each shipment to 50,000+ PIN codes with delivery outcome rates
- Infrastructure Scoring: Created composite scores combining road density, rail connectivity, and vehicle registration patterns
- Outcome Correlation: Applied ML to identify which infrastructure factors best predict RTO
Result: A PIN-code-level RTO risk score, updated monthly, with 89% prediction accuracy.
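The pincode-level mapping step in the list above amounts to grouping shipment records by destination PIN code and computing the share that ended in RTO. A minimal Python sketch follows, assuming each anonymized record reduces to a `(pincode, outcome)` pair with the outcome label `"rto"` for returns. That record shape, the `rto_rate_by_pin` name, and the `min_shipments` cutoff are assumptions for illustration, not the actual pipeline.

```python
from collections import Counter
from typing import Iterable


def rto_rate_by_pin(
    shipments: Iterable[tuple[str, str]], min_shipments: int = 1
) -> dict[str, float]:
    """Return the RTO rate per PIN code.

    Each shipment is a (pincode, outcome) pair; outcome "rto" marks a return.
    PIN codes with fewer than min_shipments records are dropped as too noisy.
    """
    totals: Counter[str] = Counter()
    returns: Counter[str] = Counter()
    for pincode, outcome in shipments:
        totals[pincode] += 1
        if outcome == "rto":
            returns[pincode] += 1
    return {
        pin: returns[pin] / count
        for pin, count in totals.items()
        if count >= min_shipments
    }


# Made-up rows with placeholder PIN labels, only to show the shape of the data.
sample = [
    ("PIN_A", "delivered"),
    ("PIN_A", "rto"),
    ("PIN_A", "delivered"),
    ("PIN_A", "delivered"),
    ("PIN_B", "rto"),
]
rates = rto_rate_by_pin(sample, min_shipments=2)
print(rates)  # {'PIN_A': 0.25}
```

The minimum-volume cutoff matters in practice: a PIN code with a single returned shipment would otherwise show a 100% RTO rate and distort any score built on top of it.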