Roadmap · Updated May 2026

The Data Scientist trek

Statistics, SQL, Python, machine learning, deep learning, experimentation, big data, and communication. Everything from raw data to deployed model to executive insight.

Stages: 13
Estimated time: 7 months
Level: Beginner → Advanced
Maintained by: 3 practitioners
Stage 01

Statistics & probability foundations

Probability, distributions, hypothesis testing, and the statistical thinking that separates data scientists from people who make charts. This is the hardest and most important stage.

Statistics · Probability · Beginner
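Here is one example of the statistical thinking this stage builds: a minimal permutation test for a difference in means, using only the Python standard library. The function name and both samples are hypothetical and do not come from a real experiment. The reasoning is simple. If the group labels don't matter, then shuffling them should often produce a difference as large as the one observed.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Returns the observed difference mean(a) - mean(b) and an approximate
    p-value: the share of label shuffles whose absolute difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements for two groups.
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
treatment = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0, 13.2, 12.6]
diff, p = permutation_test(control, treatment)
print(f"difference = {diff:.3f}, p = {p:.4f}")
```

This test makes no normality assumption. It trades that for computation, which is usually cheap at this scale.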
Stage 02

SQL for analytics

Window functions, CTEs, recursive queries, and the SQL patterns that let you answer business questions directly from a database without writing a line of Python.

SQL · Analytics · Postgres
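Below is a sketch of the CTE-plus-window-function pattern. It runs through Python's built-in `sqlite3` module, so no database server is needed. The `orders` table and its rows are made up for illustration. Window functions require the bundled SQLite to be version 3.25 or newer. The query is standard SQL and should also run unchanged on Postgres. The CTE is what makes the final filter possible, because window results can't be referenced in the same query's `WHERE` clause.

```python
import sqlite3

# Hypothetical orders table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ana', '2026-01-03', 40.0),
        ('ana', '2026-01-10', 25.0),
        ('ben', '2026-01-05', 60.0),
        ('ana', '2026-01-20', 35.0),
        ('ben', '2026-01-15', 15.0);
    """
)

# Each customer's first two orders, with a running total of their spend.
query = """
WITH ranked AS (
    SELECT
        customer,
        order_date,
        amount,
        SUM(amount) OVER (PARTITION BY customer ORDER BY order_date) AS running_total,
        ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date) AS order_number
    FROM orders
)
SELECT customer, order_date, amount, running_total
FROM ranked
WHERE order_number <= 2
ORDER BY customer, order_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```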
Stage 03

Python for data science

NumPy, pandas, and Polars. Slicing, joining, reshaping, and writing data pipelines that are fast, readable, and reproducible.

Python · pandas · Polars · NumPy
Stage 04

Data wrangling & feature engineering

The craft of turning raw data into model-ready features. Encoding, scaling, imputation, feature selection, and handling the real-world messiness of production data.

Feature Engineering · scikit-learn · Data Prep
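This is a minimal standard-library sketch of three common preparation steps: mean imputation, standardization, and one-hot encoding. The helper names and the `ages`/`cities` columns are hypothetical.

```python
import statistics


def impute_mean(values):
    """Replace None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Map each category to a 0/1 indicator vector; columns are sorted levels."""
    levels = sorted(set(categories))
    return levels, [[int(c == level) for level in levels] for c in categories]


ages = [25, None, 35, 45, None]
cities = ["paris", "lima", "paris", "oslo", "lima"]

ages_filled = impute_mean(ages)  # None -> 35.0
ages_scaled = standardize(ages_filled)
levels, city_matrix = one_hot(cities)
print(levels, city_matrix)
```

In a real pipeline, fit the mean, standard deviation, and category list on the training split only, then reuse them on validation and test data. Otherwise, information leaks across the split. scikit-learn's `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` package exactly that fit/transform separation.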
Stage 05

Machine learning fundamentals

Linear/logistic regression, trees, ensembles, and clustering. Build the intuitions that make complex models understandable, and learn when each one is the right tool.

scikit-learn · ML · XGBoost · Intermediate
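The following example shows the kind of intuition this stage builds. It is ordinary least squares for a single feature, written out by hand so the closed-form solution is visible: the slope is the covariance of x and y divided by the variance of x. The function names and the five data points are illustrative assumptions.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one feature."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # slope = cov(x, y) / var(x); the 1/n factors cancel, so raw sums suffice.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through the means
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
intercept, slope = fit_simple_ols(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r2:.3f}")
```

In practice, you would use scikit-learn's `LinearRegression`, which handles many features at once. The hand-rolled version is only for building intuition.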
Stage 06

Model evaluation & selection

Beyond accuracy: AUC, calibration, fairness, cross-validation, and the rigorous evaluation practices that prevent you from shipping models that hurt users.

Evaluation · Metrics · Fairness
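AUC has a concrete meaning: it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes AUC directly from that definition, counting ties as half a win. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative).

    Ties between a positive and a negative count as half a win.
    This O(P * N) pairwise version is for illustration, not large data.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]
print(roc_auc(labels, scores))  # 11 of 16 positive/negative pairs ranked correctly
```

`sklearn.metrics.roc_auc_score` computes the same quantity more efficiently. The pairwise form makes it clear why AUC ignores the decision threshold entirely.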
Stage 07

Deep learning basics

Neural networks, PyTorch, CNNs, and how to train models that learn representations. The practical skills for when tabular ML isn't enough.

Deep Learning · PyTorch · Neural Networks
Stage 08

NLP & text analytics

Text preprocessing, embeddings, sentiment analysis, topic modeling, and working with transformer models for text tasks.

NLP · HuggingFace · Transformers
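Here is a bare-bones TF-IDF sketch, one of the classic ways to turn text into features before reaching for embeddings or transformers. It makes two simplifying assumptions: naive lowercase whitespace tokenization and the unsmoothed `log(N / df)` idf. The three documents are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = [
    "the model overfits the data",
    "the data pipeline is slow",
    "more data reduces overfitting",
]
w = tfidf(docs)
print(w[0])
```

A term that appears in every document (here, `data`) gets weight zero. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default, so its numbers will differ from these.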
Stage 09

Experimentation & causal inference

Designing experiments that change decisions, avoiding the traps that make many A/B tests misleading (peeking at results early, underpowered samples, multiple comparisons), and moving beyond correlation to causation.

Experimentation · A/B Testing · Causal Inference
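Power analysis is the first defense against underpowered A/B tests. The sketch below uses the standard normal-approximation formula for a two-sided, two-proportion z-test: n = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)² users per arm. The function name and the example of a lift from 10% to 11% conversion are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Detecting a lift from 10% to 11% conversion.
n = sample_size_per_arm(0.10, 0.11)
print(n)
```

For these inputs, the result is roughly 14,750 users per arm. The effect size enters the formula squared, so halving the detectable lift roughly quadruples the required sample.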
Stage 10

Big data & distributed computing

When data doesn't fit in memory: Spark, Dask, BigQuery, Snowflake, and the architecture of modern data lakehouses.

Spark · BigQuery · Dask · Big Data
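The core idea behind distributed aggregations in engines like Spark and Dask can be sketched in plain Python. The program reads data in chunks and computes a partial aggregate for each chunk (map), then merges the partials (reduce). Everything here is a hypothetical illustration, and the in-memory CSV stands in for a file too large to load at once.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def partial_aggregate(rows):
    """Map step: per-key [sum, count] for one chunk of rows."""
    acc = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc[row["region"]][0] += float(row["revenue"])
        acc[row["region"]][1] += 1
    return acc


def merge(total, partial):
    """Reduce step: add a chunk's partial aggregates into the running total."""
    for key, (s, c) in partial.items():
        total[key][0] += s
        total[key][1] += c
    return total


def chunked(iterable, size):
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Hypothetical data; in practice this would be a file far larger than RAM.
raw = io.StringIO("region,revenue\neu,10\nus,20\neu,30\nus,5\napac,7\n")
totals = defaultdict(lambda: [0.0, 0])
for chunk in chunked(csv.DictReader(raw), size=2):
    merge(totals, partial_aggregate(chunk))

averages = {k: s / c for k, (s, c) in totals.items()}
print(averages)
```

The key design choice is to carry `(sum, count)` pairs instead of per-chunk averages. Partial sums merge correctly in any order, whereas an average of averages is wrong when chunks differ in size.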
Stage 11

Data visualization & storytelling

Matplotlib, seaborn, Plotly, Tableau, and the narrative structure that makes data findings actually change decisions.

Visualization · Tableau · Storytelling
Stage 12

Model deployment & MLOps basics

Serving models in production: FastAPI, Docker, monitoring, drift detection, and the basics of keeping models accurate after they ship.

MLOps · Model Serving · Drift Detection
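Drift detection often starts with a simple comparison between distributions. Below is a minimal Population Stability Index (PSI) sketch that compares a reference sample (for example, a feature's values at training time) with live values. The sketch assumes equal-width bins over the reference range, uses a small epsilon for empty bins, and runs on synthetic sample data.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI = sum over bins of (a - e) * ln(a / e), where e and a are bin shares.

    Bin edges are equal-width over the reference (expected) range; values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        return [c / len(values) for c in counts]

    eps = 1e-6  # keeps empty bins from producing log(0)
    psi = 0.0
    for e, a in zip(shares(expected), shares(actual)):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


reference = [i / 100 for i in range(1000)]  # hypothetical training-time values
live_same = list(reference)
live_shifted = [v + 3 for v in reference]  # simulated drift

print(population_stability_index(reference, live_same))  # 0.0: identical distributions
print(population_stability_index(reference, live_shifted))
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and one above 0.25 as significant drift. Thresholds are worth tuning per feature, though.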
Stage 13

Capstone — end-to-end data science project

Research question → data → model → deployment → stakeholder recommendation. A real artifact for your portfolio.

Capstone · Advanced · Portfolio

Trek complete. What's next?

You've walked the full roadmap. Now ship the capstone, write about it, and share the path with the next person who needs it.
