Learning Contract
This roadmap is not designed for passive course completion. It is designed to build durable engineering competence through repeated implementation, debugging, evaluation, documentation, refactoring, and deployment.
Primary objective
Become capable of building, evaluating, serving, observing, securing, and explaining real ML and LLM systems.
Depth rule
For every topic, build the smallest correct implementation first, then benchmark, test, refactor, document, and connect it to production tradeoffs.
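The "benchmark before refactoring" habit the depth rule describes can be as small as a stdlib timing harness. A minimal sketch (function names are illustrative, not part of the roadmap):

```python
import statistics
import time


def bench(fn, *args, repeats: int = 5) -> float:
    """Return the median wall-clock seconds over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Toy example: compare a hand-rolled sum against the built-in.
def naive_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


data = list(range(100_000))
print(f"naive:   {bench(naive_sum, data):.6f}s")
print(f"builtin: {bench(sum, data):.6f}s")
```

Median over several repeats is used instead of a single run because one-off timings on a laptop are noisy.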
AI tool policy
Use Cursor and ChatGPT as tutors, reviewers, and adversarial debuggers. Do not outsource the core learning loop: struggle first, then ask for critique.

MacBook/Homebrew Setup
Use a reproducible local toolchain. Every project should be installable, testable, lintable, benchmarkable, and runnable from a clean checkout.
Recommended baseline commands
# System tools
brew update
brew install git gh uv ruff jq htop tree wget curl cmake pkg-config
# Infrastructure tools (note: the docker formula installs only the CLI; colima provides the container runtime on macOS)
brew install docker colima kubectl kind k9s postgresql redis sqlite
# Optional productivity tools
brew install just pre-commit graphviz duckdb
# Python baseline
uv python install 3.12
mkdir -p ~/code/ml-systems-roadmap
cd ~/code/ml-systems-roadmap
uv init
uv add pytest hypothesis mypy ruff ipykernel numpy pandas polars duckdb scikit-learn matplotlib rich typer pydantic
uv run python -m ipykernel install --user --name ml-systems-roadmap
# Quality checks used throughout the curriculum
uv run ruff check .
uv run ruff format .
uv run pytest
uv run mypy .
Repo standard
README.md, pyproject.toml, src/ layout, tests/, Makefile or justfile, linting, typing, and reproducible commands.
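A minimal `pyproject.toml` matching this standard might look like the following (project name, dependencies, and tool settings are placeholders to adapt per project):

```toml
[project]
name = "my-project"              # placeholder name
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["pydantic", "typer"]

[tool.ruff]
line-length = 100

[tool.mypy]
strict = true

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

With a `src/` layout, the package lives under `src/my_project/` and tests import it the same way installed users would.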
Notebook standard
Use notebooks for exploration only. Convert final work into modules, scripts, tests, and command-line entry points.
Evidence standard
Keep benchmark logs, plots, screenshots, model cards, experiment notes, ADRs, and deployment instructions.
Phase Map
The curriculum moves from foundations to production systems. Each phase includes depth loops where earlier work is revisited and improved with stronger tools.
Weekly Plan
Search by topic, resource, tool, deliverable, or phase. Each week includes detailed, human-readable bullets, a short guidance note, and direct source links.
Core Resource Library
These are the free and open resources referenced inside the roadmap. Prefer official documentation, university courses, primary papers, and maintained project docs before blog posts.
Assessment Rubric
Use this rubric every month. The goal is to become the kind of engineer who can implement, profile, deploy, debug, and defend technical decisions.
| Dimension | Meets Standard | Exceeds Standard |
|---|---|---|
| Correctness | Code works on expected inputs and includes normal-case tests. | Includes edge cases, failure modes, property tests, and explicit error handling. |
| Engineering quality | Project is readable, typed, linted, and installable. | Project has clean architecture, docs, CI, configuration, logging, and benchmark scripts. |
| ML understanding | Can explain model choices, metrics, and training behavior. | Can diagnose bias/variance, optimization instability, leakage, data issues, and evaluation weaknesses. |
| Systems thinking | Can run a model or service locally and describe basic performance. | Can profile bottlenecks, estimate cost, design scaling paths, and reason about reliability. |
| Security and privacy | Handles secrets properly and avoids obvious unsafe patterns. | Includes threat modeling, abuse cases, dependency scanning, prompt-injection tests, and privacy notes. |
| Communication | README explains setup and usage. | Includes architecture diagrams, tradeoff analysis, benchmark results, and a concise technical article. |
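The "property tests" the rubric asks for check invariants over many generated inputs rather than a handful of fixed cases. `hypothesis` (installed earlier) automates this; a stdlib sketch of the same idea, applied to a hypothetical `normalize` function:

```python
import random


def normalize(xs: list[float]) -> list[float]:
    """Scale values so they sum to 1 (the function under test)."""
    total = sum(xs)
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / total for x in xs]


# Property: the output sums to ~1 for any positive input of any length.
for _ in range(200):
    xs = [random.uniform(0.1, 100.0) for _ in range(random.randint(1, 50))]
    assert abs(sum(normalize(xs)) - 1.0) < 1e-9

# Failure mode: an all-zero vector must raise, not silently return NaNs.
try:
    normalize([0.0, 0.0])
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

Hypothesis would replace the `random` loop with `@given(...)` strategies and shrink failing inputs automatically.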
Final Capstone Definition
The final project should demonstrate LLM systems competence end to end: data, model, serving, evaluation, observability, reliability, and security.
Recommended capstone
Build a production-shaped LLM application: retrieval-augmented tutor, ASVAB study coach, paper-trading education assistant, code review bot, local/private document QA system, or evaluation platform. It should run locally and optionally deploy to Docker Compose or Kubernetes.
capstone/
├── app/ # FastAPI or service layer
├── training/ # fine-tuning or evaluation scripts
├── retrieval/ # indexing, chunking, embeddings, vector DB
├── evals/ # test sets, judge prompts, metrics
├── infra/ # Docker, compose, k8s manifests
├── observability/ # logs, traces, dashboards
├── security/ # threat model, prompt-injection tests
├── tests/ # unit, integration, regression tests
└── docs/ # report, model card, architecture notes
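One way the `evals/` directory might start is a tiny harness that scores a model function against a list of cases. Everything below is illustrative: the substring grading rule, the dataclass fields, and the stub model are placeholders for a real LLM call and judge:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str  # simplest possible grading rule


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Score a prompt->text model function against a fixed test set."""
    results = [
        case.expected_substring.lower() in model(case.prompt).lower()
        for case in cases
    ]
    return {
        "passed": sum(results),
        "total": len(results),
        "pass_rate": sum(results) / len(results),
    }


# Usage with a stub standing in for a real LLM client:
cases = [
    EvalCase("What is 2+2?", "4"),
    EvalCase("Capital of France?", "Paris"),
]
stub_model = lambda prompt: "4" if "2+2" in prompt else "Paris is the capital."
print(run_evals(stub_model, cases))
```

Substring matching is deliberately crude; the point is a regression-testable interface that judge prompts and richer metrics can slot into later.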
Required final evidence
- Public GitHub repository with clean setup instructions.
- Working demo path: local command, Docker Compose, or Kubernetes deployment.
- Technical report explaining architecture, tradeoffs, risks, benchmarks, and future work.
- Evaluation suite with at least 50 representative test cases.
- Security review covering prompt injection, data leakage, secrets, dependency risk, and abuse cases.
- Performance report with latency, throughput, memory, and cost estimates.
- Short portfolio article written for a hiring manager or senior engineer audience.
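For the performance report, the latency side usually comes down to percentiles over sampled request times. A stdlib sketch, assuming latencies have already been collected in milliseconds:

```python
import statistics


def latency_report(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize request latencies with the percentiles reviewers expect."""
    qs = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return {
        "p50_ms": qs[49],
        "p95_ms": qs[94],
        "p99_ms": qs[98],
        "mean_ms": statistics.fmean(latencies_ms),
    }


# A skewed sample: mostly fast requests plus a couple of slow outliers.
samples = [12.0, 15.0, 11.0, 240.0, 13.0, 14.0, 16.0, 12.5, 13.5, 500.0]
print(latency_report(samples))
```

Tail percentiles (p95/p99) matter more than the mean here, since a few slow LLM calls dominate user experience even when the average looks fine.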