Depth-maximized portfolio curriculum

12-Month LLM Systems & ML Infrastructure Roadmap

A rigorous, free/open-source, week-by-week curriculum for becoming competitive in LLM systems, machine learning infrastructure, GPU-aware training, data pipelines, model serving, observability, security, and production-grade engineering. This revised version keeps the same visual style while making the weekly guidance more conversational, detailed, and easier to act on.

52 weeks Structured progression from Python refresh to a final production-shaped capstone.
18-25 hrs/week Depth-first schedule with implementation, reading, profiling, and writing.
5 phases Foundations, ML/DL, LLM internals, infrastructure, specialization.
74+ sources Direct links to official docs, university courses, primary papers, and practical project references.

Learning Contract

This roadmap is not designed for passive course completion. It is designed to build durable engineering competence through repeated implementation, debugging, evaluation, documentation, refactoring, and deployment.

Primary objective

Become capable of building, evaluating, serving, observing, securing, and explaining real ML and LLM systems.

Depth rule

For every topic, build the smallest correct implementation first, then benchmark, test, refactor, document, and connect it to production tradeoffs.

AI tool policy

Use Cursor and ChatGPT as tutors, reviewers, and adversarial debuggers. Do not outsource the core learning loop. First struggle, then ask for critique.

Weekly rhythm: 5-7 hours reading and notes, 8-12 hours implementation, 3-4 hours debugging and testing, 2 hours benchmarking or evaluation, and 1-2 hours technical writing. Each Sunday, write an engineering review: what worked, what broke, what you measured, and what you would improve in production.

MacBook/Homebrew Setup

Use a reproducible local toolchain. Every project should be installable, testable, lintable, benchmarkable, and runnable from a clean checkout.

Recommended baseline commands

# System tools
brew update
brew install git gh uv ruff jq htop tree wget curl cmake pkg-config

# Infrastructure tools
brew install docker colima kubectl kind k9s postgresql redis sqlite

# Optional productivity tools
brew install just pre-commit graphviz duckdb

# Python baseline
uv python install 3.12
mkdir -p ~/code/ml-systems-roadmap
cd ~/code/ml-systems-roadmap
uv init
uv add pytest hypothesis mypy ruff ipykernel numpy pandas polars duckdb scikit-learn matplotlib rich typer pydantic
uv run python -m ipykernel install --user --name ml-systems-roadmap

# Quality checks used throughout the curriculum
uv run ruff check .
uv run ruff format .
uv run pytest
uv run mypy .

Repo standard

README.md, pyproject.toml, src/ layout, tests/, Makefile or justfile, linting, typing, and reproducible commands.

Notebook standard

Use notebooks for exploration only. Convert final work into modules, scripts, tests, and command-line entry points.

Evidence standard

Keep benchmark logs, plots, screenshots, model cards, experiment notes, ADRs, and deployment instructions.

Phase Map

The curriculum moves from foundations to production systems. Each phase includes depth loops where earlier work is revisited and improved with stronger tools.

Weekly Plan

Search by topic, resource, tool, deliverable, or phase. Each week now includes fuller human-readable bullets, a short guidance note, and direct source links.

Core Resource Library

These are the free and open resources referenced inside the roadmap. Prefer official documentation, university courses, primary papers, and maintained project docs before blog posts.

Assessment Rubric

Use this rubric every month. The goal is to become the kind of engineer who can implement, profile, deploy, debug, and defend technical decisions.

Dimension Meets Standard Exceeds Standard
Correctness Code works on expected inputs and includes normal-case tests. Includes edge cases, failure modes, property tests, and explicit error handling.
Engineering quality Project is readable, typed, linted, and installable. Project has clean architecture, docs, CI, configuration, logging, and benchmark scripts.
ML understanding Can explain model choices, metrics, and training behavior. Can diagnose bias/variance, optimization instability, leakage, data issues, and evaluation weaknesses.
Systems thinking Can run a model or service locally and describe basic performance. Can profile bottlenecks, estimate cost, design scaling paths, and reason about reliability.
Security and privacy Handles secrets properly and avoids obvious unsafe patterns. Includes threat modeling, abuse cases, dependency scanning, prompt-injection tests, and privacy notes.
Communication README explains setup and usage. Includes architecture diagrams, tradeoff analysis, benchmark results, and a concise technical article.

Final Capstone Definition

The final project should demonstrate LLM systems competence end to end: data, model, serving, evaluation, observability, reliability, and security.

Recommended capstone

Build a production-shaped LLM application: retrieval-augmented tutor, ASVAB study coach, paper-trading education assistant, code review bot, local/private document QA system, or evaluation platform. It should run locally and optionally deploy to Docker Compose or Kubernetes.

capstone/
├── app/                  # FastAPI or service layer
├── training/             # fine-tuning or evaluation scripts
├── retrieval/            # indexing, chunking, embeddings, vector DB
├── evals/                # test sets, judge prompts, metrics
├── infra/                # Docker, compose, k8s manifests
├── observability/        # logs, traces, dashboards
├── security/             # threat model, prompt-injection tests
├── tests/                # unit, integration, regression tests
└── docs/                 # report, model card, architecture notes

Required final evidence

  • Public GitHub repository with clean setup instructions.
  • Working demo path: local command, Docker Compose, or Kubernetes deployment.
  • Technical report explaining architecture, tradeoffs, risks, benchmarks, and future work.
  • Evaluation suite with at least 50 representative test cases.
  • Security review covering prompt injection, data leakage, secrets, dependency risk, and abuse cases.
  • Performance report with latency, throughput, memory, and cost estimates.
  • Short portfolio article written for a hiring manager or senior engineer audience.