Learning Contract
This roadmap is not designed for passive course completion. It is designed to build durable engineering competence through repeated implementation, debugging, evaluation, documentation, refactoring, and deployment.
Primary objective
Become capable of building, evaluating, serving, observing, securing, and explaining real ML and LLM systems.
Depth rule
For every topic, build the smallest correct implementation first, then benchmark, test, refactor, document, and connect it to production tradeoffs.
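The "benchmark before refactoring" habit the depth rule describes can be as small as a stdlib timing harness. A minimal sketch (function names are illustrative, not part of the roadmap):

```python
import statistics
import time


def bench(fn, *args, repeats: int = 5) -> float:
    """Return the median wall-clock seconds over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Toy example: compare a hand-rolled sum against the built-in.
def naive_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


data = list(range(100_000))
print(f"naive:   {bench(naive_sum, data):.6f}s")
print(f"builtin: {bench(sum, data):.6f}s")
```

Median over several repeats is used instead of a single run because one-off timings on a laptop are noisy.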
AI tool policy
Use Cursor and ChatGPT as tutors, reviewers, and adversarial debuggers. Do not outsource the core learning loop: struggle first, then ask for critique.

MacBook/Homebrew Setup
Use a reproducible local toolchain. Every project should be installable, testable, lintable, benchmarkable, and runnable from a clean checkout.
Recommended baseline commands
# System tools
brew update
brew install git gh uv ruff jq htop tree wget curl cmake pkg-config
# Infrastructure tools (note: the docker formula installs only the CLI; colima provides the container runtime on macOS)
brew install docker colima kubectl kind k9s postgresql redis sqlite
# Optional productivity tools
brew install just pre-commit graphviz duckdb
# Python baseline
uv python install 3.12
mkdir -p ~/code/ml-systems-roadmap
cd ~/code/ml-systems-roadmap
uv init
uv add pytest hypothesis mypy ruff ipykernel numpy pandas polars duckdb scikit-learn matplotlib rich typer pydantic
uv run python -m ipykernel install --user --name ml-systems-roadmap
# Quality checks used throughout the curriculum
uv run ruff check .
uv run ruff format .
uv run pytest
uv run mypy .
Repo standard
README.md, pyproject.toml, src/ layout, tests/, Makefile or justfile, linting, typing, and reproducible commands.
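A minimal `pyproject.toml` matching this standard might look like the following (project name, dependencies, and tool settings are placeholders to adapt per project):

```toml
[project]
name = "my-project"              # placeholder name
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["pydantic", "typer"]

[tool.ruff]
line-length = 100

[tool.mypy]
strict = true

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```

With a `src/` layout, the package lives under `src/my_project/` and tests import it the same way installed users would.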
Notebook standard
Use notebooks for exploration only. Convert final work into modules, scripts, tests, and command-line entry points.
Evidence standard
Keep benchmark logs, plots, screenshots, model cards, experiment notes, ADRs, and deployment instructions.
Phase Map
The curriculum moves from foundations to production systems. Each phase includes depth loops where earlier work is revisited and improved with stronger tools.
Weekly Plan
Search by topic, resource, tool, deliverable, or phase. Each week includes detailed, human-readable bullets, a short guidance note, and direct source links.
Core Resource Library
These are the free and open resources referenced inside the roadmap. Prefer official documentation, university courses, primary papers, and maintained project docs before blog posts.
Assessment Rubric
Use this rubric every month. The goal is to become the kind of engineer who can implement, profile, deploy, debug, and defend technical decisions.
| Dimension | Meets Standard | Exceeds Standard |
|---|---|---|
| Correctness | Code works on expected inputs and includes normal-case tests. | Includes edge cases, failure modes, property tests, and explicit error handling. |
| Engineering quality | Project is readable, typed, linted, and installable. | Project has clean architecture, docs, CI, configuration, logging, and benchmark scripts. |
| ML understanding | Can explain model choices, metrics, and training behavior. | Can diagnose bias/variance, optimization instability, leakage, data issues, and evaluation weaknesses. |
| Systems thinking | Can run a model or service locally and describe basic performance. | Can profile bottlenecks, estimate cost, design scaling paths, and reason about reliability. |
| Security and privacy | Handles secrets properly and avoids obvious unsafe patterns. | Includes threat modeling, abuse cases, dependency scanning, prompt-injection tests, and privacy notes. |
| Communication | README explains setup and usage. | Includes architecture diagrams, tradeoff analysis, benchmark results, and a concise technical article. |
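The "property tests" the rubric asks for check invariants over many generated inputs rather than a handful of fixed cases. `hypothesis` (installed earlier) automates this; a stdlib sketch of the same idea, applied to a hypothetical `normalize` function:

```python
import random


def normalize(xs: list[float]) -> list[float]:
    """Scale values so they sum to 1 (the function under test)."""
    total = sum(xs)
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / total for x in xs]


# Property: the output sums to ~1 for any positive input of any length.
for _ in range(200):
    xs = [random.uniform(0.1, 100.0) for _ in range(random.randint(1, 50))]
    assert abs(sum(normalize(xs)) - 1.0) < 1e-9

# Failure mode: an all-zero vector must raise, not silently return NaNs.
try:
    normalize([0.0, 0.0])
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

Hypothesis would replace the `random` loop with `@given(...)` strategies and shrink failing inputs automatically.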
Final Capstone Definition
The final project should demonstrate LLM systems competence end to end: data, model, serving, evaluation, observability, reliability, and security.
Recommended capstone
Build a production-shaped LLM application: retrieval-augmented tutor, ASVAB study coach, paper-trading education assistant, code review bot, local/private document QA system, or evaluation platform. It should run locally and optionally deploy to Docker Compose or Kubernetes.
capstone/
├── app/ # FastAPI or service layer
├── training/ # fine-tuning or evaluation scripts
├── retrieval/ # indexing, chunking, embeddings, vector DB
├── evals/ # test sets, judge prompts, metrics
├── infra/ # Docker, compose, k8s manifests
├── observability/ # logs, traces, dashboards
├── security/ # threat model, prompt-injection tests
├── tests/ # unit, integration, regression tests
└── docs/ # report, model card, architecture notes
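One way the `evals/` directory might start is a tiny harness that scores a model function against a list of cases. Everything below is illustrative: the substring grading rule, the dataclass fields, and the stub model are placeholders for a real LLM call and judge:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str  # simplest possible grading rule


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Score a prompt->text model function against a fixed test set."""
    results = [
        case.expected_substring.lower() in model(case.prompt).lower()
        for case in cases
    ]
    return {
        "passed": sum(results),
        "total": len(results),
        "pass_rate": sum(results) / len(results),
    }


# Usage with a stub standing in for a real LLM client:
cases = [
    EvalCase("What is 2+2?", "4"),
    EvalCase("Capital of France?", "Paris"),
]
stub_model = lambda prompt: "4" if "2+2" in prompt else "Paris is the capital."
print(run_evals(stub_model, cases))
```

Substring matching is deliberately crude; the point is a regression-testable interface that judge prompts and richer metrics can slot into later.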
Required final evidence
- Public GitHub repository with clean setup instructions.
- Working demo path: local command, Docker Compose, or Kubernetes deployment.
- Technical report explaining architecture, tradeoffs, risks, benchmarks, and future work.
- Evaluation suite with at least 50 representative test cases.
- Security review covering prompt injection, data leakage, secrets, dependency risk, and abuse cases.
- Performance report with latency, throughput, memory, and cost estimates.
- Short portfolio article written for a hiring manager or senior engineer audience.
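For the performance report, the latency side usually comes down to percentiles over sampled request times. A stdlib sketch, assuming latencies have already been collected in milliseconds:

```python
import statistics


def latency_report(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize request latencies with the percentiles reviewers expect."""
    qs = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return {
        "p50_ms": qs[49],
        "p95_ms": qs[94],
        "p99_ms": qs[98],
        "mean_ms": statistics.fmean(latencies_ms),
    }


# A skewed sample: mostly fast requests plus a couple of slow outliers.
samples = [12.0, 15.0, 11.0, 240.0, 13.0, 14.0, 16.0, 12.5, 13.5, 500.0]
print(latency_report(samples))
```

Tail percentiles (p95/p99) matter more than the mean here, since a few slow LLM calls dominate user experience even when the average looks fine.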