📍 Learning Path / Roadmap

Recommended order for working through the playbook. For each topic: read the cheatsheet to build understanding → work through the corresponding drill, where one is listed → self-test with the in-page L1/L2/L3 questions. Each link goes directly to its page.

1 · Foundations

ml-dl-fundamentals — optimizers / regularization / normalization / backpropagation
math-and-stats — probability / linear algebra (SVD → low-rank) / KL / information theory
Drills: cross-entropy · adamw

2 · LLM Architecture

llm-architecture — Transformer / attention variants / positional encoding / KV cache / MoE
Drills: attention · rope · kv-cache · gqa-mqa · swiglu-ffn · rmsnorm · sampling

3 · PEFT

peft-methods — LoRA / DoRA / rank-spectrum budgeting / merge inference
Drills: lora-forward · dora-forward

4 · Post-training Core

llm-post-training — SFT / RM / RLHF / PPO / DPO full pipeline
reward-modeling-eval — RM training / PRM-ORM / reward hacking / evaluation
eval-and-judges — three evaluation categories / LLM-as-judge / evaluation bias / benchmarks and data contamination
Drills: sft-loss-mask · sequence-packing · dpo-loss · simpo-loss · ppo-clip · gae · reward-margin

5 · Reasoning-RL Frontier

reasoning-rl-frontier — PPO→GRPO→DAPO·Dr.GRPO / RLVR / long-CoT
Drills: grpo · rloo

6 · Systems & Wrap-up

ml-system-design — fine-tuning pipelines / distributed RLHF / evaluation infrastructure
coding-and-algorithms — algorithm problem types + ML implementation problems

7 · Continual / Lifelong

continual-post-training — catastrophic forgetting / replay / model merging / KL (production-validated methods only)

8 · Long-horizon / Agentic

long-horizon-agents — computer use / agent engineering / difficulty-graded rewards / self-evolution (production vs. frontier)

Review method: after finishing each item, mark it ✅ solid / ⚠️ fuzzy / ❌ unknown, then re-drill only the ⚠️ and ❌ items. Before an interview, use the homepage search bar to jump straight to your weak topics.