Recommended order for working through the playbook. For each topic: read the cheatsheet to build understanding → work through the corresponding drill → self-test with the in-page L1/L2/L3 questions. Links go directly to each page.
1 · Foundations
- ml-dl-fundamentals — optimizers / regularization / normalization / backpropagation
- math-and-stats — probability / linear algebra (SVD → low-rank) / KL divergence / information theory
- Drills: cross-entropy · adamw
2 · LLM Architecture
- llm-architecture — Transformer / attention variants / positional encoding / KV cache / MoE
- Drills: attention · rope · kv-cache · gqa-mqa · swiglu-ffn · rmsnorm · sampling
3 · PEFT
- peft-methods — LoRA / DoRA / rank-spectrum budgeting / weight merging for inference
- Drills: lora-forward · dora-forward
4 · Post-training Core
- llm-post-training — the full pipeline: SFT / RM / RLHF / PPO / DPO
- reward-modeling-eval — RM training / PRM-ORM / reward hacking / evaluation
- eval-and-judges — three evaluation categories / LLM-as-judge / evaluation bias / benchmarks and data contamination
- Drills: sft-loss-mask · sequence-packing · dpo-loss · simpo-loss · ppo-clip · gae · reward-margin
5 · Reasoning-RL Frontier
- reasoning-rl-frontier — PPO → GRPO → DAPO · Dr. GRPO / RLVR / long-CoT
- Drills: grpo · rloo
6 · Systems & Wrap-up
- ml-system-design — fine-tuning pipelines / distributed RLHF / evaluation infrastructure
- coding-and-algorithms — algorithm problem types and ML implementation problems
7 · Continual / Lifelong
- continual-post-training — catastrophic forgetting / replay / model merging / KL regularization (production-validated methods only)
8 · Long-horizon / Agentic
- long-horizon-agents — computer use / agent engineering / difficulty-graded rewards / self-evolution (production vs. frontier)
Review method: after reviewing each item, mark it ✅ solid / ⚠️ fuzzy / ❌ unknown, then re-drill only the ⚠️/❌ items. Before interviews, use the homepage search bar to jump directly to weak topics.