Cheatsheet · Solutions

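📍 Roadmap

Work through the playbook in this order. For each topic: read the cheatsheet to understand it → hand-code the matching drills → self-test with the on-page L1/L2/L3 questions. The links go straight to each page.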

1 · Foundations

ml-dl-fundamentals: optimizers / regularization / normalization / backpropagation
math-and-stats: probability / linear algebra (SVD → low rank) / KL divergence / information theory
Drills: cross-entropy · adamw

2 · LLM architecture

llm-architecture: Transformer / attention variants / positional encoding / KV cache / MoE
Drills: attention · rope · kv-cache · gqa-mqa · swiglu-ffn · rmsnorm · sampling

3 · PEFT

peft-methods: LoRA / DoRA / rank-spectrum budget / merging for inference
Drills: lora-forward · dora-forward

4 · Post-training core

llm-post-training: the full SFT / RM / RLHF / PPO / DPO pipeline
reward-modeling-eval: RM training / PRM-ORM / reward hacking / evaluation
eval-and-judges: the three types of evaluation / LLM-as-judge / evaluation bias / benchmarks and data contamination
Drills: sft-loss-mask · sequence-packing · dpo-loss · simpo-loss · ppo-clip · gae · reward-margin

5 · Reasoning-RL frontier

reasoning-rl-frontier: PPO→GRPO→DAPO·Dr.GRPO / RLVR / long-CoT
Drills: grpo · rloo

6 · Engineering & wrap-up / Systems

ml-system-design: fine-tuning pipelines / distributed RLHF / evaluation infrastructure
coding-and-algorithms: algorithm problem types + ML implementation problems

7 · Continual / lifelong learning

continual-post-training: catastrophic forgetting / replay / model merging / KL · production-validated methods only

8 · Long-horizon / Agentic

long-horizon-agents: computer use / agent engineering / difficulty-band rewards / self-evolution (production vs frontier)

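Review method: after reviewing each question, mark it ✅ solid / ⚠️ shaky / ❌ can't do; from then on, redo only the ⚠️/❌ ones. Before an interview, use the search box on the home page to jump straight to your weak topics.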