Jun 28, 2026
Muon Optimizer: SGD vs AdamW vs Matrix-Aware Training Updates
Optimizer research note
This post continues the optimizer path from SGD and AdamW into Muon, a matrix-aware training update that changes what the optimizer kernel has to do. Previously in th...