Tagged With

adamw

1 post connected to this tag.

Optimizers: From SGD to AdamW

Jun 15, 2026

Lab note. Part of the ShivasNotes transformer-from-scratch series. Previously: dL/d(LLM): The Full Backward Pass. The full backward-pass post ended with every weight in the model holding a gr...
