The penultimate chapter teaches us how to fine-tune our LLM for the classic "spam" or "not spam" task. I did not realise how little was involved, and I still don't fully understand how it works, but it does!
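This is roughly what the classification fine-tuning boils down to: a minimal PyTorch sketch (my own reconstruction, not the book's code), with a tiny stand-in model so the snippet runs on its own. The real thing would of course start from the pre-trained GPT-2 weights instead.

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Hypothetical stand-in for a pre-trained GPT-style model."""
    def __init__(self, vocab_size=50257, emb_dim=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.block = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=12, batch_first=True
        )
        self.out_head = nn.Linear(emb_dim, vocab_size)

    def forward(self, input_ids):
        x = self.block(self.tok_emb(input_ids))
        return self.out_head(x)          # (batch, seq_len, vocab_size)

gpt = TinyGPT()

# 1. Freeze the pre-trained weights.
for param in gpt.parameters():
    param.requires_grad = False

# 2. Swap the vocabulary-sized head for a 2-class head ("spam" / "not spam").
#    Only this new layer (and optionally the last block) gets trained.
gpt.out_head = nn.Linear(768, 2)

# 3. Classify using the logits of the *last* token, since it has
#    attended to the whole input sequence.
input_ids = torch.randint(0, 50257, (1, 16))   # fake tokenised message
logits = gpt(input_ids)[:, -1, :]              # shape (1, 2)
print(logits)
```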
In the fifth chapter, we train our model. To do that we need a way to measure how "wrong" the output is (how far it is from the desired output), which is where loss functions come in. We also look at temperature scaling and top-k sampling, so the output isn't always just the most likely next token. Finally, we load OpenAI's pre-trained GPT-2 weights into our own model!
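Here's a rough sketch of the temperature and top-k part (my own reconstruction, not the book's listing); `logits` stands in for the model's scores over the vocabulary for the next token.

```python
import torch

torch.manual_seed(42)
logits = torch.randn(50257)      # fake next-token logits (GPT-2 vocab size)

temperature = 0.8                # <1 sharpens, >1 flattens the distribution
top_k = 50                       # only sample from the 50 most likely tokens

# Keep only the top-k logits; mask the rest with -inf so their
# probability becomes zero after the softmax.
top_logits, _ = torch.topk(logits, top_k)
logits = torch.where(
    logits < top_logits[-1],
    torch.tensor(float("-inf")),
    logits,
)

# Temperature scaling, then sample instead of always taking the argmax.
probs = torch.softmax(logits / temperature, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
print(next_token.item())
```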
In the fourth chapter, we combine everything we've learned so far and build a GPT model that is capable of generating text. The quality may be questionable, but it definitely generates more text!
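Something like this bare-bones greedy loop is what "generating more text" amounts to; the `DummyModel` here is just a placeholder so the snippet runs, not the chapter's actual GPT architecture.

```python
import torch

class DummyModel(torch.nn.Module):
    """Placeholder: anything mapping token ids to (batch, seq, vocab) logits."""
    def __init__(self, vocab_size=50257, emb_dim=64):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, emb_dim)
        self.head = torch.nn.Linear(emb_dim, vocab_size)

    def forward(self, ids):
        return self.head(self.emb(ids))

model = DummyModel()
ids = torch.tensor([[464, 2068, 7586]])        # some starting token ids

for _ in range(10):                            # generate 10 new tokens
    with torch.no_grad():
        logits = model(ids)[:, -1, :]          # logits for the last position
    next_id = torch.argmax(logits, dim=-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=1)     # append and feed back in

print(ids)
```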
The third chapter teaches how to code attention mechanisms, the breakthrough behind modern LLMs. We start with a simplified version with non-trainable weights and refine it step by step until we arrive at the multi-head attention used in GPT-2.
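As a rough illustration of that non-trainable starting point (my own sketch, not the book's listing): each input vector acts as its own query, key and value, and the only ingredients are dot products and a softmax.

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(6, 4)                 # 6 token embeddings of dimension 4

attn_scores = inputs @ inputs.T            # pairwise dot products, shape (6, 6)
attn_weights = torch.softmax(attn_scores, dim=-1)
context = attn_weights @ inputs            # each row is a weighted sum of inputs

print(context.shape)                       # torch.Size([6, 4])

# The trainable version adds W_q, W_k, W_v projection matrices, scales the
# scores by sqrt(d_k), applies a causal mask, and runs several of these
# "heads" in parallel, which gives the multi-head attention used in GPT-2.
```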