The third chapter teaches how to code attention mechanisms, the breakthrough behind LLMs. We start with a simple version with non-trainable weights and refine it step by step until we arrive at multi-head attention as used in GPT-2.
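To make the starting point concrete, here is a minimal sketch (not the chapter's exact code) of simplified self-attention with non-trainable weights: attention scores are plain dot products between token embeddings, normalized with softmax and then used to mix the embeddings into context vectors. The input values below are dummy data.

```python
import torch

torch.manual_seed(0)
inputs = torch.rand(4, 3)  # 4 tokens with 3-dimensional embeddings (dummy values)

attn_scores = inputs @ inputs.T                    # pairwise dot-product similarities
attn_weights = torch.softmax(attn_scores, dim=-1)  # normalize so each row sums to 1
context_vecs = attn_weights @ inputs               # weighted sum over all token embeddings

print(context_vecs.shape)  # torch.Size([4, 3])
```

The trainable version replaces the raw embeddings with learned query, key, and value projections, and multi-head attention runs several of these in parallel and concatenates the results.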
I wanted to run a Docker image of a little app I made on my Synology NAS without the extra effort of publishing the image to a registry. Here's how I did it.
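Before the details, here is a rough sketch of the general flow, assuming the image is exported with `docker save`, copied over SSH, and imported with `docker load`; the image name, user, host, paths, and port are placeholders, not necessarily the exact commands used in this post.

```bash
# Build the image locally and export it to a tarball.
docker build -t my-app:latest .
docker save my-app:latest -o my-app.tar

# Copy the tarball to the NAS (user, hostname, and path are placeholders).
scp my-app.tar admin@synology-nas:/volume1/docker/

# On the NAS, load the image and start a container from it.
ssh admin@synology-nas "docker load -i /volume1/docker/my-app.tar"
ssh admin@synology-nas "docker run -d --name my-app -p 8080:8080 my-app:latest"
```

Depending on the DSM setup, the `docker` commands on the NAS may require `sudo`, and once loaded the image can also be started from the Container Manager UI instead of the command line.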