Mastering large language models – Part IX: self-attention with PyTorch

In the previous post, we discussed how attention can be applied to avoid bottlenecks in encoder-decoder architectures. In transformer-based models, attention appears in different flavours, the most important being what is called self-attention – the topic of today's post. Code included. Before getting into coding, let us first describe the attention mechanism presented in…More

Mastering large language models – Part VII: War and Peace

This blog is all about code – we will implement and train an LSTM on Tolstoy's War and Peace and use it to generate new text snippets that (well, at least very remotely) resemble pieces from the original novel. All the code can be found here in my GitHub repository for this series. First, let…More

Mastering large language models – Part VI: sampling

Today, we will take a closer look at the process of using a trained LSTM or RNN to actually generate new content, i.e. to predict words. To set the scene, recall that the objective on which we have trained our network is to model the probability for each word in the vocabulary. More precisely, assume…More

Mastering large language models – Part V: LSTM networks

In the last post, we saw how to implement and train an RNN on a very simple task – learning how to count. In the example, I chose a sequence length of L = 6. It is tempting to play around with this parameter to see what happens if we increase the…More

Mastering large language models – Part IV: learning how to count

In the previous post, we discussed recurrent neural networks in the context of language processing, but in fact, they can be used to learn any type of data structured as a time series. To make sure that we really understand how this works before proceeding to more complex models, we will spend some time…More

Mastering large language models – Part I: Overview

In the history of AI, progress has always come from several sources – more powerful hardware, more high-quality training data or refined training methods. And sometimes, we have seen a step change triggered by a new and innovative generation of models. Some of you might remember the time when the term Deep Learning was coined,…More

Building an NFT wallet app – implementation

In the previous post in this series, I presented a simple architecture for an NFT wallet app, written in ReactJS. Today, I will dive a bit deeper into the implementation. The source code for this post can be found here. As most of it is fairly standard ReactJS code, I will not discuss all…More