Mastering large language models Part XV: building a chat-bot with DialoGPT and Streamlit

Arguably, a large part of the success of models like GPT stems from the fact that they have been equipped with a chat frontend and have proved able to hold a dialogue with a user that is perceived as comparable to a conversation with another human. Today, we will see how transformer-based language models can be employed for that purpose.

In the previous post, we saw how easily we can download and run pretrained transformer models from the Huggingface hub. If you play with my notebook on this a bit, you will soon find that they do exactly what we expect them to do – find the most likely completion. If, for instance, you use the prompt “Hello”, followed by the end-of-sentence token tokenizer.eos_token, they will give you a piece of text starting with “Hello”. This could be, for instance, a question that someone has posted in a Stackexchange forum.

If you want to use a large language model to build a bot that can actually hold a conversation, this is usually not what you want. Instead, you want a completion that is a reply to the prompt. There are different ways to achieve this. One obvious approach, taken by Microsoft for their DialoGPT model, is to train the model (either from scratch or via fine-tuning) on a dataset of dialogues. In the case of DialoGPT, this dataset was obtained by scraping Reddit.
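
Like GPT-2, DialoGPT is available on the Huggingface hub in several sizes, so downloading and loading it works exactly as for the models we used in the previous post. Here is a minimal sketch using the medium-sized checkpoint (the small and large variants work the same way):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download (on first use) and load the medium DialoGPT checkpoint from the hub
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()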

A text-based chat-bot

Before discussing how to use this model to implement a chat-bot, let us quickly fix a few terms. A conversation between a user and a chat-bot is structured in units called turns. A turn is a pair consisting of a user prompt and a response from the bot. DialoGPT has been trained on a dataset of conversations, modelled as a sequence of turns. Each turn is a sequence of tokens in which the user prompt and the reply of the bot are separated by an end-of-sentence (EOS) token. Thus a typical (very short) dialogue with two turns could be represented as follows

Good morning!<eos>Hi<eos>How do you feel today?<eos>I feel great<eos>

Here the user first enters the prompt “Good morning!”, to which the bot replies “Hi”. In the second turn, the user enters “How do you feel today?” and the bot replies “I feel great”.
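
In code, assembling this representation simply amounts to encoding each utterance and appending the EOS token id after it. A short sketch, assuming a tokenizer loaded as above:

utterances = ["Good morning!", "Hi", "How do you feel today?", "I feel great"]
input_ids = []
for utterance in utterances:
    # Encode the utterance and terminate it with the EOS token
    input_ids.extend(tokenizer.encode(utterance))
    input_ids.append(tokenizer.eos_token_id)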

Now the architecture of DialoGPT is identical to that of GPT-2, so we can use what we have learned about this class of models (decoder-only models) to sample from it. However, we want the model to take the entire conversation so far into account when creating a response, so we feed the full history, represented as a sequence of tokens as above, into the model when generating a bot response. Assuming that we have a function generate that accepts a prompt and generates a reply until it reaches an end-of-sentence token, the pseudocode for a chat-bot would therefore look as follows.

input_ids = []
while True:
    prompt = input("User: ")
    # Encode the user prompt and terminate it with the EOS token
    input_ids.extend(tokenizer.encode(prompt))
    input_ids.append(tokenizer.eos_token_id)
    # Generate the bot reply based on the full history
    response = generate(input_ids)
    print(f"Bot: {tokenizer.decode(response)}")
    input_ids.extend(response)

Building a terminal-based chat-bot on top of DialoGPT is now very easy. The only point we need to keep in mind is that, in practice, we will want to cache past keys and values as discussed in the last post. Since we need these values again in the next turn, which is again based on the entire history, our generate function should return them, and we have to pass them back into the function in the next turn. An implementation of generate along those lines can be found here in my repository. To run a simple text-based chatbot, simply do (you probably want to run this in a virtual Python environment):

git clone https://github.com/christianb93/MLLM
cd MLLM
pip3 install -r requirements.txt
cd chatbot
python3 chat_terminal.py

The first time you run this, the DialoGPT model will be downloaded from the hub, so this might take a few minutes. To select a different model, you can use the switch --model (use --show_models to see a list of valid values).
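
To give an idea of what such a generate function is doing under the hood, here is a simplified sketch with key-value caching. The real implementation in the repository returns an additional value and supports more parameters; the sketch also assumes the legacy tuple format of past_key_values used by GPT-2 style models, where the third dimension of each cached tensor is the sequence length.

import torch

@torch.no_grad()
def generate(model, tokenizer, input_ids, past_key_values=None, temperature=0.7, max_new_tokens=128):
    # Number of tokens already covered by the cache - only the remaining part
    # of the history needs to be fed through the model again
    cached = 0 if past_key_values is None else past_key_values[0][0].shape[2]
    next_input = torch.tensor([input_ids[cached:]])
    generated = []
    for _ in range(max_new_tokens):
        out = model(input_ids=next_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        # Sample the next token from the temperature-scaled distribution
        logits = out.logits[0, -1, :] / temperature
        token = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1).item()
        generated.append(token)
        if token == tokenizer.eos_token_id:
            break
        next_input = torch.tensor([[token]])
    return generated, past_key_values

Note that in each turn, only the part of the history that has not been processed before is fed through the model, so the per-turn cost stays roughly constant even as the conversation grows.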

Using Streamlit to build a web-based chat-bot

Let us now try to build the same thing with a web-based interface. To do this, we will use Streamlit, which is a platform designed to easily create data-centered web applications in Python.

The idea behind Streamlit is rather simple. When building a web-based application with Streamlit, you put together an ordinary Python script. This script contains calls into the Streamlit library that you use to define widgets. When you run the Streamlit server, it will load the Python script, execute it and render an HTML page accordingly. To see this in action, create a file called test.py in your working directory containing the following code.

import streamlit as st 


value = st.slider(label = "Entropy coefficient", 
          min_value = 0.0, 
          max_value = 1.0)
print(f"Slider value: {value}")

Then install Streamlit and run the application using

pip3 install streamlit==1.23.1 
streamlit run test.py

Now a browser window should open that displays a slider, allowing you to select a parameter between 0 and 1.

In the terminal session in which you have started the Streamlit server, you should also see a line of output telling you that the value of the slider is zero. What has happened is that Streamlit has started an HTTP server listening on port 8501, processed our script, rendered the slider that we have requested, returned its current value and then entered an internal event loop outside of the script that we have provided.

If you now change the position of the slider, you will see a second line printed to the terminal containing the new value of the slider. In fact, whenever a user interacts with any of the widgets on the screen, Streamlit will simply run the script once more, this time returning the updated value of the slider. So behind the scenes, the server is sitting in an event loop, and whenever an event occurs, it will not – as frontend frameworks like React would do – use callbacks to allow the application to update specific widgets, but instead run the entire script again. I recommend having a short look at the section “Data flow” in the Streamlit documentation, which explains this in a bit more detail.

This is straightforward and good enough for many use cases, but for a chat bot, this creates a few challenges. First, we will have to create our model at some point in the script. For larger models, loading the model from disk into memory will take some time. If we do this again every time a user interacts with the frontend, our app will be extremely sluggish.

To avoid this, Streamlit supports caching of function values by annotating a function with the decorator st.cache_resource. Streamlit will then check the arguments of the function against the cached values. If the arguments match a previous call, it will directly return a reference to the cached response without actually invoking the function. Note that here a reference is returned, meaning in our case that all sessions share the same model.
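
A minimal sketch of how this could look for our model (function and variable names are illustrative; the actual code in the repository may differ slightly):

import streamlit as st
from transformers import AutoModelForCausalLM, AutoTokenizer

@st.cache_resource
def get_model(model_name="microsoft/DialoGPT-medium"):
    # Loaded only once per server process and shared by all sessions
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return model, tokenizer

model, tokenizer = get_model()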

The next challenge is that our application requires state, for instance the chat history or the cached past keys and values, which we have to preserve for the entire session. To maintain session state, Streamlit offers the dictionary-like object st.session_state. When we run the script for the first time, we can initialize the chat history and other state components by simply adding a respective key to the session state.

st.session_state['input_ids'] = []

For subsequent script iterations, i.e. re-renderings of the screen that are part of the same session, the Streamlit framework will then make sure that we can access the previously stored value. Internally, Streamlit also uses the session state to store the state of widgets, like the position of a slider or the content of a text input field. When creating such a widget, we can provide the extra parameter key, which allows us to specify the key under which the widget state is stored in the session state.
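
Since the script is re-run on every interaction, the initialization should only happen if the key is not yet present. Putting this together, the session-state setup could look roughly like this (key names are illustrative):

# Initialize session state only once per session, not on every re-run
if "input_ids" not in st.session_state:
    st.session_state["input_ids"] = []
    st.session_state["past_key_values"] = None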

A third feature of Streamlit that turns out to be useful is the ability to associate conventional callback functions with a widget. This makes it much easier to trigger specific actions when a particular widget changes state, for instance when the user submits a prompt, instead of re-running all actions in the script every time the screen is rendered. We can, for instance, define a widget that holds the next prompt and, if the user hits “Return” to submit the prompt, invoke the model within the callback. Inside the callback function, we can also update the history stored in the session state.

def process_prompt():
    #
    # Get widget state to access current prompt
    #
    prompt = st.session_state.prompt
    #
    # Update input_ids in session state
    #
    st.session_state.input_ids.extend(tokenizer.encode(prompt))
    st.session_state.input_ids.append(tokenizer.eos_token_id)
    #
    # Invoke model
    #
    generated, past_key_values, _ = utils.generate(model = model, 
                                            tokenizer = tokenizer, 
                                            input_ids = st.session_state.input_ids, 
                                            past_key_values = st.session_state.past_key_values, 
                                            temperature = st.session_state.temperature, debug = False)
    response = tokenizer.decode(generated).replace('<|endoftext|>', '')
    #
    # Prepare next turn: extend the history and store the updated cache
    #
    st.session_state.input_ids.extend(generated)
    st.session_state.past_key_values = past_key_values
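
To wire this callback up, the prompt field (and the temperature slider whose value the callback reads from the session state) could be created roughly like this; again, labels and value ranges are illustrative:

st.slider("Temperature", min_value=0.1, max_value=2.0, value=0.7, key="temperature")
st.text_input("Your prompt", key="prompt", on_change=process_prompt)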

Armed with this understanding of Streamlit, it is not difficult to put together a simple web-based chatbot using DialoGPT. Here is a screenshot of the bot in action.

You can find the code behind this here. To run this, open a terminal and type (switching to a virtual Python environment if required):

git clone https://github.com/christianb93/MLLM
cd MLLM
pip3 install -r requirements.txt
cd chatbot
streamlit run chat.py

A word of caution: by default, this will expose port 8501, on which Streamlit is listening, on the local network (not only on localhost). The server.address option in the Streamlit configuration should allow you to restrict this to localhost, but I have not tried this.

Also remember that the first execution will include the model download that happens in the background (if you have not used that model before) and might therefore take some time. Enjoy your chats!
