How to serve an LLM as a Streamlit app on Google Colab

Devesh Surve
4 min read · May 9, 2024

Introduction

So, a while back, I was working on a college project using an LLM. Now, as we know, platforms like Google Colab and Kaggle are gems for data science enthusiasts. Why? Because they offer free GPU access. Yes, it's somewhat limited, but that's enough to get your feet wet!

During my project, I realized I needed to make an interface for my model. I chose Streamlit, a popular choice for quick and easy web apps. The catch? I wanted to share my work with others, and that meant deploying it online. Here’s where it gets tricky — I was using Colab, not my local machine.

I scoured the internet for a straightforward guide on deploying from Colab but found none. So, I decided to write one myself, ensuring it’s simple to follow, with no complex steps.

Let me take you through how you can do it too, using GPT-2 as the example text-generation model.

Step-by-Step Guide to Deploying Your Streamlit App

Step 1: Install Streamlit & Hugging Face Transformers

Before you can deploy your app, you need the right tools. First, ensure that Streamlit is installed in your Colab environment. Streamlit is an open-source app framework that is excellent for turning data scripts into shareable web apps. Alongside it, if you're working with language models like GPT-2 or Llama-3, you'll need Hugging Face's transformers library. Here's how you install both:

!pip install streamlit
!pip install transformers

Run these commands in your Google Colab notebook to get everything set up.
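
Optionally, you can confirm both installs worked with a quick sanity check in a notebook cell; both imports should succeed and print a version string:

# Optional sanity check: verify both libraries import and print their versions
import streamlit
import transformers

print("streamlit", streamlit.__version__)
print("transformers", transformers.__version__)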

Step 2: Prepare Your App for Deployment

Now that you have the necessary tools, start preparing your Streamlit app. This involves writing a script that Streamlit will use to run your app. Here’s a simple template you can start with:

%%writefile app.py
import streamlit as st
from transformers import pipeline

# Load your model (with caching to improve performance)
@st.cache_resource()
def load_model():
    model = pipeline('text-generation', model='gpt2')
    return model

model = load_model()

# Set up your web app
st.title('GPT-2 Story Completer')
st.header('Enter text to complete:')

user_input = st.text_area('Write something to activate the AI:', height=200)
max_length = st.slider("Select max story length:", min_value=50, max_value=200, value=100, step=10)
num_sequences = st.selectbox("Select number of stories to generate:", options=[1, 2, 3], index=0)

if st.button('Generate Story'):
    with st.spinner('Generating Story...'):
        # do_sample=True is required to return more than one sequence with GPT-2
        response = model(user_input, max_length=max_length,
                         num_return_sequences=num_sequences, do_sample=True)
    for i, story in enumerate(response):
        st.write(f'**Story {i+1}:**')
        st.write(story['generated_text'])
        st.markdown("---")

st.sidebar.markdown("## Guide")
st.sidebar.info("This tool uses GPT-2 to generate a story from your provided text. Adjust the controls to change the story length and the number of stories generated. The model is optimized for short to medium length paragraphs.")
st.sidebar.markdown("### Examples")
st.sidebar.write("1. Paste a beginning to see how the AI completes it.")
st.sidebar.write("2. Try different settings to see how the story changes.")Note : You may need to add HF_TOKEN in Colab Secrets depending on the LLM. You can access your tokens from https://huggingface.co/settings/tokens

This script sets up a basic interface where users can input text, and the model generates a response. Customize this to fit the specifics of your model and what you want your app to do.
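
If you swap GPT-2 for a gated model (Llama-3, for example), the app needs that HF_TOKEN when it loads the weights. One way to pass it through, assuming you have stored the token in Colab Secrets under the name HF_TOKEN, is to export it as an environment variable in a notebook cell before launching the app; the Hugging Face libraries should pick up HF_TOKEN from the environment automatically. A minimal sketch:

# Sketch, assuming a Colab secret named HF_TOKEN exists and is enabled for this notebook.
# Run this in a regular notebook cell before starting Streamlit, so the app subprocess inherits it.
import os
from google.colab import userdata

os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')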

Step 3: Deploy the App using LocalTunnel

Since Google Colab does not allow direct hosting, we'll use LocalTunnel to expose your local server to the internet. LocalTunnel lets you easily share a locally running web server:

  1. Install LocalTunnel: Ensure you have Node.js installed (which Colab and Kaggle have by default), then install LocalTunnel globally using npm:
!npm install -g localtunnel

2. Fetch Your Public IP: Use the curl command to retrieve your public IP address. We will need it as the password for the LocalTunnel page.

# Your public ip is the password to the localtunnel
!curl ipv4.icanhazip.com

This should produce an output like:

10.111.101.111  # This will differ for you
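
If you prefer to stay in Python, a small sketch that fetches the same value with the requests library (preinstalled on Colab) and keeps it in a variable would be:

# Sketch: grab the public IP (the LocalTunnel password) from a Python cell
import requests

public_ip = requests.get('https://ipv4.icanhazip.com').text.strip()
print(public_ip)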

3. Expose Your App: Run your Streamlit app in the background and open the tunnel:

!streamlit run app.py &>./logs.txt & npx localtunnel --port 8501

This should give an output like this:

npx: installed 22 in 2.117s
your url is: https://rich-rockets-admire.loca.lt

Once you open this URL, you should see a LocalTunnel page asking for a tunnel password.

In the Tunnel Password field, enter the IP you retrieved above, and your Streamlit app should appear!
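
Because the run command above redirects Streamlit's output to logs.txt, that file is the first place to check if the tunnel URL loads but the app does not. From a notebook cell:

!tail -n 30 logs.txt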

Here’s the notebook URL — https://colab.research.google.com/drive/1Tnif2N5zbsSkIogsNH2BiR4RxvLFnzgC?usp=sharing

Conclusion

There you have it! A fun and accessible way to build and serve your very own LLM-powered web app. This link can now be shared with anyone so they can view your work. Thank you so much for reading.

Now, I usually don't ask for these, but if you read all the way here and found this article useful, I would really appreciate a comment or a clap!

Please follow if this type of content interests you!
