Episode 199: Leveraging Documents and Data to Create a Custom LLM Chatbot

The Real Python Podcast

Apr 05, 2024 1h 8m

How do you customize an LLM chatbot to address a collection of documents and data? What tools and techniques can you use to build embeddings into a vector database? This week on the show, Calvin Hendryx-Parker is back to discuss developing an AI-powered, Large Language Model-driven chat interface.

Calvin is the co-founder and CTO of Six Feet Up, a Python and AI consultancy. He shares a recent project for a family-owned seed company that wanted to build a tool for customers to access years of farm research. These documents were stored as brochure-style PDFs and spanned 50 years.

We discuss several of the tools used to augment an LLM. Calvin covers working with LangChain and vectorizing data with ChromaDB. We talk about the obstacles and limitations of capturing documentation.
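
To make the idea of vectorizing documents for retrieval a little more concrete, here is a minimal sketch using only the Python standard library. It is not the LangChain or ChromaDB API and not the code from Calvin's project: the `ToyVectorStore` class, the word-count "embedding," and the two sample documents are all hypothetical stand-ins. A real setup would use a learned embedding model and a vector database such as ChromaDB.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector (an assumption,
    standing in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps each document with its vector."""

    def __init__(self):
        self._items = []

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, text, embed(text)))

    def query(self, question: str, k: int = 1):
        """Return the k documents most similar to the question."""
        q = embed(question)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[2]), reverse=True
        )
        return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


# Made-up sample documents, for illustration only.
store = ToyVectorStore()
store.add("corn-1998", "Corn yield trial results for sandy soil plots")
store.add("soy-2005", "Soybean planting dates and row spacing study")

question = "What row spacing works for soybean?"
top = store.query(question, k=1)

# Retrieval-augmented generation: the retrieved text is placed in the prompt
# so the LLM can answer from it and point back to the source document.
prompt = (
    f"Answer using only this context:\n{top[0][1]}\n\n"
    f"Source: {top[0][0]}\nQuestion: {question}"
)
print(prompt)
```

The retrieved document ID travels with the text, which is what lets a chatbot link its answer back to the original source.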

Calvin also shares a smaller project that you can try out yourself. It takes the information from a conference website and creates a chatbot using Django and Python prompt-toolkit.

This episode is sponsored by Mailtrap.

Topics:

  • 00:00:00 – Introduction
  • 00:02:21 – Background on the project
  • 00:03:51 – Complexity of adding documents
  • 00:09:01 – Retrieval-augmented generation and providing links
  • 00:13:46 – Updating information and larger conversation context
  • 00:18:08 – Sponsor: Mailtrap
  • 00:18:43 – Working with context
  • 00:21:02 – Temperature adjustment
  • 00:22:07 – Rally Conference Chatbot Project
  • 00:26:20 – Vectorization using ChromaDB
  • 00:32:49 – Employing Python prompt-toolkit
  • 00:35:07 – Learning libraries on the fly
  • 00:37:38 – Video Course Spotlight
  • 00:39:00 – Problems with tables in documents
  • 00:42:30 – Everything looks like a chat box
  • 00:44:26 – Finding the right fit for a client and customer
  • 00:49:05 – What are questions you ask a new client now?
  • 00:51:54 – Air Canada anecdote
  • 00:56:20 – How do you stay up to date on these topics?
  • 01:01:03 – What are you excited about in the world of Python?
  • 01:03:22 – What do you want to learn next?
  • 01:04:58 – How can people follow your work online?
  • 01:05:31 – IndyPy
  • 01:07:13 – Thanks and goodbye

Show Links: