Episode 16: Thinking in Pandas: Python Data Analysis the Right Way

The Real Python Podcast

Jul 03, 2020 1h 2m

Are you using the Python library Pandas the right way? Do you wonder about getting better performance, or how to optimize your data for analysis? What does normalization mean? This week on the show, we have Hannah Stepanek to discuss her new book, “Thinking in Pandas.”

The inspiration behind Hannah’s book came out of her talk at PyCon US 2019 titled “Thinking Like a Panda: Everything You Need to Know to Use Pandas the Right Way.” We discuss several core concepts covered in the book. She shares techniques for getting more performance when working with your data in Pandas. We also talk about her recent PyCon US 2020 online presentation about databases and migration.

Topics:

  • 00:00:00 – Introduction
  • 00:01:36 – Working for New Relic
  • 00:03:14 – Thinking in Pandas book release
  • 00:03:27 – Who is the intended reader?
  • 00:05:27 – What is the underlying tech for Pandas?
  • 00:09:04 – Why shouldn’t you use apply?
  • 00:13:00 – When you have to use apply
  • 00:16:06 – Normalizing your data
  • 00:17:05 – Do you have a preferred format for a dataframe?
  • 00:18:17 – More on multi-index dataframes
  • 00:24:50 – Creating NumPy types
  • 00:28:30 – Loading in your data
  • 00:30:33 – Video Course Spotlight
  • 00:31:41 – Pivoting data
  • 00:34:34 – Considering outside libraries and performance
  • 00:35:41 – What topic were you eager to share in the book?
  • 00:37:52 – What resources did you use to learn pandas?
  • 00:40:53 – PyCon 2020 talk about databases and migration
  • 00:45:34 – Delving into migration and Alembic
  • 00:53:15 – Speaking opportunities
  • 00:56:13 – What are you excited about in the world of Python?
  • 00:57:32 – What do you want to learn next?
  • 00:58:49 – Do you read source code to learn?
  • 01:00:16 – Is there a particularly well-written library?
  • 01:01:28 – Final Thanks

Links: