Practical Business Python

Taking care of business, one python script at a time

Mon 18 January 2021

Case Study: Automating Excel File Creation and Distribution with Pandas and Outlook

Posted by Chris Moffitt in articles   

article header image

Introduction

I enjoy hearing from readers that have used concepts from this blog to solve their own problems. It always amazes me when I see examples where only a few lines of python code can solve a real business problem and save organizations a lot of time and money. I am also impressed when people figure out how to do this with no formal training - just with some hard work and willingness to persevere through the learning curve.

This example comes from Mark Doll. I’ll turn it over to him to give his background:

I have been learning/using Python for about 3 years to help automate business processes and reporting. I’ve never had any formal training in Python, but found it to be a reliable tool that has helped me in my work.

Read on for more details on how Mark used Python to automate a very manual process of collecting and sorting Excel files to email to 100’s of users.

The Problem

Here’s Mark’s overview of the problem:

A business need arose to send out emails with Excel attachments to a list of ~500 users and presented us with a large task to complete manually. Making this task harder was the fact that we had to split data up by user from a master Excel file to create their own specific file, then email that file out to the correct user.

Imagine the time it would take to manually filter, cut and paste the data into a file, then save it and email it out - 500 times! Using this Python approach we were able to automate the entire process and save valuable time.

I have seen this type of problem multiple times in my experience. If you don’t have experience with a programming language, then it can seem daunting. With Python, it’s very feasible to automate this tedious process. Here’s a graphical view of what Mark was able to do:

File paths

Solving the Problem

The first step is getting the imports in place:

import datetime
import os
import shutil
from pathlib import Path
import pandas as pd
import win32com.client as win32

Now we will set up some strings with the current date and our directory structure:

## Set Date Formats
today_string = datetime.datetime.today().strftime('%m%d%Y_%I%p')
today_string2 = datetime.datetime.today().strftime('%b %d, %Y')

## Set Folder Targets for Attachments and Archiving
attachment_path = Path.cwd() / 'data' / 'attachments'
archive_dir = Path.cwd() / 'archive'
src_file = Path.cwd() / 'data' / 'Example4.xlsx'

Let’s take a look at the data file we need to process:

df = pd.read_excel(src_file)
df.head()
Excel file view

The next step is to group all of the CUSTOMER_ID transactions together. We start by doing a groupby on CUSTOMER_ID .

customer_group = df.groupby('CUSTOMER_ID')

It might not be apparent to you what customer_group is in this case. A loop shows how we can process this grouped object:

for ID, group_df in customer_group:
    print(ID)
A1000
A1001
A1002
A1005

Here’s the last group_df that shows all of the transactions for customer A1005:

Excel file view

We have everything we need to create an Excel file for each customer and store in a directory for future use:

## Write each ID, Group to Individual Excel files and use ID to name each file with Today's Date
attachments = []
for ID, group_df in customer_group:
    attachment = attachment_path / f'{ID}_{today_string}.xlsx'
    group_df.to_excel(attachment, index=False)
    attachments.append((ID, str(attachment)))

The attachments list contains the customer ID and the full path to the file:

[('A1000',
'c:\\Users\\chris\\notebooks\\2020-10\\data\\attachments\\A1000_01162021_12PM.xlsx'),
('A1001',
'c:\\Users\\chris\\notebooks\\2020-10\\data\\attachments\\A1001_01162021_12PM.xlsx'),
('A1002',
'c:\\Users\\chris\\notebooks\\2020-10\\data\\attachments\\A1002_01162021_12PM.xlsx'),
('A1005',
'c:\\Users\\chris\\notebooks\\2020-10\\data\\attachments\\A1005_01162021_12PM.xlsx')]

To make the processing easier, we convert the list to a DataFrame:

df2 = pd.DataFrame(attachments, columns=['CUSTOMER_ID', 'FILE'])
File paths

The final data prep stage is to generate a list of files with their email addresses by merging the DataFrames together:

email_merge = pd.merge(df, df2, how='left')
combined = email_merge[['CUSTOMER_ID', 'EMAIL', 'FILE']].drop_duplicates()

Which gives this simple DataFrame:

File paths

We’ve gathered the list of customers, their emails and the attachments. Now we need to send an email with Outlook. Refer to this article for additional explanation of this code:

# Email Individual Reports to Respective Recipients
class EmailsSender:
    def __init__(self):
        self.outlook = win32.Dispatch('outlook.application')

    def send_email(self, to_email_address, attachment_path):
        mail = self.outlook.CreateItem(0)
        mail.To = to_email_address
        mail.Subject = today_string2 + ' Report'
        mail.Body = """Please find today's report attached."""
        mail.Attachments.Add(Source=attachment_path)
        # Use this to show the email
        #mail.Display(True)
        # Uncomment to send
        #mail.Send()

We can use this simple class to generate the emails and attach the Excel file.

email_sender = EmailsSender()
for index, row in combined.iterrows():
    email_sender.send_email(row['EMAIL'], row['FILE'])
Outlook Email

The last step is to move the files to our archive directory:

# Move the files to the archive location
for f in attachments:
    shutil.move(f[1], archive_dir)

Summary

This example does a nice job of automating a highly manual process where someone likely did a lot of copying and pasting and manual file manipulation. I hope the solution that Mark developed can help you figure out how to automate some of the more painful parts of your job.

I encourage you to use this example to identify similar challenges in your day to day work. Maybe you don’t have to work with 100’s of files but you might have a manual process you run once a week. Even if that process only takes 1 hour, use that as a jumping off point to figure out how to use Python to make it easier. There is no better way to learn Python than to apply it to one of your own problems.

Thanks again to Mark for taking the time to walk us through this content example!

Comments