Simon Willison’s Weblog

s3-credentials: a tool for creating credentials for S3 buckets

3rd November 2021

I’ve built a command-line tool called s3-credentials to solve a problem that’s been frustrating me for ages: how to quickly and easily create AWS credentials (an access key and secret key) that have permission to read or write from just a single S3 bucket.

The TLDR version

To create a new S3 bucket and generate credentials for reading and writing to it:

% pip install s3-credentials
% s3-credentials create demo-bucket-for-simonwillison-blog-post --create-bucket
Created bucket: demo-bucket-for-simonwillison-blog-post
Created  user: 's3.read-write.demo-bucket-for-simonwillison-blog-post' with permissions boundary: 'arn:aws:iam::aws:policy/AmazonS3FullAccess'
Attached policy s3.read-write.demo-bucket-for-simonwillison-blog-post to user s3.read-write.demo-bucket-for-simonwillison-blog-post
Created access key for user: s3.read-write.demo-bucket-for-simonwillison-blog-post
{
    "UserName": "s3.read-write.demo-bucket-for-simonwillison-blog-post",
    "AccessKeyId": "AKIAWXFXAIOZHY6WAJSF",
    "Status": "Active",
    "SecretAccessKey": "...",
    "CreateDate": "2021-12-06 23:54:08+00:00"
}

You can now use that AccessKeyId and SecretAccessKey to read and write files in that bucket.
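
For example, here’s a minimal boto3 sketch that uses those freshly created keys to write a file to the bucket and read it back (the key name and the truncated secret are placeholders):

import boto3

# Authenticate with the new bucket-scoped credentials rather than root keys
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAWXFXAIOZHY6WAJSF",
    aws_secret_access_key="...",
)

# Write an object to the bucket, then read it back
s3.put_object(
    Bucket="demo-bucket-for-simonwillison-blog-post",
    Key="hello.txt",
    Body=b"Hello from s3-credentials",
)
print(s3.get_object(
    Bucket="demo-bucket-for-simonwillison-blog-post",
    Key="hello.txt",
)["Body"].read())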

The need for bucket credentials for S3

I’m an enormous fan of Amazon S3: I’ve been using it for fifteen years now (since the launch in 2006) and it’s my all-time favourite cloud service: it’s cheap, reliable and basically indestructible.

You need two credentials to make API calls to S3: an AWS_ACCESS_KEY_ID and an AWS_SECRET_ACCESS_KEY.

Since I often end up adding these credentials to projects hosted in different environments, I’m not at all keen on using my root-level credentials here: usually a project works against just one dedicated S3 bucket, so ideally I would like to create dedicated credentials that are limited to just that bucket.

Creating those credentials is surprisingly difficult!

Dogsheep Photos

The last time I solved this problem was for my Dogsheep Photos project. I built a tool that uploads all of my photos from Apple Photos to my own dedicated S3 bucket, and extracts the photo metadata into a SQLite database. This means I can do some really cool tricks using SQL to analyze my photos, as described in Using SQL to find my best photo of a pelican according to Apple Photos.

The photos are stored in a private S3 bucket, with a custom proxy in front of them that I can use to grant access to specific photographs via a signed URL.
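
This isn’t the exact code my proxy uses, but as a rough illustration of the general mechanism, boto3 can generate a time-limited signed URL for an object in a private bucket like this (the key name here is invented):

import boto3

s3 = boto3.client("s3")

# Produce a URL granting temporary read access to one private object
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "dogsheep-photos-simon", "Key": "example-photo.jpeg"},
    ExpiresIn=600,  # valid for ten minutes
)
print(url)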

For the proxy, I decided to create dedicated credentials that were allowed to make read-only requests to my private S3 bucket.

I made detailed notes along the way as I figured out how to do that. It was really hard! There’s one step where you literally have to hand-edit a JSON policy document that looks like this (replace dogsheep-photos-simon with your own bucket name) and paste that into the AWS web console:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::dogsheep-photos-simon/*"
      ]
    }
  ]
}
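
That policy grants full access to objects in the bucket. For the proxy I wanted read-only credentials, so the policy needed to be more restrictive. A rough read-only equivalent looks like the following; treat it as a sketch, since the exact policies I eventually settled on live in the policies.py file discussed later in this post:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::dogsheep-photos-simon/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::dogsheep-photos-simon"]
    }
  ]
}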

I set myself an ambition to try and fix this at some point in the future (that was in April 2020).

Today I found myself wanting new bucket credentials, so I could play with Litestream. I decided to solve this problem once and for all.

I’ve also been meaning to really get my head around Amazon’s IAM permission model for years, and this felt like a great excuse to figure it out through writing code.

The process in full

Here are the steps you need to take in order to get long-lasting credentials for accessing a specific S3 bucket.

  1. Create an S3 bucket
  2. Create a new, dedicated user. You need a user and not a role because long-lasting AWS credentials cannot be created for roles—and we want credentials we can use in a project without constantly needing to update them.
  3. Assign an “inline policy” to that user granting them read-only or read-write access to the specific S3 bucket—this is the JSON format shown above.
  4. Create AWS credentials for that user.

There are plenty of other ways you can achieve this: you can add permissions to a group and then assign the user to that group, or you can create a named “managed policy” and attach that to the user. But using an inline policy seems to be the simplest of the available options.

Using the boto3 Python client library for AWS this sequence converts to the following API calls:

import boto3
import json

s3 = boto3.client("s3")
iam = boto3.client("iam")

username = "my-new-user"
bucket_name = "my-new-bucket"
policy_name = "user-can-access-bucket"

# The bucket policy JSON shown earlier, with the bucket name substituted in
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
        }
    ],
}

# Create the bucket
s3.create_bucket(Bucket=bucket_name)

# Create the user
iam.create_user(UserName=username)

# Assign the policy to the user
iam.put_user_policy(
    PolicyDocument=json.dumps(policy_document),
    PolicyName=policy_name,
    UserName=username,
)

# Retrieve and print the credentials
response = iam.create_access_key(
    UserName=username,
)
print(response["AccessKey"])

Turning it into a CLI tool

I never want to have to figure out how to do this again, so I decided to build a tool around it.

s3-credentials is a Python CLI utility built on top of Click using my click-app cookiecutter template.

It’s available through PyPI, so you can install it using:

% pip install s3-credentials

The main command is s3-credentials create, which runs through the above sequence of steps.

To create read-only credentials for my existing static.niche-museums.com bucket I can run the following:

% s3-credentials create static.niche-museums.com --read-only

Created user: s3.read-only.static.niche-museums.com with permissions boundary: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
Attached policy s3.read-only.static.niche-museums.com to user s3.read-only.static.niche-museums.com
Created access key for user: s3.read-only.static.niche-museums.com
{
    "UserName": "s3.read-only.static.niche-museums.com",
    "AccessKeyId": "AKIAWXFXAIOZJ26NEGBN",
    "Status": "Active",
    "SecretAccessKey": "...",
    "CreateDate": "2021-11-03 03:21:12+00:00"
}

The command shows each step as it executes, and at the end it outputs the newly created access key and secret key.

It defaults to creating a user with a username that reflects what it will be able to do: s3.read-only.static.niche-museums.com. You can pass --username something to specify a custom username instead.

If you omit the --read-only flag it will create a user with read and write access to the bucket. There’s also a --write-only flag which creates a user that can write to but not read from the bucket—useful for use-cases like logging or backup scripts.
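
For example, credentials for a hypothetical backup script that only ever needs to upload could be created with:

% s3-credentials create my-backups-bucket --write-only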

The README has full documentation on the various other options, plus details of the other s3-credentials utility commands list-users, list-buckets, list-user-policies and whoami.

Learned along the way

This really was a fantastic project for deepening my understanding of S3, IAM and how it all fits together. A few extra points I picked up:

  • AWS users can be created with something called a permissions boundary. This is an advanced security feature which lets a user be restricted to a set of maximum permissions—for example, only allowed to interact with S3, not any other AWS service.

    Permissions boundaries do not themselves grant permissions—a user will not be able to do anything until extra policies are added to their account. Instead, they act as defense in depth, setting an upper limit on what a user can do no matter what other policies are applied to them. (A short boto3 sketch of setting a permissions boundary is included after this list.)

    There’s one big catch: the value you set for a permissions boundary is a very weakly documented ARN string—the boto3 documentation simply calls it “The ARN of the policy that is used to set the permissions boundary for the user”. I used GitHub code search to dig up some examples, and found arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess and arn:aws:iam::aws:policy/AmazonS3FullAccess to be the ones most relevant to my project. This random file appears to contain more.

  • Those JSON policy documents really are the dark secret magic that holds AWS together. Finding trustworthy examples of read-only, read-write and write-only policies for specific S3 buckets was not at all easy. I made detailed notes in this comment thread—the policies I went with are baked into the policies.py file in the s3-credentials repository. If you know your way around IAM I would love to hear your feedback on the policies I ended up using!

  • Writing automated tests for code that makes extensive use of boto3—such that those tests don’t make any real HTTP requests to the API—is a bit fiddly. I explored a few options for this—potential candidates included the botocore.stub.Stubber class and the VCR.py library for saving and replaying HTTP traffic (see this TIL). I ended up going with Python’s Mock class, via pytest-mock—here’s another TIL on the pattern I used for that. (Update: Jeff Triplett pointed me to moto which looks like a really great solution for this.) A rough sketch of that style of mocking is included after this list.
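
Here’s the permissions boundary point in code form: a minimal sketch of creating a user with one of those boundary ARNs applied, using the same boto3 IAM client as earlier (the username is just an example):

import boto3

iam = boto3.client("iam")

# The boundary caps what this user can ever do, even if a broader
# policy gets attached to them later by mistake
iam.create_user(
    UserName="s3.read-only.example-bucket",
    PermissionsBoundary="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)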
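
And here’s a rough sketch of that style of pytest-mock test, written against a hypothetical my_s3_code module that calls boto3 (the real tests in the s3-credentials repo differ in the details):

import my_s3_code  # hypothetical module under test, containing a create_bucket() function

def test_create_bucket_calls_boto3(mocker):
    # pytest-mock's mocker fixture replaces boto3 where my_s3_code looks it up,
    # so the test never makes a real AWS API call
    mocked_boto3 = mocker.patch("my_s3_code.boto3")

    my_s3_code.create_bucket("demo-bucket")

    # Assert against the calls recorded by the mock
    mocked_boto3.client.assert_called_once_with("s3")
    mocked_boto3.client.return_value.create_bucket.assert_called_once_with(
        Bucket="demo-bucket"
    )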

Feedback from AWS experts wanted

The tool I’ve built solves my specific problem pretty well. I’m nervous about it though: I am by no means an IAM expert, and I’m somewhat paranoid that I may have made a dumb mistake and baked it into the tooling.

As such, the README currently carries a warning that you should review what the tool is doing carefully before trusting it against your own AWS account!

Update 20 February 2022: I removed that warning, since I’ve now spent long enough working on this tool that I’m comfortable with how it works.

If you are an AWS expert, you can help: I have an open issue requesting expert feedback, and I’d love to hear from people with deep experience who can either validate that my approach is sound or help explain what I’m doing wrong and how the process can be fixed.