PyPI in a Box: Using a Raspberry Pi as a Portable PyPI Server

After the conference, a conversation started in the PyCon Africa Slack group about solutions to problems common in our communities and projects we could collaborate on. The biggest problem that came to my mind was the state of Internet connectivity in Africa.

Internet access is poor in many parts of the continent and the costs are prohibitive in the countries that have good connections. This makes it difficult for developers on the continent to get access to all the resources they need when they need them.

This isn’t a problem I can solve and I’m sure it will take some time before Internet access is evenly distributed and made affordable for everyone.

One approach to overcoming this problem is to make web content available for offline use. To this end, I thought of the web content that I and many other software developers consume on a regular basis. I wanted to find a way to make the content of websites such as Stack Overflow, Wikipedia and PyPI (Python Package Index) available offline.

I did some research and found that it is possible. At the time of writing, I have successfully cloned Stack Overflow and PyPI. In this post I will discuss how to clone the PyPI repository to a Raspberry Pi and serve its content so that connected devices can pip install packages without an active Internet connection.

Goals

There are a number of ways to go about creating your own PyPI mirror and the way I did it might not work for everyone. My goals for this project were:

  • The hardware to do this must be affordable (<=US$100)
  • There must be little or no setup required on the client computers.

I decided to use a Raspberry Pi 4 running Raspbian with a 200GB SD card for storage. I used minirepo to clone PyPI, pypiserver to serve up the packages and nginx as a reverse proxy.

Raspberry Pi
Image by KRITSADA JAIYEN from Pixabay

There are four steps involved in creating a local PyPI server:

  1. Download and install Operating System and system utilities
  2. Configure Raspberry Pi to act as a WiFi hotspot, DHCP and DNS server
  3. Clone/Download PyPI packages
  4. Configure a webserver to deliver the downloaded packages to connected clients

I’ll explain the steps above in more detail below.

1. Download and install Operating System and system utilities

The Raspberry Pi is a small, credit-card-sized computer that sells from about US$35. For this project I used the Raspberry Pi 4 but other models should also work. Raspberry Pi models 3 and 4 have built-in WiFi adapters, which makes setting up the Pi as a WiFi hotspot or access point simpler than when using an external wireless adapter.

To get started, download and install Raspbian. Raspbian is a lightweight Debian-based OS that is optimised for the Raspberry Pi. In order to work as an access point, the Raspberry Pi needs access point software installed, along with DHCP server software to provide connecting devices with a network address.

Next, download all the utilities and packages you will need before configuring the Raspberry Pi. I learned this the hard way after I messed up the network configurations on the Pi and ended up not being able to download anything afterwards.

You need the following software packages:

  • dnsmasq — DNS and DHCP Server software
  • hostapd — Access Point software
  • minirepo — Used to clone PyPI for offline use
  • pypiserver — Creates an index from cloned PyPI packages
  • nginx — A web server

To install these packages, run these two commands:

$ sudo apt install dnsmasq hostapd nginx
$ pip install minirepo pypiserver

2. Configure Raspberry Pi for WiFi hotspot, DHCP and DNS

The goal in this step is to configure a standalone network with the Raspberry Pi acting as the server, so the Pi needs a static IP address assigned to its wireless interface (wlan0). To configure the static IP, edit the dhcpcd configuration file:

$ sudo nano /etc/dhcpcd.conf

Add the following:

interface wlan0
    static ip_address=192.168.4.1/24
    nohook wpa_supplicant

Configure DHCP

Many of the default settings in the dnsmasq configuration file are not needed here. Move the original file aside and create a new one:

$ sudo mv /etc/dnsmasq.conf /etc/dnsmasq.conf.orig
$ sudo nano /etc/dnsmasq.conf

Add the following configuration:

interface=wlan0
listen-address=192.168.4.1
dhcp-range=192.168.4.2,192.168.4.30,255.255.255.0,24h
address=/raspberrypi.local/192.168.4.1

This sets up DHCP for clients connecting through the wireless interface wlan0. The first line restricts dnsmasq to that interface. The second line tells dnsmasq to listen on the static IP address you set up in the previous step. The third line tells it to hand out IP addresses from 192.168.4.2 to 192.168.4.30 with a lease time of 24 hours. The last line makes the name raspberrypi.local resolve to the Pi's own address.
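As a quick sanity check of the addressing above, the following Python sketch uses the standard ipaddress module to confirm that the DHCP range sits inside the Pi's subnet and does not include the Pi's own address. The helper names lease_pool_size and range_is_valid are made up for this illustration; the addresses are the ones used in this post.

```python
import ipaddress

# Values copied from the dhcpcd and dnsmasq settings used in this post.
server = ipaddress.ip_interface("192.168.4.1/24")
first = ipaddress.ip_address("192.168.4.2")
last = ipaddress.ip_address("192.168.4.30")


def lease_pool_size(first, last):
    """Number of addresses in an inclusive DHCP range."""
    return int(last) - int(first) + 1


def range_is_valid(server, first, last):
    """True if the range is ordered, on the server's subnet, and excludes the server."""
    network = server.network
    return (
        first <= last
        and first in network
        and last in network
        and not (first <= server.ip <= last)
    )


print(server.network)                        # 192.168.4.0/24
print(server.netmask)                        # 255.255.255.0
print(lease_pool_size(first, last))          # 29
print(range_is_valid(server, first, last))   # True
```

With this range, at most 29 devices can hold a lease at the same time. Widen the dhcp-range if you expect more clients.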

Create an Access Point

Next, configure the access point software(hostapd):

$ sudo nano /etc/hostapd/hostapd.conf

Add the following:

# /etc/hostapd/hostapd.conf                          

interface=wlan0
driver=nl80211
ssid=NameOfNetwork
hw_mode=g
channel=7
wmm_enabled=0
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP
wpa_passphrase=YourNetworkPassword

Add your own network name and network password where it says ssid and wpa_passphrase, respectively.
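An access point with an invalid name or password is hard to debug from a client, so it can help to check the values first. The sketch below is a hypothetical helper (it is not part of hostapd) that applies the usual WPA2 limits: an SSID of at most 32 bytes and a passphrase of 8 to 63 printable ASCII characters.

```python
def check_hostapd_credentials(ssid, passphrase):
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    ssid_bytes = len(ssid.encode("utf-8"))
    if not 1 <= ssid_bytes <= 32:
        problems.append("ssid must be 1 to 32 bytes long")
    if not 8 <= len(passphrase) <= 63:
        problems.append("wpa_passphrase must be 8 to 63 characters long")
    if not all(32 <= ord(ch) <= 126 for ch in passphrase):
        problems.append("wpa_passphrase must contain only printable ASCII")
    return problems


print(check_hostapd_credentials("NameOfNetwork", "YourNetworkPassword"))  # []
print(check_hostapd_credentials("NameOfNetwork", "short"))
# ['wpa_passphrase must be 8 to 63 characters long']
```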

Next, tell the system where to find this file. Open the hostapd defaults file:
sudo nano /etc/default/hostapd

Find the line with #DAEMON_CONF, and replace it with this:

DAEMON_CONF="/etc/hostapd/hostapd.conf"

Add routing and masquerade

Edit /etc/sysctl.conf and uncomment the line that says:
net.ipv4.ip_forward=1

Add a masquerade for outbound traffic on eth0:
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Save the new rule:
sudo sh -c "iptables-save > /etc/iptables.ipv4.nat"

Edit /etc/rc.local and add the following above “exit 0” to install the rules at boot:

iptables-restore < /etc/iptables.ipv4.nat

This is important if you decide to share an Internet connection or set up a bridge on the Raspberry Pi later.

The Raspberry Pi should be ready to work as an access point. If you're connected to it directly, now would be a good time to enable SSH. Reboot the Raspberry Pi and test if everything works.

Using a different WiFi enabled device like a phone or laptop, scan for new wireless networks. If everything went smoothly, you should see the WiFi network you created above. Try connecting to it.

3. Clone PyPI

In this section, you will see how to clone PyPI and configure the following packages:

  • minirepo
  • pypiserver
  • nginx

Minirepo

Minirepo is a command-line program that downloads packages from PyPI.org so you can use pip without an Internet connection. The easiest way to install it is to use pip:

$ pip install minirepo

The first time it’s executed, minirepo will ask you for the local repository path (where it should save downloaded packages), which defaults to ~/minirepo on Linux. A JSON configuration file is created and saved as ~/.minirepo, which you can edit to suit your preferences.

There are a number of alternatives out there for cloning PyPI, but I used minirepo because it allows you to download a selective mirror, for example only the packages for Python 3. At the time of writing, the entire PyPI repository is somewhere in the neighbourhood of 1TB but, by using a selective download, I was able to get it down to 120GB or so. Here's the configuration I used for this project:

{
  "processes": 10, 
  "package_types": [
    "bdist_egg", 
    "bdist_wheel", 
    "sdist"
  ], 
  "extensions": [
    "bz2", 
    "egg", 
    "gz", 
    "tgz", 
    "whl", 
    "zip"
  ], 
  "python_versions": [
    "3.0",
    "3.1",
    "3.2",
    "3.3",
    "3.4.10",
    "3.5.7",
    "3.6.9",
    "3.7.2",
    "3.7.3",
    "3.7.4", 
    "any", 
    "cp27", 
    "py2", 
    "py2.py3", 
    "py27", 
    "source"
  ], 
  "repository": "/home/pi/minirepo"
}

The configuration above targets Python 3 releases (although the python_versions list also keeps a few Python 2 tags such as cp27, py2 and py27) and limits the package types to sdist, bdist_wheel and bdist_egg. The downside of this approach is that packages that don't meet the filter criteria will not be downloaded.
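To get a feel for what such a filter lets through, here is a small Python illustration. It is not minirepo's actual code; passes_filter is a hypothetical function that only mimics the idea of matching a file's package type and extension against the lists in the configuration. The last file name is invented for the example.

```python
import json

# A trimmed copy of the configuration shown in this post; only the keys used here.
CONFIG = json.loads("""
{
  "package_types": ["bdist_egg", "bdist_wheel", "sdist"],
  "extensions": ["bz2", "egg", "gz", "tgz", "whl", "zip"]
}
""")


def passes_filter(filename, packagetype, config=CONFIG):
    """Illustrative check: keep a file only if its type and extension are listed."""
    extension = filename.rsplit(".", 1)[-1].lower()
    return packagetype in config["package_types"] and extension in config["extensions"]


print(passes_filter("Django-3.0.1.tar.gz", "sdist"))                      # True
print(passes_filter("pytz-2019.3-py2.py3-none-any.whl", "bdist_wheel"))   # True
print(passes_filter("example-1.0.win32.exe", "bdist_wininst"))            # False
```

A Windows installer like the last one would be skipped, which is the kind of gap to expect from a selective mirror.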

Cloning PyPI takes a long time, so you'll want to leave it running in the background, while you watch a movie or ten depending on your Internet connection speed.
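While the clone is running, you may want to see how far along it is. The sketch below is one simple way to do that with the standard library. It assumes the repository path from my configuration (~/minirepo), and mirror_stats is a name I made up for this example.

```python
from pathlib import Path


def mirror_stats(repo_dir):
    """Return (file_count, total_bytes) for all regular files under repo_dir."""
    count = 0
    total = 0
    for path in Path(repo_dir).expanduser().rglob("*"):
        if path.is_file():
            count += 1
            total += path.stat().st_size
    return count, total


if __name__ == "__main__":
    repo = Path("~/minirepo").expanduser()  # assumed repository location
    if repo.is_dir():
        files, size = mirror_stats(repo)
        print(f"{files} files, {size / 1e9:.1f} GB")
    else:
        print(f"{repo} does not exist yet")
```

Walking a directory with hundreds of thousands of files on an SD card is slow, so run this occasionally rather than in a tight loop.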

Pypiserver

At this point, you should have PyPI mirrored to your computer. My local PyPI mirror has more than 200,000 packages.
Before we get to the next step, it is worth taking a step back to understand what pip is and how it works.

Pip is the most popular tool for installing Python packages, and the one included with modern versions of Python. It provides the essential core features for finding, downloading, and installing packages from PyPI and other Python package indexes, and can be incorporated into a wide range of development workflows via its command-line interface (CLI).

Pip supports installing packages from:

  • PyPI (and other indexes) using requirement specifiers
  • VCS project URLs
  • Local project directories
  • Local or remote source archives

Since you have cloned the PyPI packages to a local repository, pip can install those packages directly from the local PyPI mirror you just downloaded. That is not the purpose of this article, however. The goal here is to allow remote clients to connect to the Raspberry Pi and download packages over the network. This is where pypiserver comes in.

pypiserver serves up an index of the local packages, which allows pip to find the packages in your repository over the network.

First, test to see if it works:

$ pypi-server -p 8080 ~/minirepo & # Will listen to all IPs.

Note that the command to run it is pypi-server, not pypiserver.

Here, you're starting pypiserver and running it on port 8080. It will find packages in the minirepo folder. This process will keep running in the background until you either kill it or shut down the Raspberry Pi. I will show you how to start it at boot later.
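If you prefer to check the server from a script, the following sketch may help. It assumes pypiserver exposes its index under /simple/ and that project names in the URL follow the PEP 503 normalisation rule (lowercase, with runs of "-", "_" and "." collapsed to a single "-"). The function names project_url and is_reachable are my own and not part of pypiserver or pip.

```python
import re
import urllib.error
import urllib.request


def project_url(base_url, project):
    """Build the simple-index URL for a project, normalising the name per PEP 503."""
    normalized = re.sub(r"[-_.]+", "-", project).lower()
    return f"{base_url.rstrip('/')}/simple/{normalized}/"


def is_reachable(url, timeout=5):
    """True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


print(project_url("http://192.168.4.1:8080", "Django"))
# http://192.168.4.1:8080/simple/django/

# On a machine connected to the Pi's network you could then run, for example:
#     is_reachable(project_url("http://192.168.4.1:8080", "Django"))
```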

If you visit the static IP you set for the Raspberry Pi at port 8080 in your browser (http://192.168.4.1:8080), you should see the pypiserver welcome page.

You can now install from the local package repository (replace <package-name> with the package you want):

pip install --index-url http://localhost:8080/simple/ <package-name>

OR, from a client computer:

pip install --index-url http://192.168.4.1:8080/ <package-name>

If you have installed pypiserver on a remote URL without HTTPS you will receive an “untrusted” warning from pip, urging you to append the --trusted-host option:

pip --trusted-host 192.168.4.1 install --index-url http://192.168.4.1:8080/ <package-name>

An even shorter way:
pip --trusted-host 192.168.4.1 install -i http://192.168.4.1:8080/ <package-name>

Specifying the local PyPI URL and the trusted host flag on the command line every time can be cumbersome.
If you want to always install packages from your own mirror, create this pip config file in your home directory or in a virtual environment:

[global]
trusted-host = 192.168.4.1

[install]
index-url = http://192.168.4.1:8080

Home directory

  • On Unix and macOS the home directory file is: $HOME/.pip/pip.conf
  • On Windows the file is: %HOME%\pip\pip.ini

In a virtual environment:

  • On Unix and macOS the file is $VIRTUAL_ENV/pip.conf
  • On Windows the file is: %VIRTUAL_ENV%\pip.ini

I recommend placing this config file in a virtual environment.
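If you set up many client machines or virtual environments, you could generate the file instead of typing it. This is only a sketch using Python's configparser. The function name write_pip_config is made up, the defaults are the address and port used in this post, and the function overwrites any existing file at the given path.

```python
import configparser
from pathlib import Path


def write_pip_config(path, host="192.168.4.1", port=8080):
    """Write a pip config file that points pip at the local mirror."""
    config = configparser.ConfigParser()
    config["global"] = {"trusted-host": host}
    config["install"] = {"index-url": f"http://{host}:{port}"}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as handle:
        config.write(handle)
    return path


# Example, run from inside an activated virtual environment on Unix or macOS:
#     import os
#     write_pip_config(Path(os.environ["VIRTUAL_ENV"]) / "pip.conf")
```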

4. Set up a web server to deliver the packages

By default, pypiserver scans the entire packages directory each time an incoming HTTP request occurs. This can cause significant slowdowns when serving a large number of packages, as we are in this instance.

One way to serve the files faster is to put pypiserver behind a reverse proxy and enable your web server's built-in caching functionality. I'll use nginx in this article but you're free to use any web server you prefer.

Set up a new virtual host in nginx.

Create a file /etc/nginx/sites-available/cheeseshop.com. For the purposes of this article I'll refer to the new virtual host as cheeseshop.com.

Run $ sudo nano /etc/nginx/sites-available/cheeseshop.com and add the following content:


proxy_cache_path /data/nginx/cache
                 levels=1:2
                 keys_zone=pypiserver_cache:10m
                 max_size=10g
                 inactive=120m
                 use_temp_path=off;


upstream pypi {
        server 127.0.0.1:8080;
}

server {
        listen 80;
        server_name cheeseshop.com;
        autoindex on;
        location / {
          proxy_set_header Host $host:$server_port;
          proxy_set_header X-Forwarded-Proto $scheme;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_cache pypiserver_cache;
          proxy_pass       http://pypi;
        }
}

The first part of the config instructs nginx to create an on-disk cache of up to 10GB (max_size) and to drop any entry that has not been requested for 2 hours (inactive=120m). The keys_zone setting reserves 10MB of shared memory for the cache keys.
The upstream pypi section defines the backend that nginx forwards requests to, which is the pypiserver running on port 8080. CheeseShop was the original code name for the Python Package Index, so that's why I named the server that. You can use any name or IP address you like.

The server section specifies that port 80 will be used for incoming HTTP connections and that those requests get forwarded to the pypi server.

I don't own the cheeseshop.com domain but I can use it since we are creating a standalone network without access to the Internet. In order for client computers to be able to connect to cheeseshop.com, you'll need to tell the DNS server how to resolve it. More on that in a bit.

To enable this new virtual host, you want to create a symbolic link to the config file you just created in the /etc/nginx/sites-enabled/ folder:

$ sudo ln -s /etc/nginx/sites-available/cheeseshop.com /etc/nginx/sites-enabled/

Doing this will enable the new virtual host. Check that everything works by running sudo nginx -t. If everything checks out, great! Next you want to make a small DNS change to map the cheeseshop.com domain to an IP address.

Open /etc/hosts and add an entry for the newly created cheeseshop.com domain:

192.168.4.1     cheeseshop.com

The hosts file contains domain to IP address mappings that help the computer serve you the right content. Dnsmasq will check this file whenever it starts up so it is a good idea to restart it:

sudo service dnsmasq restart

Restart nginx too for good measure:

sudo service nginx restart
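If the name does not resolve from a client, the first thing to rule out is a typo in the hosts file. The sketch below parses hosts-file text into a dictionary so you can look up an entry. The function parse_hosts is a hypothetical helper for this post, and it keeps only the first address it sees for each name.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # keep the first address seen
    return mapping


sample = """
127.0.0.1       localhost
192.168.4.1     cheeseshop.com   # local PyPI mirror
"""
print(parse_hosts(sample)["cheeseshop.com"])  # 192.168.4.1
```

On the Pi itself you could pass in the real file with parse_hosts(open("/etc/hosts").read()).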

Assuming everything went smoothly, you should be able to install Python packages from client computers using a hostname as opposed to using an IP now.

Using the server

To test this out, connect a client computer to the Raspberry Pi's WiFi network. Create a new virtual environment on the client and run the following command inside it:

pip --trusted-host cheeseshop.com install -i http://cheeseshop.com django

Running that command produces the following output:

pip --trusted-host cheeseshop.com install -i http://cheeseshop.com django 
Looking in indexes: http://cheeseshop.com
Collecting django
  Downloading http://cheeseshop.com:80/packages/Django-3.0.1.tar.gz (9.0 MB)
     |████████████████████████████████| 9.0 MB 1.1 MB/s 
Collecting pytz
  Downloading http://cheeseshop.com:80/packages/pytz-2019.3-py2.py3-none-any.whl (509 kB)
     |████████████████████████████████| 509 kB 1.3 MB/s 
Collecting sqlparse>=0.2.2
  Downloading http://cheeseshop.com:80/packages/sqlparse-0.3.0-py2.py3-none-any.whl (39 kB)
Collecting asgiref~=3.2
  Downloading http://cheeseshop.com:80/packages/asgiref-3.2.3-py2.py3-none-any.whl (18 kB)
Building wheels for collected packages: django
  Building wheel for django (setup.py) ... done
  Created wheel for django: filename=Django-3.0.1-py3-none-any.whl size=7428296 sha256=b31336b1249afbdbb2374912f6983179f4715127d7e6b842a8455a94a1518ce5
  Stored in directory: /home/terra/.cache/pip/wheels/6f/55/5c/aca7917f1899fbb7430677d9d6ef7c6be748c412dec3e63c04
Successfully built django
Installing collected packages: pytz, sqlparse, asgiref, django
Successfully installed asgiref-3.2.3 django-3.0.1 pytz-2019.3 sqlparse-0.3.0

Starting pypiserver at boot (Optional)

To ensure that pypiserver software starts up automatically at boot, create a new Linux service and use systemd to manage it.

1. Create a startup script for the service to manage and call it start-pypi-server.sh. Add the following content to it:

                 
#! /bin/bash
/home/pi/.local/bin/pypi-server -p 8080 /home/pi/minirepo/ &

2. Copy the script to /usr/bin and make it executable:
sudo cp start-pypi-server.sh /usr/bin/start-pypi-server.sh
sudo chmod +x /usr/bin/start-pypi-server.sh

3. Create a unit file to define a systemd service. Name it pypiserver.service:

[Unit]
Description=A minimal PyPI server for use with pip/easy_install.

[Service]
Type=forking
ExecStart=/bin/bash /usr/bin/start-pypi-server.sh
User=pi

[Install]
WantedBy=multi-user.target


This defines a basic service. The ExecStart directive specifies the command that will be run to start the service.

4. Copy the unit file to /etc/systemd/system and set its permissions:
sudo cp pypiserver.service /etc/systemd/system/pypiserver.service

sudo chmod 644 /etc/systemd/system/pypiserver.service

Start and Enable the Service

1. Once you have created the unit file, start the service to test it:

sudo systemctl start pypiserver

2. Check the status of the pypiserver service:
sudo systemctl status pypiserver
This will produce output similar to this:

$ sudo systemctl status pypiserver
● pypiserver.service - A minimal PyPI server for use with pip/easy_install.
   Loaded: loaded (/etc/systemd/system/pypiserver.service; enabled; vendor prese
   Active: active (running) since Fri 2020-02-07 19:17:05 CAT; 2h 19min ago
  Process: 420 ExecStart=/bin/bash /usr/bin/start-pypi-server.sh (code=exited, s
 Main PID: 441 (pypi-server)
    Tasks: 4 (limit: 4915)
   Memory: 408.0M
   CGroup: /system.slice/pypiserver.service
           └─441 /usr/bin/python /home/pi/.local/bin/pypi-server -p 8080 /home/p

Feb 07 19:17:05 raspberrypi systemd[1]: Starting A minimal PyPI server for use w
Feb 07 19:17:05 raspberrypi systemd[1]: Started A minimal PyPI server for use wi

3. To stop or restart the service:
sudo systemctl stop pypiserver
sudo systemctl restart pypiserver

4. Finally, use the enable command to ensure that the service starts whenever the system boots:

sudo systemctl enable pypiserver
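After a reboot, a quick way to confirm that the service really came up is to check whether anything is accepting connections on port 8080. The following is a minimal sketch using only the socket module. port_is_open is a name I chose for the example, and 8080 is the port used throughout this post.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Run this on the Pi; use 192.168.4.1 instead when checking from a client.
    print("pypiserver listening:", port_is_open("127.0.0.1", 8080))
```

This only shows that something is listening on the port. It does not prove that the listener is pypiserver, so sudo systemctl status pypiserver remains the authoritative check.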

Conclusion

You have seen how to create your own local PyPI clone on a Raspberry Pi. You learned how to:

  • Set up a Raspberry Pi as an access point
  • Set up the Raspberry Pi as a DHCP and DNS server
  • Clone PyPI
  • Use a web server to serve up the cloned packages

I did this as a proof of concept to show that it is possible to run something like PyPI offline. I am sure there are better or more efficient ways I could have done this. Please leave a comment below with any suggestions or criticism. Thanks for reading.

References: