How to make Python code concurrent with 3 lines

I was inspired by Ryan Palo’s quest to uncover gems in Python’s standard library.

I decided to share one of my favorite tricks in Python’s standard library through an example. The entire code runs on Python 3.2+ without external packages.

The initial problem

Let’s say you have a thousand URLs to process/download/examine, so you need to issue as much HTTP GET calls and retrieve the body of each response.

This is a way to do it:

import http.client
import socket

def get_it(url):
    try:
        # always set a timeout when you connect to an external server
        connection = http.client.HTTPSConnection(url, timeout=2)

        connection.request("GET", "/")

        response = connection.getresponse()

        return response.read()
    except socket.timeout:
        # in a real world scenario you would probably do stuff if the
        # socket goes into timeout
        pass

urls = [
    "www.google.com",
    "www.youtube.com",
    "www.wikipedia.org",
    "www.reddit.com",
    "www.httpbin.org"
] * 200

for url in urls:
    get_it(url)

(I wouldn’t use the standard library as an HTTP client but for the purpose of this post it’s okay)

As you can see there’s no magic here. Python iterates on 1000 URLs and calls each of them.

This thing on my computer occupies 2% of the CPU and spends most of the time waiting for I/O:

$ time python io_bound_serial.py
20.67s user 5.37s system 855.03s real 24292kB mem

It runs for roughly 14 minutes. We can do better.

Show me the trick!

from concurrent.futures import ThreadPoolExecutor as PoolExecutor
import http.client
import socket

def get_it(url):
    try:
        # always set a timeout when you connect to an external server
        connection = http.client.HTTPSConnection(url, timeout=2)

        connection.request("GET", "/")

        response = connection.getresponse()

        return response.read()
    except socket.timeout:
        # in a real world scenario you would probably do stuff if the
        # socket goes into timeout
        pass

urls = [
    "www.google.com",
    "www.youtube.com",
    "www.wikipedia.org",
    "www.reddit.com",
    "www.httpbin.org"
] * 200

with PoolExecutor(max_workers=4) as executor:
    for _ in executor.map(get_it, urls):
        pass

Let’s see what changed:

# import a new API to create a thread pool
from concurrent.futures import ThreadPoolExecutor as PoolExecutor

# create a thread pool of 4 threads
with PoolExecutor(max_workers=4) as executor:

    # distribute the 1000 URLs among 4 threads in the pool
    # _ is the body of each page that I'm ignoring right now
    for _ in executor.map(get_it, urls):
        pass

So, 3 lines of code, we made a slow serial task into a concurrent one, taking little short of 5 minutes:

$ time python io_bound_threads.py
21.40s user 6.10s system 294.07s real 31784kB mem

We went from 855.03s to 294.07s, a 2.9x increase!

Wait, there’s more

The great thing about this new API is that you can substitute

from concurrent.futures import ThreadPoolExecutor as PoolExecutor

with

from concurrent.futures import ProcessPoolExecutor as PoolExecutor

to tell Python to use processes instead of threads. Out of curiosity, let’s see what happens to the running time:

$ time python io_bound_processes.py
22.19s user 6.03s system 270.28s real 23324kB mem

20 seconds less than the threaded version, not much different. Keep in mind that these are unscientific experiments and I’m using the computer while these scripts run.

Bonus content

My computer has 4 cores, let’s see what happens to the threaded versions increasing the number of worker threads:

# 6 threads
20.48s user 5.19s system 155.92s real 35876kB mem
# 8 threads
23.48s user 5.55s system 178.29s real 40472kB mem
# 16 threads
23.77s user 5.44s system 119.69s real 58928kB mem
# 32 threads
21.88s user 4.81s system 119.26s real 96136kB mem

Three things to notice: RAM occupation obviously increases, we hit a wall around 16 threads and at 16 threads we’re more than 7x faster than the serial version.

If you don’t recognize time’s output is because I’ve aliased it like this:

time='gtime -f '\''%Us user %Ss system %es real %MkB mem -- %C'\'

where gtime is installed by brew install gnu-time

Conclusions

I think ThreadPoolExecutor and ProcessPoolExecutor are super cool additions to Python’s standard library. You could have done mostly everything they do with the “older” threading, multiprocessing and with FIFO queues but this API is so much better.