Analyzing the Stack Overflow Survey

Mon 27 May 2019 by Moshe Zadka

The Stack Overflow Survey Results for 2019 are in! There is some official analysis, that mentioned some things that mattered to me, and some that did not. I decided to dig into the data and see if I can find some things that would potentially interest my readership.

import csv, collections, itertools
with open("survey_results_public.csv") as fpin:
    reader = csv.DictReader(fpin)
    responses = list(reader)
len(responses)
88883

Wow, almost 90K respondents! This is the sweet spots of "enough to make meaningful generalizations" while being able to analyze with rudimentary tools, not big-data-ware.

pythonistas = [x for x in responses if 'Python' in x['LanguageWorkedWith']]
len(pythonistas)/len(responses)
0.41001091322300104

About 40% of the respondents use Python in some capacity. That is pretty cool! This is one of the things where I wonder if there is bias in the source data. Are people who use Stack Overflow, or respond to surveys for SO, more likely to be the kind of person who uses Python? Or less?

In any case, I am excited! This means my favorite language, for all its issues, is doing well. This is also a good reminder that we need to think about the consequences of our decisions on a big swath of developers we will never ever meet.

opensource = collections.Counter(x['OpenSourcer'] for x in pythonistas)
sorted(opensource.items(), key=lambda x:x[1], reverse=True)
[('Never', 11310),
 ('Less than once per year', 10374),
 ('Less than once a month but more than once per year', 9572),
 ('Once a month or more often', 5187)]
opensource['Once a month or more often']/len(pythonistas)
0.1423318607139917

Python is open source. Almost all important libraries (Django, Pandas, PyTorch, requests) are open source. Many important tools (Jupyter) are open source. The number of people who contribute to them with any kind of regular cadence is less than 15%.

general_opensource = collections.Counter(x['OpenSourcer'] for x in responses)
sorted(general_opensource.items(), key=lambda x:x[1], reverse=True)
[('Never', 32295),
 ('Less than once per year', 24972),
 ('Less than once a month but more than once per year', 20561),
 ('Once a month or more often', 11055)]

The Python community does compare well to the general populace, though!

devtype = collections.Counter(itertools.chain.from_iterable(x["DevType"].split(";") for x in pythonistas))
devtype['DevOps specialist']/len(responses)
0.052282213696657406

About 5% of total respondents are my peers: using Python for DevOps. That is pretty exciting! My interest in that is not merely theoretical, my upcoming book targets that crowd.

general_devtype = collections.Counter(itertools.chain.from_iterable(x["DevType"].split(";") for x in responses))
general_devtype['DevOps specialist']/len(responses), devtype['DevOps specialist']/len(pythonistas)
(0.09970410539698255, 0.12751420025793705)

In general, DevOps specialists are 10% of respondents.

devtype['DevOps specialist']/general_devtype['DevOps specialist']
0.524373730534868

Over 50% of DevOps specialists use Python!

def safe_int(x):
    try:
        return int(x)
    except ValueError:
        return -1

intermediate = sum(1 for x in pythonistas if 1<=safe_int(x['YearsCode'])<=5)

My next hush-hush (for now!) project is going to be targeting intermediate Python developers. I wish I could slice by "number of years writing in Python, but this is the best I could do. (I treat "NA" responses as "not intermediate". This is OK, since I prefer to underestimate rather than overestimate.)

intermediate/len(responses)
0.11346376697456206

11%! Not bad.

general_intermediate = sum(1 for x in responses if 1<=safe_int(x['YearsCode'])<=5)
intermediate/len(pythonistas), general_intermediate/len(responses)
(0.27673352907279863, 0.2671264471271222)

Seems like using Python does not change much the chances of someone being intermediate.

Summary

  • 40% of respondents use Python. Python is kind of a big deal.
  • 5% of respondents use Python for DevOps. This is a lot! DevOps as a profession is less than 10 years old.
  • 11% of respondents are intermediate Python users. My previous book targets this crowd.

(Thanks to Robert Collins and Matthew Broberg for their comments on an earlier draft. Any remaining issues are purely my responsibility.)