I led a tutorial for the DC PyLadies group titled "An introduction to the data science workflow in Python." Here are the materials, which I hope will live on in usefulness.

The workshop covered: (1) getting data (with examples of CSV loading, SQL querying, and API calling), (2) cleaning data (with some visualizations in matplotlib), and (3) modeling data (fitting a linear regression using three libraries - statsmodels, scikit-learn and plain ol' numpy - and confirming the coefficients are the same).
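To give a flavor of that last step, here is a minimal sketch of the "fit it several ways and check the coefficients agree" idea. It is not the workshop notebook: to keep it dependency-light it uses only numpy (three different numpy routes stand in for statsmodels, scikit-learn, and numpy), and the data is synthetic, made up purely for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=50)

# Design matrix with an explicit intercept column.
X = np.column_stack([np.ones_like(x), x])

# Route 1: numpy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 2: solve the normal equations (X'X) beta = X'y directly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Route 3: degree-1 polynomial fit. polyfit returns the highest power first,
# so reverse it to get [intercept, slope].
beta_polyfit = np.polyfit(x, y, deg=1)[::-1]

# All three routes should land on the same [intercept, slope].
coefficients_agree = bool(
    np.allclose(beta_lstsq, beta_normal) and np.allclose(beta_lstsq, beta_polyfit)
)
print(beta_lstsq, coefficients_agree)
```

The same check carries over to other libraries: fit the model in each, pull out the intercept and slope, and compare them with np.allclose rather than exact equality, since different solvers differ in the last few decimal places.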


My goal for the workshop was to give a "tour" of the common Python packages used in data science. This was aimed towards women who might have some experience doing data analysis in R, Stata, or something else, and thus needed to get a handle on where to find their favorite stats/machine learning methods in the world of Python (rather than how those methods worked - i.e. I didn't want this to be a stats class).

I also ran a survey beforehand to get a sense of how new to Python attendees were (by asking a couple of "beginner" questions, like "What does import some_library do?"), what other programming languages they knew, and what they were most excited about learning. This helped a lot, since I knew what I wanted to present (supply!) but I also knew I wanted to meet folks where they were at (demand!).


I used Ned Batchelder's pizza.py script to calculate how many pizzas to order. Ned organizes the extremely popular (and very great) Boston Python User Group, and he is wise in the ways of Meetup, specifically the notorious Meetup attrition (only about 50% of RSVPs usually show up). Anyway, would you believe it - we had less than one pizza left over - pizza.py really came through!
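For the curious, the back-of-the-envelope arithmetic looks something like the sketch below. This is not Ned's pizza.py, and I'm not claiming it matches his logic; the function name pizzas_to_order and the slices-per-person and slices-per-pizza defaults are hypothetical numbers I picked for illustration. Only the roughly 50% show rate comes from the Meetup rule of thumb above.

```python
import math


def pizzas_to_order(rsvps, show_rate=0.5, slices_per_person=2.5, slices_per_pizza=8):
    """Estimate pizzas needed, discounting RSVPs by the expected show rate.

    All defaults except show_rate are illustrative assumptions.
    """
    expected_attendees = rsvps * show_rate
    slices_needed = expected_attendees * slices_per_person
    # Round up: you can't order a fraction of a pizza.
    return math.ceil(slices_needed / slices_per_pizza)


print(pizzas_to_order(60))
```

With 60 RSVPs and these assumed defaults, that works out to 30 expected attendees, 75 slices, and 10 pizzas.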

To bypass Installation Hell - wherein 33% of a workshop is spent trying to get some version of Python installed on some attendee's machine - I used Binder, which I found via Julia Evans's blog. Binder is SUPER convenient: you make your repo with your Jupyter notebooks (and a requirements.txt), point Binder to it, and it'll launch that repo as a Docker container that people can access and interact with. This was really great, since we could skip the installation pain and get straight to it.

The challenges of pedagogy

I love teaching, and I love learning. I also love thinking about learning - that mystical moment when you get something - and reflecting on how and why Things Are Learned. Teaching is a special challenge because you want to be engaging (fun!) and clear. (David Malan, who teaches CS50, is a master at this.) Being clear, especially about stuff that either is complex or is commonly perceived to be complex, is hard. I think it's very important to be clear - most big human ideas can be boiled down to something quite simple. I strongly believe in this ad by Khan Academy - human knowledge is a set of simple building blocks. Also, to paraphrase Richard Feynman, if you can't explain it in simple terms, you don't understand it.

So, anyway, I tried to do that - but it was pretty exhausting. I was definitely pooped by the end - and dehydrated! I did appreciate the opportunity, though, and hope to get better at it. The bottom line is that I find a lot of inspiration in teaching/learning, and have become super into the growth mindset lately.