I have a running mental list of little inefficiencies that annoy me throughout my workdays. The sand, the pebbles, the annoyances of my working life. I try to follow this xkcd in determining if it's worth the time to actually fix them. The basic heuristic is: if it's a small thing, but I do it every day, it's worth spending a few hours optimizing it.

Today I tackled one: creating a global config file to auto-import all my favorite modules into any newly-created Jupyter notebook.


I work a lot with Jupyter notebooks. They're a standard data science tool. What usually happens, though, is I experience "Jupyter efficiency drift". I'll spend a chunk of time (several days/weeks) digging into notebooks, and I'll get really good at importing the right modules, setting my matplotlib up all nice and friendly, and making little IPython JavaScript helper functions. Then I'll spend a chunk of time on something else - like deploying stuff (gettin' real good at bash!), or stats stuff (yay Bayes!), or whatever - and I'll have forgotten what my One True Jupyter Notebook setup is.

My crappy workaround has always been to dig up the last notebook I was working on, and copy the top few cells:

%load_ext autoreload
%matplotlib inline

import os
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
from IPython.display import Javascript, display
[blah blah blah]

mpl.rcParams['figure.figsize'] = (12, 8)

def alert():
    js = """alert("All done!");"""
    display(Javascript(js))

%autoreload 2

The problem was that some notebooks didn't have the entire rigamarole, some were missing my beloved alert() function, and the whole thing was a stupid process. Sometimes I'd open a new notebook just to do some small thing, and it seemed like overkill to stay meticulous about copy+pasting that One True First Cell.

Solution: ~/.ipython/profile_default/startup

Because surely Jupyter must have some top-level, global-environment-style config that can do this for me? I mean, Stata has profile.do, bash has .bashrc/.bash_profile, is this some crazy idea? To paraphrase Kenneth Branagh: no, the world must be dotfiled!


And, indeed, Jupyter - or rather, Jupyter's ur-center, IPython - has startup files (docs). The basic process:

  1. Go to your default IPython profile's startup folder: cd ~/.ipython/profile_default/startup
  2. Make a Python script for what you wanna import: subl 0-the-world-must-be-peopled.py (Note: You can have multiple scripts; they'll be run in lexicographic order.)
  3. People that script with your imports! Mine looked like the above import os etc. stuff. (Note: %magics don't work here, sorry; startup files are plain Python.)
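Putting the steps together, a startup script along the lines of my One True First Cell (minus the %magics, which have to stay in the notebook) might look something like this; the imports, figure size, and alert() are the same ones from earlier:

```python
# ~/.ipython/profile_default/startup/0-the-world-must-be-peopled.py
# Runs automatically at the start of every IPython/Jupyter session.
# (%magics like %autoreload and %matplotlib won't work here; run those
# in the notebook itself.)
import os

import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
from IPython.display import Javascript, display

# Bigger default plots.
mpl.rcParams['figure.figsize'] = (12, 8)

def alert():
    """Pop a JavaScript alert in the browser when called."""
    js = """alert("All done!");"""
    display(Javascript(js))
```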


An aside on alert()

Hey, let's talk about alert(), by the way.

I often run stuff in my Jupyter notebooks that takes a while. For example: if I'm pulling in lots (and lots) of data from somewhere, and it's just taking forever to load into memory. Or if I'm waiting on PyMC3 MCMC sampling, or a neural net's many epochs.

In those cases, I want to be alerted when the process completes. Specifically, I want it to pull me back from whatever other thing I've wandered off to do (e.g. watching this). BEHOLD - that's what alert() is good for:

from IPython.display import Javascript, display

def alert():
    js = """alert("All done!");"""
    display(Javascript(js))

def some_long_process():
    # imagine hours of data loading / sampling / training here
    return 1

some_long_process()
alert()


So alert() just pops a JavaScript alert window when some_long_process() completes. You may need to allow popups for localhost in your browser, since most modern browsers block them by default (after much 1990s abuse).
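If you'd rather not remember to tack alert() onto the end of every slow cell, one variation (my own sketch, not from the post; alert_when_done is a hypothetical name) is a little decorator that fires it automatically:

```python
import functools

from IPython.display import Javascript, display

def alert():
    """Pop a JavaScript alert in the browser when called."""
    display(Javascript("""alert("All done!");"""))

def alert_when_done(fn):
    """Hypothetical helper: call alert() whenever fn finishes."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        finally:
            alert()  # fires even if fn raises
    return wrapper

@alert_when_done
def some_long_process():
    return 1  # stand-in for hours of sampling/training
```

Drop the decorator into the same startup script and it's available everywhere, no copy+paste required.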

I mean, you're in the frickin' browser anyway, may as well leverage that JavaScript!