
I just kicked some serious computer butt today. I would like to share.

Original problem

I just wanted a way to recursively .gitignore some .csv files in my work folders. CSV files, as you know, can get big (BIG). And GitHub doesn't like that; in fact, it was bouncing my git push -u origin master back and complaining that I had exceeded the acceptable size of my commit (wow). My previous .gitignore only ever excluded CSV files that were exactly two levels deep:


But that assumption no longer held. I had just massively reorganized my folders, and my CSVs were now scattered at arbitrary depths throughout my various project folder trees.


Well, first, this works in your .gitignore:

path/to/top/folder/**/*.csv

Specifically, that double wildcard, **, says "wildcard recursively". That is, "search through arbitrary levels of sub-directories". What joy! Which led me into a rabbit hole: wait, can I wildcard recursively in my shell?
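If you want to check this for yourself, git check-ignore reports which .gitignore rule (if any) matches a path. Here's a small sketch in a throwaway repo; every directory and file name in it is made up for the demo, not taken from my actual projects.

```shell
#!/bin/sh
# Sketch: check how '**/*.csv' behaves in a throwaway repo.
# All directory and file names here are hypothetical.
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# CSVs at three different depths, plus one non-CSV file
mkdir -p a/b/c/d
touch top.csv a/b/two_deep.csv a/b/c/d/four_deep.csv a/b/notes.txt

printf '%s\n' '**/*.csv' > .gitignore

# -v prints the matching rule next to each ignored path;
# notes.txt matches no rule, so it is simply not listed
git check-ignore -v top.csv a/b/two_deep.csv a/b/c/d/four_deep.csv a/b/notes.txt
```

One side note: a pattern with no slash in it, like plain *.csv, also matches at every depth. The ** form earns its keep when you anchor it under a particular folder, as in some/folder/**/*.csv.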

Yes, but it's time to move to zsh

I had just finished a very nice book about bash, so I didn't want to haul all my hard-earned shortcuts and aliases over to a totally new shell interpreter. Ugh! (Though, to be fair, zsh is close enough to bash for everyday interactive use that most of that stuff carries over, even if it isn't a strict superset.)

But I saw some StackOverflow post saying that zsh has the double-glob (**) built in, and that if I wanted it in bash (my current shell, and the default one), I'd need a newer bash: globstar is a shell option that arrived in bash 4.0, and macOS ships bash 3.2. So instead of upgrading bash and switching on globstar, I decided to pull out the GUTS OF THE SHELL ENTIRELY and just install zsh.
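For the record, here is roughly what the bash route looks like. This is only a sketch: it assumes bash 4.0 or newer, where globstar is an option you turn on with shopt, and the folders and files are invented for the demo. It falls back to find when the option isn't available.

```shell
#!/usr/bin/env bash
# Sketch: recursive globbing in bash. Assumes bash >= 4.0; the stock
# macOS bash (3.2) has no globstar option. File names are hypothetical.
set -eu

demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/p1/raw" "$demo_dir/p2/x/y"
touch "$demo_dir/top.csv" "$demo_dir/p1/raw/A.csv" "$demo_dir/p2/x/y/eligible.csv"
cd "$demo_dir"

if shopt -s globstar 2>/dev/null; then
  # With globstar on, ** matches zero or more directory levels
  csv_count=$(printf '%s\n' **/*.csv | wc -l)
else
  # Fallback for shells without globstar
  csv_count=$(find . -name '*.csv' | wc -l)
fi
echo "found $csv_count CSV files"
```

In zsh the same **/*.csv pattern works without any option, which is what tipped me toward switching.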


Installing zsh

zsh is just another shell interpreter, with some nice add-ons and wing-dings on top of mostly bash-like syntax. I found oh-my-zsh, which is a "community-driven" zsh configuration helper: basically, people have already futzed with zsh profiles to make things pretty and functional and allow autocomplete, and so on. In general, I think it's wise to follow crowds on tech stuff - where goeth the crowd, there be-eth the StackOverflow answers.

Anyway, here's me installing zsh and getting going:

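# OSX has zsh installed, but I saw somewhere that it's old? so let's update
brew install zsh

# chsh only accepts shells listed in /etc/shells, so register the new one first
# (assumes the Homebrew zsh is the first zsh on your PATH)
command -v zsh | sudo tee -a /etc/shells

# change default shell from bash to zsh
chsh -s "$(command -v zsh)"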

Then close and reopen iTerm2! Next was futzing around with the oh-my-zsh stuff, by enabling oh-my-zsh, as well as its plugins and themes in my .zshrc. I also copied over all my various aliases and such from .bashrc and .bash_profile to .zshrc. Everything basically worked fine. (Yes, I'll put my dotfiles on GitHub someday.) For prettiness, I used the pure prompt theme - which is not part of oh-my-zsh, and so needed some overriding. Here's what it looks like, when I've got my pyenv environment activated (www_env), I'm on a git master branch, and I'm in the blog folder:


Doing the thing I came here to do

After spending an hour or so getting zsh all ported over, all I wanted to do was remove the giant CSV files from my git history. They had gone in several commits back on my local branch, and I was stuck: I needed to roll back to an earlier commit (pre-CSV dump) and get the files (temporarily!) out of the directory tree entirely. I needed to rewrite history. Adding the path/to/top/folder/**/*.csv line to my .gitignore hadn't fixed anything by itself, because the CSVs were already sitting in those earlier commits.
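The underlying rule is that .gitignore only applies to files git isn't tracking yet. Here's a small sketch of that behavior in a throwaway repo; the file name and the commit identity are dummies made up for the demo.

```shell
#!/bin/sh
# Sketch: .gitignore does not untrack files that are already committed.
# Repo, file name, and commit identity are all made up for the demo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo 'a,b,c' > big.csv
git add big.csv
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add csv"

# Ignore CSVs after the fact
echo '*.csv' > .gitignore

# The file is still tracked, despite the new ignore rule
tracked_before=$(git ls-files -- big.csv)

# Stop tracking the file but keep it on disk
git rm --cached -q big.csv
tracked_after=$(git ls-files -- big.csv)

echo "before: '$tracked_before' after: '$tracked_after'"
```

Note that git rm --cached only changes what goes into future commits. The file still lives in the older commits, which is why I ended up going back and redoing them.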

So my plan was:

  1. Banish the CSV files to a temporary data/ folder.
  2. Roll back to an earlier commit (pre-CSV dump).
  3. Re-commit everything, now without any CSVs in the git history.
  4. Push up to origin/master! 😍

In zsh, all that was:

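# 1. banish any CSVs in any subdirectory to new top-level data/ folder
#    (skip data/ itself; mv -n refuses to overwrite same-named files)
mkdir -p data
find . -name '*.csv' -not -path './data/*' -exec mv -n {} data/ \;

# 2. roll back to earlier commit, pre-CSV troubles
#    (a plain reset keeps the files on disk and only moves the branch)
git reset [SOME_SHA]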

# 3. re-commit everything
git status
git add .
git commit -m "OMG CSVS"

# 4. Pusshhhhhh!
git push -u origin master


At this point, my good ego feelings quickly disappeared. Since I had moved all CSVs out of any directory and plopped them down into data/, I no longer knew which CSV was associated with which directory (and thus, which project). I was horrified. And my CSVs had cryptic names like A.csv and eligible.csv. Huh!?
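In hindsight, a cheap safeguard is to write down where each file came from before moving it. A minimal sketch, assuming a manifest file I'm calling csv_manifest.txt (a name invented here) and a demo folder tree that is equally made up:

```shell
#!/bin/sh
# Sketch: record each CSV's original path before moving it into data/.
# csv_manifest.txt and the demo tree are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p data projX/analysis projY
touch projX/analysis/A.csv projY/eligible.csv

# One original path per line; skip anything already inside data/
find . -name '*.csv' ! -path './data/*' > csv_manifest.txt

# Move the files listed in the manifest (-n: never overwrite)
while IFS= read -r csv_path; do
  mv -n "$csv_path" data/
done < csv_manifest.txt

cat csv_manifest.txt
```

This breaks on file names containing newlines, and two CSVs with the same name would still collide in a flat data/ folder, so treat it as a starting point rather than a finished tool.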

Thankfully, I knew that all the CSVs had been used (either created or imported) by some data analysis in Python. So I could just grep through all my .py files (recursively, throughout all the directories and sub-directories) to find when and where each CSV file appeared.

# -F treats '.csv' as a literal string; the ** glob already handles the recursion
grep -nF '.csv' ./**/*.py

This showed me that, for example, Python file X in directory Y had been doing stuff with A.csv - voila! I could return A.csv to its rightful directory. And now, since all CSVs were .gitignored, I would never have to worry about this again!
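The matching-up step can also be looped over every stray CSV. Here's a sketch that assumes the CSVs sit in a flat data/ folder next to the project folders; the demo tree and the Python one-liners in it are invented purely so the script has something to find.

```shell
#!/bin/sh
# Sketch: for each CSV in data/, list the Python files that mention it.
# The demo tree and file contents are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p data projX projY
touch data/A.csv data/eligible.csv
echo 'df = read_csv("A.csv")' > projX/load.py
echo 'out.to_csv("eligible.csv")' > projY/export.py

for csv_path in data/*.csv; do
  csv_name=$(basename "$csv_path")
  echo "== $csv_name"
  # -r recurse, -l print matching file names only, -F literal string
  grep -rlF --include='*.py' -- "$csv_name" . || echo "(no match)"
done
```

Short names like A.csv will also match longer names that end the same way (dataA.csv, say), so the output still wants a quick look by a human.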